Re: Error with index on unlogged table - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Error with index on unlogged table |
Date | |
Msg-id | 20151204081019.GC28762@awork2.anarazel.de Whole thread Raw |
In response to | Re: Error with index on unlogged table (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: Error with index on unlogged table
|
List | pgsql-hackers |
On 2015-12-04 17:00:13 +0900, Michael Paquier wrote: > Andres Freud wrote: > >> @@ -450,14 +450,21 @@ ginbuildempty(PG_FUNCTION_ARGS) > >> START_CRIT_SECTION(); > >> GinInitMetabuffer(MetaBuffer); > >> MarkBufferDirty(MetaBuffer); > >> - log_newpage_buffer(MetaBuffer, false); > >> + log_newpage_buffer(MetaBuffer, false, false); > >> GinInitBuffer(RootBuffer, GIN_LEAF); > >> MarkBufferDirty(RootBuffer); > >> - log_newpage_buffer(RootBuffer, false); > >> + log_newpage_buffer(RootBuffer, false, true); > >> END_CRIT_SECTION(); > >> > > Why isn't the metapage flushed to disk? > > I was not sure if we should only flush only at the last page, as pages > just before would be already replayed. Switched. Uh, that's not how it works! The earlier pages would just be in shared buffers. Neither smgrwrite, nor smgrimmedsync does anything about that! > >> extern void InitXLogInsert(void); > >> diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h > >> index ad1eb4b..91445f1 100644 > >> --- a/src/include/catalog/pg_control.h > >> +++ b/src/include/catalog/pg_control.h > >> @@ -73,6 +73,7 @@ typedef struct CheckPoint > >> #define XLOG_END_OF_RECOVERY 0x90 > >> #define XLOG_FPI_FOR_HINT 0xA0 > >> #define XLOG_FPI 0xB0 > >> +#define XLOG_FPI_FOR_SYNC 0xC0 > > > > > > I'm not a big fan of the XLOG_FPI_FOR_SYNC name. Syncing is a bit too > > ambigous for my taste. How about either naming it XLOG_FPI_FLUSH or > > instead adding actual record data and a 'flags' field in there? I > > slightly prefer the latter - XLOG_FPI and XLOG_FPI_FOR_HINT really are > > different, XLOG_FPI_FOR_SYNC not so much. > > Let's go for XLOG_FPI_FLUSH. I think the other way is a bit better, because we can add new flags without changing the WAL format. > diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c > index 99337b0..b646101 100644 > --- a/src/backend/access/brin/brin.c > +++ b/src/backend/access/brin/brin.c > @@ -682,7 +682,15 @@ brinbuildempty(PG_FUNCTION_ARGS) > brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index), > BRIN_CURRENT_VERSION); > MarkBufferDirty(metabuf); > - log_newpage_buffer(metabuf, false); > + > + /* > + * For unlogged relations, this page should be immediately flushed > + * to disk after being replayed. This is necessary to ensure that the > + * initial on-disk state of unlogged relations is preserved when > + * they get reset at the end of recovery. > + */ > + log_newpage_buffer(metabuf, false, > + index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED); > END_CRIT_SECTION(); Maybe write the last sentence as '... as the on disk files are copied directly at the end of recovery.'? > @@ -336,7 +336,8 @@ end_heap_rewrite(RewriteState state) > MAIN_FORKNUM, > state->rs_blockno, > state->rs_buffer, > - true); > + true, > + false); > RelationOpenSmgr(state->rs_new_rel); > > PageSetChecksumInplace(state->rs_buffer, state->rs_blockno); > @@ -685,7 +686,8 @@ raw_heap_insert(RewriteState state, HeapTuple tup) > MAIN_FORKNUM, > state->rs_blockno, > page, > - true); > + true, > + false); Did you verify that that's ok when a unlogged table is clustered/vacuum full'ed? > @@ -181,6 +183,9 @@ xlog_identify(uint8 info) > case XLOG_FPI_FOR_HINT: > id = "FPI_FOR_HINT"; > break; > + case XLOG_FPI_FLUSH: > + id = "FPI_FOR_SYNC"; > + break; > } Old string. > --- a/src/backend/access/transam/xloginsert.c > +++ b/src/backend/access/transam/xloginsert.c > @@ -932,10 +932,13 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std) > * If the page follows the standard page layout, with a PageHeader and unused > * space between pd_lower and pd_upper, set 'page_std' to TRUE. That allows > * the unused space to be left out from the WAL record, making it smaller. > + * > + * If 'is_flush' is set to TRUE, relation will be requested to flush > + * immediately its data at replay after replaying this full page image. > */ s/is_flush/flush_immed/? And maybe say that it 'will be flushed to the OS immediately after replaying the record'? Greetings, Andres Freund
pgsql-hackers by date: