Re: Error with index on unlogged table - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Error with index on unlogged table
Date
Msg-id 20151204081019.GC28762@awork2.anarazel.de
Whole thread Raw
In response to Re: Error with index on unlogged table  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Error with index on unlogged table
List pgsql-hackers
On 2015-12-04 17:00:13 +0900, Michael Paquier wrote:
> Andres Freud wrote:
> >> @@ -450,14 +450,21 @@ ginbuildempty(PG_FUNCTION_ARGS)
> >>       START_CRIT_SECTION();
> >>       GinInitMetabuffer(MetaBuffer);
> >>       MarkBufferDirty(MetaBuffer);
> >> -     log_newpage_buffer(MetaBuffer, false);
> >> +     log_newpage_buffer(MetaBuffer, false, false);
> >>       GinInitBuffer(RootBuffer, GIN_LEAF);
> >>       MarkBufferDirty(RootBuffer);
> >> -     log_newpage_buffer(RootBuffer, false);
> >> +     log_newpage_buffer(RootBuffer, false, true);
> >>       END_CRIT_SECTION();
> >>
> > Why isn't the metapage flushed to disk?
> 
> I was not sure if we should only flush only at the last page, as pages
> just before would be already replayed. Switched.

Uh, that's not how it works! The earlier pages would just be in shared
buffers. Neither smgrwrite, nor smgrimmedsync does anything about that!

> >>  extern void InitXLogInsert(void);
> >> diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
> >> index ad1eb4b..91445f1 100644
> >> --- a/src/include/catalog/pg_control.h
> >> +++ b/src/include/catalog/pg_control.h
> >> @@ -73,6 +73,7 @@ typedef struct CheckPoint
> >>  #define XLOG_END_OF_RECOVERY                 0x90
> >>  #define XLOG_FPI_FOR_HINT                            0xA0
> >>  #define XLOG_FPI                                             0xB0
> >> +#define XLOG_FPI_FOR_SYNC                            0xC0
> >
> >
> > I'm not a big fan of the XLOG_FPI_FOR_SYNC name. Syncing is a bit too
> > ambigous for my taste. How about either naming it XLOG_FPI_FLUSH or
> > instead adding actual record data and a 'flags' field in there? I
> > slightly prefer the latter - XLOG_FPI and XLOG_FPI_FOR_HINT really are
> > different, XLOG_FPI_FOR_SYNC not so much.
> 
> Let's go for XLOG_FPI_FLUSH.

I think the other way is a bit better, because we can add new flags
without changing the WAL format.

> diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
> index 99337b0..b646101 100644
> --- a/src/backend/access/brin/brin.c
> +++ b/src/backend/access/brin/brin.c
> @@ -682,7 +682,15 @@ brinbuildempty(PG_FUNCTION_ARGS)
>      brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
>                         BRIN_CURRENT_VERSION);
>      MarkBufferDirty(metabuf);
> -    log_newpage_buffer(metabuf, false);
> +
> +    /*
> +     * For unlogged relations, this page should be immediately flushed
> +     * to disk after being replayed. This is necessary to ensure that the
> +     * initial on-disk state of unlogged relations is preserved when
> +     * they get reset at the end of recovery.
> +     */
> +    log_newpage_buffer(metabuf, false,
> +        index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
>      END_CRIT_SECTION();

Maybe write the last sentence as '... as the on disk files are copied
directly at the end of recovery.'?

> @@ -336,7 +336,8 @@ end_heap_rewrite(RewriteState state)
>                          MAIN_FORKNUM,
>                          state->rs_blockno,
>                          state->rs_buffer,
> -                        true);
> +                        true,
> +                        false);
>          RelationOpenSmgr(state->rs_new_rel);
>  
>          PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
> @@ -685,7 +686,8 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
>                              MAIN_FORKNUM,
>                              state->rs_blockno,
>                              page,
> -                            true);
> +                            true,
> +                            false);

Did you verify that that's ok when a unlogged table is clustered/vacuum
full'ed?

> @@ -181,6 +183,9 @@ xlog_identify(uint8 info)
>          case XLOG_FPI_FOR_HINT:
>              id = "FPI_FOR_HINT";
>              break;
> +        case XLOG_FPI_FLUSH:
> +            id = "FPI_FOR_SYNC";
> +            break;
>      }

Old string.


> --- a/src/backend/access/transam/xloginsert.c
> +++ b/src/backend/access/transam/xloginsert.c
> @@ -932,10 +932,13 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
>   * If the page follows the standard page layout, with a PageHeader and unused
>   * space between pd_lower and pd_upper, set 'page_std' to TRUE. That allows
>   * the unused space to be left out from the WAL record, making it smaller.
> + *
> + * If 'is_flush' is set to TRUE, relation will be requested to flush
> + * immediately its data at replay after replaying this full page image.
>   */

s/is_flush/flush_immed/? And maybe say that it 'will be flushed to the
OS immediately after replaying the record'?


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: parallel joins, and better parallel explain
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: psql: add \pset true/false