Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)? - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?
Date
Msg-id 20180514.155913.76739505.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Thank you for adding the detailed comment and commiting.

At Sat, 5 May 2018 01:49:31 +0300, Heikki Linnakangas <hlinnaka@iki.fi> wrote in
<47215279-228d-f30d-35d1-16af695e53f3@iki.fi>
> On 04/05/18 10:05, Heikki Linnakangas wrote:
> > On 24/04/18 13:57, Kyotaro HORIGUCHI wrote:
> >> At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas
> >> <hlinnaka@iki.fi> wrote in
> >> <89e33d4f-5c75-0738-3dcb-44c4df59e920@iki.fi>
> >>> Looking at the patch linked above
> >>> (https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):
> >>>
> >>>> --- a/src/backend/access/transam/xlog.c
> >>>> +++ b/src/backend/access/transam/xlog.c
> >>>> @@ -11693,6 +11693,10 @@ retry:
> >>>>        Assert(reqLen <= readLen);
> >>>>         *readTLI = curFileTLI;
> >>>> +
> >>>> + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
> >>>> readBuf))
> >>>> +        goto next_record_is_invalid;
> >>>> +
> >>>>        return readLen;
> >>>>     next_record_is_invalid:
> >>>
> >>> What is the point of adding this XLogReaderValidatePageHeader() call?
> >>> The caller of this callback function, ReadPageInternal(), checks the
> >>> page header anyway. Earlier in this thread, you said:
> >>
> >> Without the lines, server actually fails to start replication.
> >>
> >> (I try to remember the details...)
> >>
> >> The difference is whether the function can cause retry for the
> >> same portion of a set of continued records (without changing the
> >> plugin API). XLogPageRead can do that. On the other hand all
> >> ReadPageInternal can do is just letting the caller ReadRecords
> >> retry from the beginning of the set of continued records since
> >> the caller side handles only complete records.
> >>
> >> In the failure case, in XLogPageRead, when read() fails, it can
> >> try the next source midst of a continued records. On the other
> >> hand if the segment was read but it was recycled one, it passes
> >> "success" to ReadPageInternal and leads to retring from the
> >> beginning of the recrod. Infinite loop.
> > I see. You have the same problem if you have a WAL file that's corrupt
> > in some other way, but one of the sources would have a correct copy.
> > ValidXLogPageHeader() only checks the page header, after all. But
> > unlike
> > a recycled WAL segment, that's not supposed to happen as part of
> > normal
> > operation, so I guess we can live with that.

Anyway we read successive complete records from different sources
so I think if such curruption causes a problem we would have
faced the same problem even without this.

> Pushed this now, after adding some comments. Thanks!

Thanks!

> >> Calling XLogReaderValidatePageHeader in ReadPageInternal is
> >> redundant, but removing it may be interface change of xlogreader
> >> plugin and I am not sure that is allowed.
> > We should avoid doing that in back-branches, at least. But in
> > 'master',
> > I wouldn't mind redesigning the API. Dealing with all the retrying is
> > pretty complicated as it is, if we can simplify that somehow, I think
> > that'd be good.

Agreed.

> I spent some time musing on what a better API might look like. We
> could remove the ReadPage callback, and instead have XLogReadRecord
> return a special return code to mean "give me more data". I'm thinking
> of something like:

Sounds reasonable. That makes an additional return/call iteration
to XLogReadRecord but it would be ignorable comparing to the cost
of XLogPageRead.

> /* return code of XLogReadRecord() */
> typedef enum
> {
>     XLREAD_SUCCESS,
>     XLREAD_INVALID_RECORD,  /* a record was read, but it was corrupt */
>     XLREAD_INVALID_PAGE, /* the page that was supplied looks invalid. */
>     XLREAD_NEED_DATA, /* caller should place more data in buffer, and
>     retry */
> } XLogReadRecord_Result;
> 
> 
> And the calls to XLogReadRecord() would look something like this:
> 
> for(;;)
> {
>     rc = XLogReadRecord(reader, startptr, errormsg);
> 
>     if (rc == XLREAD_SUCCESS)
>     {
>         /* great, got record */
>     }
>     if (rc == XLREAD_INVALID_PAGE || XLREAD_INVALID_RECORD)
>     {
>         elog(ERROR, "invalid record");
>     }
>     if (rc == XLREAD_NEED_DATA)
>     {
>         /*
>          * Read a page from disk, and place it into reader->readBuf
>          */
>         XLogPageRead(reader->readPagePtr, /* page to read */
>                      reader->reqLen       /* # of bytes to read */ );
>         /*
>          * Now that we have read the data that XLogReadRecord()
>          * requested, call it again.
>          */
>         continue;
>     }
> }
> 
> So instead of having a callback, XLogReadRecord() would return
> XLREAD_NEED_DATA. The caller would then need to place that data into
> the buffer, and call it again. If a record spans multiple pages,
> XLogReadRecord() would return with XLREAD_NEED_DATA multiple times, to
> read each page.

It seems easier to understand at a glance. In short, I like it.

> The important difference for the bug we're discussing on this thread
> is is that if you passed an invalid page to XLogReadRecord(), it would
> return with XLREAD_INVALID_PAGE. You could then try reading the same
> page from a different source, and call XLogReadRecord() again, and it
> could continue where it was left off, even if it was in the middle of
> a continuation record.

We will have more control on how to continue reading WAL with
this than the current code and it seems preferable as the whole.

> This is clearly not backpatchable, but maybe something to think about
> for v12.

Are you planning to working on this shortly?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Simon Muller
Date:
Subject: Re: Allow COPY's 'text' format to output a header
Next
From: Pavan Deolasee
Date:
Subject: Re: PANIC during crash recovery of a recently promoted standby