Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)? - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?
Date
Msg-id 47215279-228d-f30d-35d1-16af695e53f3@iki.fi
Whole thread Raw
In response to Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On 04/05/18 10:05, Heikki Linnakangas wrote:
> On 24/04/18 13:57, Kyotaro HORIGUCHI wrote:
>> At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas <hlinnaka@iki.fi> wrote in
<89e33d4f-5c75-0738-3dcb-44c4df59e920@iki.fi>
>>> Looking at the patch linked above
>>> (https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):
>>>
>>>> --- a/src/backend/access/transam/xlog.c
>>>> +++ b/src/backend/access/transam/xlog.c
>>>> @@ -11693,6 +11693,10 @@ retry:
>>>>        Assert(reqLen <= readLen);
>>>>         *readTLI = curFileTLI;
>>>> +
>>>> + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
>>>> readBuf))
>>>> +        goto next_record_is_invalid;
>>>> +
>>>>        return readLen;
>>>>     next_record_is_invalid:
>>>
>>> What is the point of adding this XLogReaderValidatePageHeader() call?
>>> The caller of this callback function, ReadPageInternal(), checks the
>>> page header anyway. Earlier in this thread, you said:
>>
>> Without the lines, server actually fails to start replication.
>>
>> (I try to remember the details...)
>>
>> The difference is whether the function can cause retry for the
>> same portion of a set of continued records (without changing the
>> plugin API). XLogPageRead can do that. On the other hand all
>> ReadPageInternal can do is just letting the caller ReadRecords
>> retry from the beginning of the set of continued records since
>> the caller side handles only complete records.
>>
>> In the failure case, in XLogPageRead, when read() fails, it can
>> try the next source midst of a continued records. On the other
>> hand if the segment was read but it was recycled one, it passes
>> "success" to ReadPageInternal and leads to retring from the
>> beginning of the recrod. Infinite loop.
> 
> I see. You have the same problem if you have a WAL file that's corrupt
> in some other way, but one of the sources would have a correct copy.
> ValidXLogPageHeader() only checks the page header, after all. But unlike
> a recycled WAL segment, that's not supposed to happen as part of normal
> operation, so I guess we can live with that.

Pushed this now, after adding some comments. Thanks!

>> Calling XLogReaderValidatePageHeader in ReadPageInternal is
>> redundant, but removing it may be interface change of xlogreader
>> plugin and I am not sure that is allowed.
> 
> We should avoid doing that in back-branches, at least. But in 'master',
> I wouldn't mind redesigning the API. Dealing with all the retrying is
> pretty complicated as it is, if we can simplify that somehow, I think
> that'd be good.

I spent some time musing on what a better API might look like. We could 
remove the ReadPage callback, and instead have XLogReadRecord return a 
special return code to mean "give me more data". I'm thinking of 
something like:

/* return code of XLogReadRecord() */
typedef enum
{
     XLREAD_SUCCESS,
     XLREAD_INVALID_RECORD,  /* a record was read, but it was corrupt */
     XLREAD_INVALID_PAGE,    /* the page that was supplied looks invalid. */
     XLREAD_NEED_DATA,       /* caller should place more data in buffer, 
and retry */
} XLogReadRecord_Result;


And the calls to XLogReadRecord() would look something like this:

for(;;)
{
     rc = XLogReadRecord(reader, startptr, errormsg);

     if (rc == XLREAD_SUCCESS)
     {
         /* great, got record */
     }
     if (rc == XLREAD_INVALID_PAGE || XLREAD_INVALID_RECORD)
     {
         elog(ERROR, "invalid record");
     }
     if (rc == XLREAD_NEED_DATA)
     {
         /*
          * Read a page from disk, and place it into reader->readBuf
          */
         XLogPageRead(reader->readPagePtr, /* page to read */
                      reader->reqLen       /* # of bytes to read */ );
         /*
          * Now that we have read the data that XLogReadRecord()
          * requested, call it again.
          */
         continue;
     }
}

So instead of having a callback, XLogReadRecord() would return 
XLREAD_NEED_DATA. The caller would then need to place that data into the 
buffer, and call it again. If a record spans multiple pages, 
XLogReadRecord() would return with XLREAD_NEED_DATA multiple times, to 
read each page.

The important difference for the bug we're discussing on this thread is 
is that if you passed an invalid page to XLogReadRecord(), it would 
return with XLREAD_INVALID_PAGE. You could then try reading the same 
page from a different source, and call XLogReadRecord() again, and it 
could continue where it was left off, even if it was in the middle of a 
continuation record.

This is clearly not backpatchable, but maybe something to think about 
for v12.

- Heikki


pgsql-hackers by date:

Previous
From: Mike Blackwell
Date:
Subject: Re: perlcritic (was Re: pgsql: Fix precedence problem in new Perl code.)
Next
From: Tom Lane
Date:
Subject: Draft release notes are up