On 24/04/18 13:57, Kyotaro HORIGUCHI wrote:
> At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas <hlinnaka@iki.fi> wrote in
<89e33d4f-5c75-0738-3dcb-44c4df59e920@iki.fi>
>> Looking at the patch linked above
>> (https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):
>>
>>> --- a/src/backend/access/transam/xlog.c
>>> +++ b/src/backend/access/transam/xlog.c
>>> @@ -11693,6 +11693,10 @@ retry:
>>> Assert(reqLen <= readLen);
>>> *readTLI = curFileTLI;
>>> +
>>> + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
>>> readBuf))
>>> + goto next_record_is_invalid;
>>> +
>>> return readLen;
>>> next_record_is_invalid:
>>
>> What is the point of adding this XLogReaderValidatePageHeader() call?
>> The caller of this callback function, ReadPageInternal(), checks the
>> page header anyway. Earlier in this thread, you said:
>
> Without the lines, server actually fails to start replication.
>
> (I try to remember the details...)
>
> The difference is whether the function can cause retry for the
> same portion of a set of continued records (without changing the
> plugin API). XLogPageRead can do that. On the other hand all
> ReadPageInternal can do is just letting the caller ReadRecords
> retry from the beginning of the set of continued records since
> the caller side handles only complete records.
>
> In the failure case, in XLogPageRead, when read() fails, it can
> try the next source midst of a continued records. On the other
> hand if the segment was read but it was recycled one, it passes
> "success" to ReadPageInternal and leads to retring from the
> beginning of the recrod. Infinite loop.
I see. You have the same problem if you have a WAL file that's corrupt
in some other way, but one of the sources would have a correct copy.
ValidXLogPageHeader() only checks the page header, after all. But unlike
a recycled WAL segment, that's not supposed to happen as part of normal
operation, so I guess we can live with that.
> Calling XLogReaderValidatePageHeader in ReadPageInternal is
> redundant, but removing it may be interface change of xlogreader
> plugin and I am not sure that is allowed.
We should avoid doing that in back-branches, at least. But in 'master',
I wouldn't mind redesigning the API. Dealing with all the retrying is
pretty complicated as it is, if we can simplify that somehow, I think
that'd be good.
- Heikki