Re: BUG #17928: Standby fails to decode WAL on termination of primary - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date
Msg-id ZNSVlR/v8gTDIMsU@paquier.xyz
Whole thread Raw
In response to Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Noah Misch <noah@leadboat.com>)
Responses Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Michael Paquier <michael@paquier.xyz>)
Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Noah Misch <noah@leadboat.com>)
List pgsql-bugs
On Sun, Jul 16, 2023 at 05:49:05PM -0700, Noah Misch wrote:
> On my system, testing unpatched REL_15_2, the test reached its first failure
> at minute 53 and its second failure at minute 189.  Perhaps a faster,
> deterministic test would be feasible:

It took less than five minutes for me to reproduce the failure using
the test case sent by Alexander on HEAD:
2023-08-10 15:26:37.401 JST [11239] FATAL:  invalid memory alloc request size 2021163525
2023-08-10 15:26:37.405 JST [11235] LOG:  startup process (PID 11239) exited with exit code 1

I don't see it after two hours of tests with the patch.

> - Use pg_logical_emit_message to fill a few segments with 0xFF.
> - CHECKPOINT the primary, so the standby recycles segments.
> - One more pg_logical_emit_message, computing the length from
>   pg_current_wal_insert_lsn() such that new message crosses a segment boundary
>   and ends 4 bytes before the end of a page.
> - Stop the primary.
> - If the bug is present, the standby will exit.

Good idea to pollute the data with recycled segments.  Using a minimal
WAL segment size whould help here as well in keeping a test cheap, and
two segments should be enough.  The alignment calculations and the
header size can be known, but the standby records are an issue for the
predictability of the test when it comes to adjust the length of the
logical message depending on the 8k WAL page, no?

>> However, I'm unsure about merely adding a check for too-long
>> records, due to the potential for requesting an excessively large
>> amount of memory, even if it will be released shortly.
>>
>>>> [1] https://www.postgresql.org/message-id/20210505010835.umylslxgq4a6rbwg@alap3.anarazel.de
>>>
>>> Regarding [1], is it still worth zeroing recycled pages on standbys and/or
>>> reading the whole header before allocating xl_tot_len?  (Are there benefits
>>> other than avoiding a 1G backend allocation or 4G frontend allocation, or is
>>> that benefit worth the cycles?)
>>
>> I believe reading the whole header is the most sensible approach as it
>> can prevent unnecessary memory requests. Another potential solution
>> (or hack) for this specific case is to let XLogWalRcvFlush write a
>> finalizing ((uint32)0) when dying is true. This stabilizes the
>> behavior to "invalid record length.. got 0" when running the TAP test.

FWIW, I came back to this thread while tweaking the error reporting of
xlogreader.c for the sake of this thread and this proposal is an
improvement to be able to make a distinction between an OOM and an
incorrect record:
https://www.postgresql.org/message-id/ZMh/WV+CuknqePQQ@paquier.xyz

Anyway, agreed that it's an improvement to remove this check out of
allocate_recordbuf().  Noah, are you planning to work more on that?

>> --- a/src/backend/replication/walreceiver.c
>> +++ b/src/backend/replication/walreceiver.c
>> @@ -989,6 +989,15 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
>>  {
>>      Assert(tli != 0);
>>
>> +    if (dying)
>> +    {
>> +        int buflen = sizeof(uint32);
>> +        char buf[sizeof(uint32)];
>> +
>> +        memset(buf, 0, buflen);
>> +        XLogWalRcvWrite(buf, buflen, LogstreamResult.Write, tli);
>> +    }
>> +
>>      if (LogstreamResult.Flush < LogstreamResult.Write)
>>      {
>>          WalRcvData *walrcv = WalRcv;
>
> That's inexpensive in cycles, and it should make the failures in question
> quite rare.  If we do that instead of pre-zeroing, I think we'd still want to
> be ready for invalid lengths, for the case of unclean standby shutdown.

Tweaking the WAL receiver to do more zeroing has been discussed in the
past, but as far as I recall we were not completely sure whether it's
completely free, either.  It seems to me that this should have a
discussion on its own, in a new thread.  Not sure if we should worry
about archiving, actually, even if the segments should be padded with
zeros beforehand.

+   gotheader = targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord ? true : false;
Perhaps this could use more parenthesis around the expression checked,
for readability.
--
Michael

Attachment

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #18034: Accept the spelling "+infinity" in datetime input is not accurate
Next
From: Bruce Momjian
Date:
Subject: Re: BUG #18034: Accept the spelling "+infinity" in datetime input is not accurate