Re: BUG #17928: Standby fails to decode WAL on termination of primary - Mailing list pgsql-bugs

From Noah Misch
Subject Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date
Msg-id 20230811025808.GA2261232@rfd.leadboat.com
Whole thread Raw
In response to Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Michael Paquier <michael@paquier.xyz>)
Responses Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Michael Paquier <michael@paquier.xyz>)
List pgsql-bugs
On Thu, Aug 10, 2023 at 04:45:25PM +0900, Michael Paquier wrote:
> On Sun, Jul 16, 2023 at 05:49:05PM -0700, Noah Misch wrote:
> > On my system, testing unpatched REL_15_2, the test reached its first failure
> > at minute 53 and its second failure at minute 189.  Perhaps a faster,
> > deterministic test would be feasible:
> 
> It took less than five minutes for me to reproduce the failure using
> the test case sent by Alexander on HEAD:
> 2023-08-10 15:26:37.401 JST [11239] FATAL:  invalid memory alloc request size 2021163525
> 2023-08-10 15:26:37.405 JST [11235] LOG:  startup process (PID 11239) exited with exit code 1
> 
> I don't see it after two hours of tests with the patch.

That is good.

> > - Use pg_logical_emit_message to fill a few segments with 0xFF.
> > - CHECKPOINT the primary, so the standby recycles segments.
> > - One more pg_logical_emit_message, computing the length from
> >   pg_current_wal_insert_lsn() such that new message crosses a segment boundary
> >   and ends 4 bytes before the end of a page.
> > - Stop the primary.
> > - If the bug is present, the standby will exit.
> 
> Good idea to pollute the data with recycled segments.  Using a minimal
> WAL segment size whould help here as well in keeping a test cheap, and
> two segments should be enough.  The alignment calculations and the
> header size can be known, but the standby records are an issue for the
> predictability of the test when it comes to adjust the length of the
> logical message depending on the 8k WAL page, no?

Could be.  I expect there would be challenges translating that outline into a
real test, but I don't know if that will be a major one.  The test doesn't
need to be 100% deterministic.  If it fails 25% of the time and is not the
slowest test in the recovery suite, I'd find that good enough.

> >>>> [1] https://www.postgresql.org/message-id/20210505010835.umylslxgq4a6rbwg@alap3.anarazel.de
> >>> 
> >>> Regarding [1], is it still worth zeroing recycled pages on standbys and/or
> >>> reading the whole header before allocating xl_tot_len?  (Are there benefits
> >>> other than avoiding a 1G backend allocation or 4G frontend allocation, or is
> >>> that benefit worth the cycles?)
> >> 
> >> I believe reading the whole header is the most sensible approach as it
> >> can prevent unnecessary memory requests. Another potential solution
> >> (or hack) for this specific case is to let XLogWalRcvFlush write a
> >> finalizing ((uint32)0) when dying is true. This stabilizes the
> >> behavior to "invalid record length.. got 0" when running the TAP test.
> 
> FWIW, I came back to this thread while tweaking the error reporting of
> xlogreader.c for the sake of this thread and this proposal is an
> improvement to be able to make a distinction between an OOM and an
> incorrect record:
> https://www.postgresql.org/message-id/ZMh/WV+CuknqePQQ@paquier.xyz
> 
> Anyway, agreed that it's an improvement to remove this check out of
> allocate_recordbuf().  Noah, are you planning to work more on that?

I can push xl_tot_len-validate-v1.patch, particularly given the testing result
you reported today.  I'm content for my part to stop there.

> >> --- a/src/backend/replication/walreceiver.c
> >> +++ b/src/backend/replication/walreceiver.c
> >> @@ -989,6 +989,15 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
> >>  {
> >>      Assert(tli != 0);
> >>  
> >> +    if (dying)
> >> +    {
> >> +        int buflen = sizeof(uint32);
> >> +        char buf[sizeof(uint32)];
> >> +        
> >> +        memset(buf, 0, buflen);
> >> +        XLogWalRcvWrite(buf, buflen, LogstreamResult.Write, tli);
> >> +    }
> >> +    
> >>      if (LogstreamResult.Flush < LogstreamResult.Write)
> >>      {
> >>          WalRcvData *walrcv = WalRcv;
> > 
> > That's inexpensive in cycles, and it should make the failures in question
> > quite rare.  If we do that instead of pre-zeroing, I think we'd still want to
> > be ready for invalid lengths, for the case of unclean standby shutdown.
> 
> Tweaking the WAL receiver to do more zeroing has been discussed in the
> past, but as far as I recall we were not completely sure whether it's
> completely free, either.  It seems to me that this should have a
> discussion on its own, in a new thread.  Not sure if we should worry
> about archiving, actually, even if the segments should be padded with
> zeros beforehand.

Okay.

> +   gotheader = targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord ? true : false;

Okay.



pgsql-bugs by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: BUG #18040: PostgreSQL does not report its version correctly
Next
From: Michael Paquier
Date:
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary