Re: BUG #17928: Standby fails to decode WAL on termination of primary - Mailing list pgsql-bugs

From Noah Misch
Subject Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date
Msg-id 20230717004905.GE3826297@rfd.leadboat.com
In response to Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-bugs
On Tue, Jul 11, 2023 at 03:54:14PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 10 Jul 2023 13:00:12 -0700, Noah Misch <noah@leadboat.com> wrote in 
> > On Mon, May 15, 2023 at 03:38:17PM +1200, Thomas Munro wrote:
> > > On Fri, May 12, 2023 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> > > > 2023-05-11 20:19:22.248 MSK [2037134] FATAL:  invalid memory alloc request size 2021163525
> > > > 2023-05-11 20:19:22.248 MSK [2037114] LOG:  startup process (PID 2037134) exited with exit code 1

On my system, testing unpatched REL_15_2, the test reached its first failure
at minute 53 and its second failure at minute 189.  Perhaps a faster,
deterministic test would be feasible:

- Use pg_logical_emit_message to fill a few segments with 0xFF.
- CHECKPOINT the primary, so the standby recycles segments.
- One more pg_logical_emit_message, computing the length from
  pg_current_wal_insert_lsn() such that the new message crosses a segment
  boundary and ends 4 bytes before the end of a page.
- Stop the primary.
- If the bug is present, the standby will exit.
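The length computation in the steps above can be sketched as follows. This is
illustrative arithmetic only, assuming the default 16MB segment size and 8KB
WAL page size; OVERHEAD is a hypothetical constant standing in for record and
page-header bytes, which a real test would have to account for exactly.

```python
# Illustrative sizing for the final pg_logical_emit_message payload:
# cross the next segment boundary and end 4 bytes before the end of a page.
WAL_SEG_SIZE = 16 * 1024 * 1024   # default wal_segment_size
XLOG_BLCKSZ = 8192                # default WAL page size
OVERHEAD = 64                     # hypothetical record/page-header allowance

def payload_len(insert_lsn: int) -> int:
    """Payload length so the record crosses the next segment boundary
    and ends 4 bytes short of a page boundary."""
    next_seg = (insert_lsn // WAL_SEG_SIZE + 1) * WAL_SEG_SIZE
    # first page boundary strictly after the segment boundary
    end_page = (next_seg // XLOG_BLCKSZ + 1) * XLOG_BLCKSZ
    target_end = end_page - 4     # leave exactly 4 garbage bytes on the page
    return target_end - insert_lsn - OVERHEAD
```

Ending 4 bytes before a page boundary leaves room for a garbage length word
that is read without a full record header on the page, which is the case the
recipe aims to hit.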

> > > Thanks Alexander.  Looking into this.  I think it is probably
> > > something like: recycled standby pages are not zeroed (something we
> > > already needed to do something about[1]), and when we read a recycled
> > > garbage size (like your "xxxx") at the end of a page at an offset
> > > where we don't have a full record header on one page, we skip the
> > > ValidXLogRecordHeader() call (and always did), but the check in
> > > allocate_recordbuf() which previously handled that "gracefully" (well,
> > > it would try to allocate up to 1GB bogusly, but it wouldn't try to
> > > allocate more than that and ereport) is a bit too late.  I probably
> > > need to add an earlier not-too-big validation.  Thinking.
> > 
> > I agree about an earlier not-too-big validation.  Like the attached?  I
> > haven't tested it with Alexander's recipe or pondered it thoroughly.
> 
> I like the patch for its code cleanup, but I'm afraid that it removes
> the existing record-length check when reading continued
> pages.

The removed check would be repetitive in all cases known to me, but I wouldn't
mind keeping the check instead.
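The "earlier not-too-big validation" under discussion amounts to rejecting an
implausible xl_tot_len before any allocation happens. A simplified model, not
the actual xlogreader.c code (the constants mirror the ~1GB palloc ceiling and
the fixed record-header size mentioned in the thread):

```python
# Simplified model of validating xl_tot_len before allocating a record
# buffer; this is not the real xlogreader.c logic, just the shape of it.
MAX_ALLOC_SIZE = 1024 * 1024 * 1024 - 1   # ~1 GB backend allocation ceiling
SIZE_OF_XLOG_RECORD = 24                  # fixed record-header size in bytes

def record_length_is_sane(xl_tot_len: int) -> bool:
    """Reject garbage lengths read from recycled pages before any
    large allocation is attempted."""
    return SIZE_OF_XLOG_RECORD <= xl_tot_len <= MAX_ALLOC_SIZE
```

With a check of this shape, the garbage length 2021163525 from Alexander's log
would be rejected up front instead of reaching allocate_recordbuf().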

> However, I'm unsure about merely adding a check for too-long
> records, due to the potential for requesting an excessively large
> amount of memory, even if it will be released shortly.
> 
> > > [1] https://www.postgresql.org/message-id/20210505010835.umylslxgq4a6rbwg@alap3.anarazel.de
> > 
> > Regarding [1], is it still worth zeroing recycled pages on standbys and/or
> > reading the whole header before allocating xl_tot_len?  (Are there benefits
> > other than avoiding a 1G backend allocation or 4G frontend allocation, or is
> > that benefit worth the cycles?)
> 
> I believe reading the whole header is the most sensible approach as it
> can prevent unnecessary memory requests. Another potential solution
> (or hack) for this specific case is to let XLogWalRcvFlush write a
> finalizing ((uint32)0) when dying is true. This stabilizes the
> behavior to "invalid record length.. got 0" when running the TAP test.

> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -989,6 +989,15 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
>  {
>      Assert(tli != 0);
>  
> +    if (dying)
> +    {
> +        int buflen = sizeof(uint32);
> +        char buf[sizeof(uint32)];
> +        
> +        memset(buf, 0, buflen);
> +        XLogWalRcvWrite(buf, buflen, LogstreamResult.Write, tli);
> +    }
> +    
>      if (LogstreamResult.Flush < LogstreamResult.Write)
>      {
>          WalRcvData *walrcv = WalRcv;

That's inexpensive in cycles, and it should make the failures in question
quite rare.  If we do that instead of pre-zeroing, I think we'd still want to
be ready for invalid lengths, for the case of unclean standby shutdown.
