Re: BUG #17928: Standby fails to decode WAL on termination of primary - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date
Msg-id ZQ9zf1QO8CP4TZRO@paquier.xyz
In response to Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: BUG #17928: Standby fails to decode WAL on termination of primary
List pgsql-bugs
On Sun, Sep 24, 2023 at 09:48:42AM +1300, Thomas Munro wrote:
> "grison" has a little more detail --  we see
> pg_comp_crc32c_sb8(len=4294636456).  I'm wondering how to reproduce
> this, but among the questions that jump out I have: why was it ever OK
> that we load record->xl_tot_len into total_len, perform header
> validation, determine that total_len < len (= this record is all on
> one page, no reassembly loop needed, so now we're in the single-page
> branch), then call ReadPageInternal() again, then call
> ValidXLogRecord() which internally loads record->xl_tot_len *again*?
> ReadPageInternal() might have changed xl_tot_len, no?  That seems to
> be a possible pathway to reading past the end of the buffer in the CRC
> check, no?
>
> If that value didn't change underneath us, I think we'd need an
> explanation for how we finished up in the single-page branch at
> xlogreader.c:842 with a large xl_tot_len, which I'm not seeing yet,
> though it might take more coffee.  (Possibly supporting the re-read
> theory is the fact that it's only happening on a few very slow
> computers, though I have no idea why it would only happen on master
> [so far at least].)

Hmm, it looks pretty clear that this is a HEAD-only thing as the
buildfarm shows and as you say, and my primary suspect here would be
71e4cc6b8ec6, I think.  Any race condition underneath it would be
easier to see on slower machines.  So it's quite possible that this
commit has messed up the page insertion logic.
--
Michael
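
The re-read hazard Thomas describes above (validate a length, let the
buffer be refreshed, then load the length from the buffer a second time)
can be sketched in isolation.  This is only an illustration under stated
assumptions, not code from xlogreader.c and not a proposed fix:
DemoRecord, demo_checksum(), demo_validate(), demo_make_record() and
demo_corrupt_length() are all hypothetical names, and the toy checksum
merely stands in for CRC-32C.  The idea shown is to snapshot the length
once, bounds-check the snapshot, and refuse the record if the header no
longer agrees with it after the buffer may have changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified header; NOT PostgreSQL's XLogRecord. */
typedef struct DemoRecord
{
    uint32_t tot_len;   /* total record length, header included */
    uint32_t crc;       /* checksum of the payload after the header */
} DemoRecord;

/* Toy checksum standing in for CRC-32C. */
uint32_t
demo_checksum(const unsigned char *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = sum * 31 + p[i];
    return sum;
}

/* Hypothetical stand-in for a page re-read that may alter the buffer. */
typedef void (*demo_reread_fn) (unsigned char *buf, size_t buf_len);

/* Build a valid record at the start of buf (assumes it fits). */
void
demo_make_record(unsigned char *buf, size_t buf_len,
                 const void *payload, uint32_t payload_len)
{
    DemoRecord hdr;

    assert(sizeof(DemoRecord) + payload_len <= buf_len);
    memset(buf, 0, buf_len);
    memcpy(buf + sizeof(DemoRecord), payload, payload_len);
    hdr.tot_len = (uint32_t) sizeof(DemoRecord) + payload_len;
    hdr.crc = demo_checksum(buf + sizeof(DemoRecord), payload_len);
    memcpy(buf, &hdr, sizeof(hdr));
}

/* Simulates the length field changing underneath the reader. */
void
demo_corrupt_length(unsigned char *buf, size_t buf_len)
{
    uint32_t huge = 0xFFFFFFF0u;

    (void) buf_len;
    memcpy(buf, &huge, sizeof(huge));   /* tot_len is the first field */
}

/* Returns 1 if the record is valid, 0 otherwise; never reads past buf_len. */
int
demo_validate(unsigned char *buf, size_t buf_len, demo_reread_fn reread)
{
    DemoRecord hdr;
    uint32_t   total_len;

    if (buf_len < sizeof(DemoRecord))
        return 0;
    memcpy(&hdr, buf, sizeof(hdr));
    total_len = hdr.tot_len;            /* snapshot the length once */

    if (total_len < sizeof(DemoRecord) || total_len > buf_len)
        return 0;

    if (reread != NULL)
        reread(buf, buf_len);           /* buffer contents may change here */

    /* Reload the header, but trust only the already-validated snapshot. */
    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.tot_len != total_len)
        return 0;                       /* length changed underneath us */

    assert(total_len <= buf_len);
    return demo_checksum(buf + sizeof(DemoRecord),
                         total_len - sizeof(DemoRecord)) == hdr.crc;
}
```

With a 64-byte buffer holding a record built by demo_make_record(),
demo_validate() accepts it when no re-read happens and rejects it when
demo_corrupt_length() is passed as the re-read callback, instead of
handing an oversized length to the checksum loop.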

