Re: BUG #17928: Standby fails to decode WAL on termination of primary - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date
Msg-id ZNVxSMWZdNOXN9sH@paquier.xyz
In response to Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Michael Paquier <michael@paquier.xyz>)
List pgsql-bugs
On Thu, Aug 10, 2023 at 04:45:25PM +0900, Michael Paquier wrote:
> On Sun, Jul 16, 2023 at 05:49:05PM -0700, Noah Misch wrote:
>> - Use pg_logical_emit_message to fill a few segments with 0xFF.
>> - CHECKPOINT the primary, so the standby recycles segments.
>> - One more pg_logical_emit_message, computing the length from
>>   pg_current_wal_insert_lsn() such that new message crosses a segment boundary
>>   and ends 4 bytes before the end of a page.
>> - Stop the primary.
>> - If the bug is present, the standby will exit.
>
> Good idea to pollute the data with recycled segments.  Using a minimal
> WAL segment size would help here as well in keeping a test cheap, and
> two segments should be enough.  The alignment calculations and the
> header size can be known, but the standby records are an issue for the
> predictability of the test when it comes to adjusting the length of the
> logical message depending on the 8k WAL page, no?
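
For reference, the alignment arithmetic discussed above can be sketched
in a few lines of Python.  This is only an illustration, not part of
any proposed test: the helper names are made up, the 8k page size is
assumed to be the default XLOG_BLCKSZ, and the segment size is whatever
the cluster was initialized with.  It does not account for record or
page header sizes, which is exactly the part that is hard to predict.

```python
# Sketch only: helper names are hypothetical, not PostgreSQL APIs.
XLOG_BLCKSZ = 8192  # assumption: default WAL page size


def parse_lsn(lsn: str) -> int:
    """Convert an LSN in 'X/Y' text form into a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def bytes_left_in_page(lsn: int) -> int:
    """Bytes between lsn and the end of the WAL page that contains it."""
    return XLOG_BLCKSZ - (lsn % XLOG_BLCKSZ)


def crosses_segment(start: int, end: int, wal_segment_size: int) -> bool:
    """True if a record spanning [start, end) touches two or more segments."""
    return start // wal_segment_size != (end - 1) // wal_segment_size


if __name__ == "__main__":
    seg_size = 1024 * 1024  # minimal WAL segment size
    start = parse_lsn("0/1FFFF00")  # made-up LSNs for illustration
    end = parse_lsn("0/2001FFC")
    print(bytes_left_in_page(end), crosses_segment(start, end, seg_size))
```

A record whose end LSN leaves exactly 4 bytes in the page is the case
described in the quoted steps.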

Actually, for this one, I think that I have a simpler idea to make it
deterministic.  Once we have inserted a record at the page limit on
the primary, we can:
- Stop the standby
- Stop the primary
- Manually rewrite a few bytes in the last segment on the standby to
emulate a recycled segment portion, based on the end LSN of the
logical message record, retrieved either with pg_walinspect or
pg_waldump.
- Start the standby, which would replay up to the previous record at
the page limit.
- The standby should then just be in a state where it waits for the
missing records from the primary and keeps looking at streaming, but
it should not fail startup.
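
The byte-rewriting step above could look roughly like the following
Python sketch.  It is an assumption-laden illustration rather than the
actual test: the function names are hypothetical, the timeline and the
path to the standby's pg_wal directory must be supplied by the caller,
and both nodes are expected to be stopped before the file is touched.
The file name is derived the same way WAL segment names are built
(timeline, then the segment number split in two 8-digit hex fields).

```python
import os


def parse_lsn(lsn: str) -> int:
    """Convert an LSN in 'X/Y' text form into a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(tli: int, lsn: int, wal_segment_size: int) -> str:
    """Name of the WAL segment file that contains lsn on timeline tli."""
    segno = lsn // wal_segment_size
    segs_per_xlogid = 0x100000000 // wal_segment_size
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid,
                             segno % segs_per_xlogid)


def pollute_after(wal_dir, tli, end_lsn, wal_segment_size,
                  nbytes=8, fill=0xFF):
    """Overwrite nbytes starting at end_lsn to mimic recycled garbage.

    Hypothetical helper: run it only while the standby is stopped.
    """
    lsn = parse_lsn(end_lsn)
    offset = lsn % wal_segment_size
    # Never write past the end of the segment file.
    nbytes = min(nbytes, wal_segment_size - offset)
    path = os.path.join(wal_dir, wal_file_name(tli, lsn, wal_segment_size))
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(bytes([fill]) * nbytes)
    return path, offset
```

For example, pollute_after(standby_pg_wal, 1, end_lsn, 1024 * 1024)
would be called with the end LSN reported by pg_waldump or
pg_walinspect for the logical message record, where standby_pg_wal and
end_lsn are placeholders for values taken from the test cluster.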
--
Michael
