Re: BUG #17928: Standby fails to decode WAL on termination of primary - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date
Msg-id CA+hUKGKcSyHRppTyGZy7q29E1JVtQKdKCNzY0QARG3gfqpLaXg@mail.gmail.com
In response to Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Noah Misch <noah@leadboat.com>)
Responses Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Noah Misch <noah@leadboat.com>)
List pgsql-bugs
On Sun, Aug 13, 2023 at 9:13 AM Noah Misch <noah@leadboat.com> wrote:
> Any user could call pg_logical_emit_message() to silently terminate the WAL
> stream, which is far worse than the original bug.  So far, I'm seeing one way
> to clearly fix $SUBJECT without that harm.  When a record header spans a page
> boundary, read the next page to reassemble the header.  If
> !ValidXLogRecordHeader() (invalid xl_rmid or xl_prev), treat as end of WAL.
> Otherwise, read the whole record in chunks, calculating CRC.  If CRC is
> invalid, treat as end of WAL.  Otherwise, ereport(FATAL) for the oversized
> record.  A side benefit would be avoiding useless large allocations (1GB
> backend, 4GB frontend) as discussed upthread.  May as well do the xl_rmid and
> xl_prev checks in all branches, to avoid needless XLogRecordMaxSize-1
> allocations.  Thoughts?  For invalid-length records in v16+, since every such
> record is end-of-wal or corruption, those versions could skip the CRC.
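The decision order Noah proposes can be sketched in C as follows. This is a minimal illustration under stated assumptions, not PostgreSQL's xlogreader.c. The structs, limits, and function names (SketchRecordHeader, classify_record, SKETCH_MAX_RMID, SKETCH_MAX_REC_SIZE) are hypothetical. The CRC here covers only the payload, whereas the real WAL CRC also covers most of the header. An in-memory buffer stands in for pages read from the stream, and running out of bytes is simply treated as end of WAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record header; not PostgreSQL's XLogRecord. */
typedef struct
{
    uint32_t tot_len;   /* claimed total length, like xl_tot_len */
    uint64_t prev;      /* claimed previous-record pointer, like xl_prev */
    uint8_t  rmid;      /* resource manager id, like xl_rmid */
    uint32_t crc;       /* CRC-32C of the payload only (simplification) */
} SketchRecordHeader;

typedef enum
{
    REC_OK,             /* valid record within the size limit */
    REC_END_OF_WAL,     /* garbage: treat as end of WAL */
    REC_FATAL_OVERSIZE  /* CRC-valid but oversized: the ereport(FATAL) case */
} RecordVerdict;

#define SKETCH_MAX_RMID     21u          /* hypothetical rmgr id bound */
#define SKETCH_MAX_REC_SIZE (1u << 20)   /* hypothetical record size limit */
#define SKETCH_CHUNK        8192u        /* read granularity, like a WAL page */

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t
crc32c_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

uint32_t
crc32c(const uint8_t *buf, size_t len)
{
    return crc32c_update(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu;
}

/*
 * Classify a record whose header has already been reassembled across the
 * page boundary.  'data'/'avail' stand in for the WAL following the header.
 */
RecordVerdict
classify_record(const SketchRecordHeader *hdr, uint64_t expected_prev,
                const uint8_t *data, size_t avail)
{
    assert(hdr != NULL && (data != NULL || avail == 0));

    /* 1. Cheap header checks first (xl_rmid, xl_prev analogues). */
    if (hdr->rmid > SKETCH_MAX_RMID || hdr->prev != expected_prev)
        return REC_END_OF_WAL;

    /* In this in-memory sketch, missing bytes mean the length is garbage. */
    if (hdr->tot_len > avail)
        return REC_END_OF_WAL;

    /* 2. Walk the payload in page-sized chunks, accumulating the CRC,
     *    without ever allocating a buffer of hdr->tot_len bytes. */
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t off = 0; off < hdr->tot_len; off += SKETCH_CHUNK)
    {
        size_t n = hdr->tot_len - off;
        if (n > SKETCH_CHUNK)
            n = SKETCH_CHUNK;
        crc = crc32c_update(crc, data + off, n);
    }
    crc ^= 0xFFFFFFFFu;

    /* 3. CRC mismatch: the length was never really written. */
    if (crc != hdr->crc)
        return REC_END_OF_WAL;

    /* 4. A genuine record that exceeds the limit is a hard error. */
    if (hdr->tot_len > SKETCH_MAX_REC_SIZE)
        return REC_FATAL_OVERSIZE;

    return REC_OK;
}
```

Checking the header fields first keeps the common garbage case cheap, and the chunked CRC walk touches the claimed length without allocating it, which is the side benefit Noah mentions.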

That sounds quite strong.  But... why couldn't the existing
xlp_rem_len cross-check protect us from this failure mode?  If we
could defer the allocation until after that check (and the usual
ValidXLogRecordHeader() check), I think we'd know that we're really
looking at a size that was actually written on both pages (which must
also have survived the xlp_pageaddr check), no?
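The ordering Thomas suggests (confirm the length against the next page's header, then allocate) might look roughly like this. It is a hedged sketch, not PostgreSQL code: SketchPageHeader, the flag value, and both function names are hypothetical stand-ins for the page header fields xlp_pageaddr, xlp_info and xlp_rem_len. 'gotlen' is assumed to be the number of record bytes already read from the previous page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_XLP_FIRST_IS_CONTRECORD 0x0001u  /* hypothetical flag value */

/* Hypothetical, simplified page header; not PostgreSQL's XLogPageHeaderData. */
typedef struct
{
    uint64_t pageaddr;  /* like xlp_pageaddr: address this page was written for */
    uint16_t info;      /* like xlp_info: page flags */
    uint32_t rem_len;   /* like xlp_rem_len: bytes of the continued record left */
} SketchPageHeader;

/*
 * True only if the next page independently confirms the total length that
 * was claimed on the previous page.
 */
bool
contrecord_length_confirmed(uint32_t tot_len, uint32_t gotlen,
                            const SketchPageHeader *next,
                            uint64_t expected_pageaddr)
{
    assert(next != NULL);

    if (next->pageaddr != expected_pageaddr)
        return false;   /* stale or recycled page */
    if (!(next->info & SKETCH_XLP_FIRST_IS_CONTRECORD))
        return false;   /* page does not claim to continue a record */
    if (next->rem_len == 0 || (uint64_t) next->rem_len + gotlen != tot_len)
        return false;   /* the two pages disagree about the length */
    return true;
}

/* Allocate the record buffer only after both pages agree on its size. */
uint8_t *
allocate_record_buffer(uint32_t tot_len, uint32_t gotlen,
                       const SketchPageHeader *next, uint64_t expected_pageaddr)
{
    if (!contrecord_length_confirmed(tot_len, gotlen, next, expected_pageaddr))
        return NULL;    /* treat as end of WAL; nothing was allocated */
    return malloc(tot_len);
}
```

The sketch only shows the proposed ordering. It does not settle whether this cross-check alone is sufficient protection, which is the question being raised.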