Re: Fix pg_waldump to exit cleanly at end of WAL - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Fix pg_waldump to exit cleanly at end of WAL
Date
Msg-id aLesKvM9QpvCVJd2@paquier.xyz
In response to Re: Fix pg_waldump to exit cleanly at end of WAL  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Fix pg_waldump to exit cleanly at end of WAL
List pgsql-hackers
On Wed, Sep 03, 2025 at 09:11:15AM +0900, Fujii Masao wrote:
> Can pg_waldump really distinguish between the end of WAL and corruption?

I don't think you can do that reliably: as far as I recall, some of
the messages that mark the end of WAL can also show up on a
corruption.  We need the record CRC check to make the distinction,
and we cannot do it at this stage because we don't have the full
record yet.
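
To illustrate the ordering problem, here is a minimal sketch in C.  It
assumes a toy record layout and a toy checksum; DemoRecordHeader,
demo_checksum() and demo_check_record() are hypothetical names, not
the real XLogRecord layout or anything in xlogreader.c.  The point is
only that the length sanity check runs before the checksum can be
computed, so a zeroed tail and a damaged length field end up on the
same code path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified header: NOT PostgreSQL's XLogRecord. */
typedef struct DemoRecordHeader
{
    uint32_t tot_len;   /* total record length, header included */
    uint32_t crc;       /* checksum of the payload after the header */
} DemoRecordHeader;

typedef enum DemoResult
{
    DEMO_OK,
    DEMO_BAD_LENGTH,        /* zeroed tail OR corruption: cannot tell */
    DEMO_NEED_MORE_DATA,    /* record not fully available yet */
    DEMO_BAD_CRC            /* full record read, checksum mismatch */
} DemoResult;

/* Toy checksum standing in for the real CRC (assumption). */
static uint32_t
demo_checksum(const unsigned char *p, size_t n)
{
    uint32_t h = 0;

    for (size_t i = 0; i < n; i++)
        h = h * 31u + p[i];
    return h;
}

static DemoResult
demo_check_record(const unsigned char *buf, size_t avail)
{
    DemoRecordHeader hdr;

    assert(buf != NULL);

    if (avail < sizeof(hdr))
        return DEMO_NEED_MORE_DATA;
    memcpy(&hdr, buf, sizeof(hdr));

    /*
     * Only the length is known here.  A zeroed page tail and a damaged
     * length field both fail this test in the same way.
     */
    if (hdr.tot_len < sizeof(hdr))
        return DEMO_BAD_LENGTH;

    /* The checksum needs the whole record, which may not be there yet. */
    if (avail < hdr.tot_len)
        return DEMO_NEED_MORE_DATA;

    if (demo_checksum(buf + sizeof(hdr), hdr.tot_len - sizeof(hdr)) != hdr.crc)
        return DEMO_BAD_CRC;

    return DEMO_OK;
}
```

In this sketch only DEMO_BAD_CRC is an unambiguous sign of damage, and
it is reachable only after the length field has already been trusted.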

Perhaps what's been posted on your thread [1] could be revisited for
the xlogreader, because we are now able to read the record headers
more reliably thanks to Thomas' work around bae868caf222.  That would
mean backtracking on my previous take, which I posted here before that
commit:
https://www.postgresql.org/message-id/ZadmUE-edk2Z4CQU@paquier.xyz

Discarding the error message when we read what we think is an
incorrect value for the first field in the record header (total record
length) means that we may lose some information that's actually legit
to know about, so the proposed patch is wrong.  Tweaking xlogreader.c
to let its callers take the decision would be better, even if it puts
the cost of the decision on all the tools.  One problem is that this
adds some complexity to xlogreader.c itself, which may not justify
bothering about all that.
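
A rough sketch of what "letting the callers decide" could look like,
in C.  Everything here is hypothetical (DemoReadFailure,
DemoReadResult and demo_tool_handle_failure() do not exist in
xlogreader.c or pg_waldump): the reader always fills in the message
and a failure class, and each tool applies its own policy on top.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical failure classes reported by the reader (assumption). */
typedef enum DemoReadFailure
{
    DEMO_FAIL_NONE,
    DEMO_FAIL_INVALID_LENGTH,   /* first header field looks wrong */
    DEMO_FAIL_CRC_MISMATCH      /* full record read, checksum wrong */
} DemoReadFailure;

/* The reader never discards the message; it only classifies it. */
typedef struct DemoReadResult
{
    DemoReadFailure failure;
    const char *message;
} DemoReadResult;

/*
 * Tool-side policy (hypothetical).  Returns the exit code the tool
 * would use.  With quiet_end_of_wal set, an invalid length is treated
 * as a probable end of WAL and is not reported; any other failure is
 * always printed.
 */
static int
demo_tool_handle_failure(const DemoReadResult *res, bool quiet_end_of_wal,
                         FILE *err)
{
    assert(res != NULL && err != NULL);

    if (res->failure == DEMO_FAIL_NONE)
        return 0;

    if (res->failure == DEMO_FAIL_INVALID_LENGTH && quiet_end_of_wal)
        return 0;

    fprintf(err, "error: %s\n", res->message);
    return 1;
}
```

The cost mentioned above shows up directly: every tool needs its own
version of the policy function, and the reader has to classify its
failures instead of just returning a string.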

(Note: that's likely a biased opinion as I am used to living with
these messages when running WAL record parsers, but I understand that
newcomers find them confusing the first time, as they can be read as
"my cluster is deeply broken and my WAL is corrupted".)
--
Michael
