Re: pg_walinspect: ReadNextXLogRecord's first_record argument - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pg_walinspect: ReadNextXLogRecord's first_record argument
Date
Msg-id CALj2ACUMBOM=f6PSH=J0kHsWE7psPKmRDEZ+ZfahsP=8-_bYeQ@mail.gmail.com
Whole thread Raw
In response to pg_walinspect: ReadNextXLogRecord's first_record argument  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_walinspect: ReadNextXLogRecord's first_record argument
List pgsql-hackers
On Tue, Aug 16, 2022 at 10:04 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Hi,
>
> I was looking at the code for pg_walinspect today and I think I may
> have found a bug (or else I'm confused about how this all works, which
> is also possible). ReadNextXLogRecord() takes an argument first_record
> of type XLogRecPtr which is used only for error reporting purposes: if
> we fail to read the next record for a reason other than end-of-WAL, we
> complain that we couldn't read the WAL at the LSN specified by
> first_record.
>
> ReadNextXLogRecord() has three callers. In pg_get_wal_record_info(),
> we're just reading a single record, and the LSN passed as first_record
> is the LSN at which that record starts. Cool. But in the other two
> callers, GetWALRecordsInfo() and GetWalStats(), we're reading multiple
> records, and first_record is always passed as the LSN of the first
> record. That's logical enough given the name of the argument, but the
> effect of it seems to be that an error while reading any of the
> records will be reported using the LSN of the first record, which does
> not seem right.

Indeed. Thanks a lot for finding it.

> By contrast, pg_rewind's extractPageMap() reports the error using
> xlogreader->EndRecPtr. I think that's correct. The toplevel xlogreader
> function that we're calling here is XLogReadRecord(), which in turn
> calls XLogNextRecord(), which has this comment:
>
>         /*
>          * state->EndRecPtr is expected to have been set by the last call to
>          * XLogBeginRead() or XLogNextRecord(), and is the location of the
>          * error.
>          */
>
> Thoughts?

Agreed.

Here's a patch (for V15 as well) fixing this bug, please review.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Attachment

pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Next
From: Amit Kapila
Date:
Subject: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock