Re: pg_walfile_name_offset can return inconsistent values - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: pg_walfile_name_offset can return inconsistent values
Date
Msg-id CAEze2Wh7qWW6jbULq6mZ14fp_hNh9gM53aq4cG0b1m3DkfLU3w@mail.gmail.com
Whole thread Raw
In response to Re: pg_walfile_name_offset can return inconsistent values  (Bruce Momjian <bruce@momjian.us>)
Responses Re: pg_walfile_name_offset can return inconsistent values
List pgsql-hackers
On Thu, 9 Nov 2023 at 20:22, Bruce Momjian <bruce@momjian.us> wrote:
> I know this bug report is four years old, but it is still a
> pg_walfile_name_offset() bug.  Here is the bug:
>
>         SELECT *
>         FROM (VALUES ('0/16ffffff'), ('0/17000000'), ('0/17000001')) AS t(lsn),
>              LATERAL pg_walfile_name_offset(lsn::pg_lsn);
>
>             lsn     |        file_name         | file_offset
>         ------------+--------------------------+-------------
>          0/16ffffff | 000000010000000000000016 |    16777215
> -->      0/17000000 | 000000010000000000000016 |           0
>          0/17000001 | 000000010000000000000017 |           1
>
> The bug is in the indicated line --- it shows the filename as 00016 but
> offset as zero, when clearly the LSN is pointing to 17/0.  The bug is
> essentially that the code for pg_walfile_name_offset() uses the exact
> offset from the LSN, but uses the file name from the previous byte of
> the LSN.

Yes, that's definitely not a correct result.

> The fix involves deciding what the description or purpose of
> pg_walfile_name_offset() means, and adjusting it to be clearer.  The
> current documentation says:
>
>         Converts a write-ahead log location to a WAL file name and byte
>         offset within that file.
>
> Fix #1:  If we assume write-ahead log location means LSN, it is saying
> show the file/offset of the LSN, and that is most clearly:
>
>             lsn     |        file_name         | file_offset
>         ------------+--------------------------+-------------
>          0/16ffffff | 000000010000000000000016 |    16777215
>          0/17000000 | 000000010000000000000017 |           0
>          0/17000001 | 000000010000000000000017 |           1
>
> Fix #2:  Now, there are some who have said they want the output to be
> the last written WAL byte (the byte before the LSN), not the current
> LSN, for archiving purposes.  However, if we do that, we have to update
> the docs to clarify it.  Its output would be:
>
>             lsn     |        file_name         | file_offset
>         ------------+--------------------------+-------------
>          0/16ffffff | 000000010000000000000016 |    16777214
>          0/17000000 | 000000010000000000000016 |    16777215
>          0/17000001 | 000000010000000000000017 |           0
>
> I have attached fix #1 as offset1.diff and fix #2 as offset2.diff.

I believe you got the references wrong; fix #1 looks like the output
of offset2's changes, and fix #2 looks like the result of offset1's
changes.

Either way, I think fix #1 is most correct (as was attached in
offset2.diff, and quoted verbatim here), because that has no chance of
having surprising underflowing behaviour when you use '0/0'::lsn as
input.

> diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
> index 45a70668b1..e65502d51e 100644
> --- a/src/backend/access/transam/xlogfuncs.c
> +++ b/src/backend/access/transam/xlogfuncs.c
> @@ -414,7 +414,7 @@ pg_walfile_name_offset(PG_FUNCTION_ARGS)
>     /*
>      * xlogfilename
>      */
> -    XLByteToPrevSeg(locationpoint, xlogsegno, wal_segment_size);
> +    XLByteToSeg(locationpoint, xlogsegno, wal_segment_size);
>     XLogFileName(xlogfilename, GetWALInsertionTimeLine(), xlogsegno,
>                  wal_segment_size);

Kind regards,

Matthias van de Meent



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Failure during Building Postgres in Windows with Meson
Next
From: Andres Freund
Date:
Subject: Re: Improve WALRead() to suck data directly from WAL buffers when possible