Re: pg_walfile_name_offset can return inconsistent values - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: pg_walfile_name_offset can return inconsistent values
Date
Msg-id ZU0xZuARuDbpUMf2@momjian.us
Whole thread Raw
In response to Re: pg_walfile_name_offset can return inconsistent values  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Responses Re: pg_walfile_name_offset can return inconsistent values
List pgsql-hackers
On Fri, Jul 26, 2019 at 11:30:19AM +0200, Jehan-Guillaume de Rorthais wrote:
> On Fri, 26 Jul 2019 17:21:20 +0900 (Tokyo Standard Time)
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> 
> > Hello.
> > 
> > While looking [1], I noticed that pg_walfile_name_offset behaves
> > somewhat oddly at segment boundary.
> > 
> > select * from (values ('0/16ffffff'), ('0/17000000'), ('0/17000001')) as
> > t(lsn), lateral pg_walfile_name_offset(lsn::pg_lsn);
> >     lsn     |        file_name         | file_offset
> > ------------+--------------------------+-------------
> >  0/16ffffff | 000000020000000000000016 |    16777215
> >  0/17000000 | 000000020000000000000016 |           0
> >  0/17000001 | 000000020000000000000017 |           1
> > 
> > 
> > The file names are right as defined, but the return value of the
> > second line wrong, or at least misleading.
> 
> +1
> I noticed it as well and put this report on hold while working on my patch.
> Thanks for reporting this!
> 
> > It should be (16, 1000000) or (16, FFFFFF). The former is out-of-domain so we
> > would have no way than choosing the latter. I'm not sure the purpose of
> > the second output parameter, thus the former might be right
> > decision.
> >
> > The function returns the following result after this patch is
> > applied.
> > 
> > select * from (values ('0/16ffffff'), ('0/17000000'), ('0/17000001')) as
> > t(lsn), lateral pg_walfile_name_offset(lsn::pg_lsn);
> >     lsn     |        file_name         | file_offset
> > ------------+--------------------------+-------------
> >  0/16ffffff | 000000020000000000000016 |    16777214
> >  0/17000000 | 000000020000000000000016 |    16777215
> >  0/17000001 | 000000020000000000000017 |           0
> 
> So you shift the file offset for all LSN by one byte? This could lead to
> regression in various tools relying on this function.
> 
> Moreover, it looks weird as the LSN doesn't reflect the given offset anymore
> (FFFFFF <> 16777214, 000001 <> 0, etc).
> 
> Another solution might be to return the same result when for both 0/16ffffff and
> 0/17000000, but it doesn't feel right either.
> 
> So in fact, returning 0x1000000 seems to be the cleaner result to me.

I know this bug report is four years old, but it is still a
pg_walfile_name_offset() bug.  Here is the bug:

    SELECT *
    FROM (VALUES ('0/16ffffff'), ('0/17000000'), ('0/17000001')) AS t(lsn), 
         LATERAL pg_walfile_name_offset(lsn::pg_lsn);

        lsn     |        file_name         | file_offset
    ------------+--------------------------+-------------
     0/16ffffff | 000000010000000000000016 |    16777215
-->     0/17000000 | 000000010000000000000016 |           0
     0/17000001 | 000000010000000000000017 |           1

The bug is in the indicated line --- it shows the filename as 00016 but
offset as zero, when clearly the LSN is pointing to 17/0.  The bug is
essentially that the code for pg_walfile_name_offset() uses the exact
offset from the LSN, but uses the file name from the previous byte of
the LSN.

The fix involves deciding what the description or purpose of
pg_walfile_name_offset() means, and adjusting it to be clearer.  The
current documentation says:

    Converts a write-ahead log location to a WAL file name and byte
    offset within that file.

Fix #1:  If we assume write-ahead log location means LSN, it is saying
show the file/offset of the LSN, and that is most clearly:

        lsn     |        file_name         | file_offset
    ------------+--------------------------+-------------
     0/16ffffff | 000000010000000000000016 |    16777215
     0/17000000 | 000000010000000000000017 |           0
     0/17000001 | 000000010000000000000017 |           1

Fix #2:  Now, there are some who have said they want the output to be
the last written WAL byte (the byte before the LSN), not the current
LSN, for archiving purposes.  However, if we do that, we have to update
the docs to clarify it.  Its output would be:

        lsn     |        file_name         | file_offset
    ------------+--------------------------+-------------
     0/16ffffff | 000000010000000000000016 |    16777214
     0/17000000 | 000000010000000000000016 |    16777215
     0/17000001 | 000000010000000000000017 |           0

The email thread also considered having the second row offset be 16777216
(2^24), which is not a valid offset for a file if we assume a zero-based
offset.

Looking further, pg_walfile_name() also returns the filename for the
previous byte:

    SELECT *
    FROM (values ('0/16ffffff'), ('0/17000000'), ('0/17000001')) as t(lsn),
         LATERAL pg_walfile_name_offset(lsn::pg_lsn), LATERAL pg_walfile_name(lsn::pg_lsn);
        lsn     |        file_name         | file_offset |     pg_walfile_name
    ------------+--------------------------+-------------+--------------------------
     0/16ffffff | 000000010000000000000016 |    16777215 | 000000010000000000000016
     0/17000000 | 000000010000000000000016 |           0 | 000000010000000000000016
     0/17000001 | 000000010000000000000017 |           1 | 000000010000000000000017

I have attached fix #1 as offset1.diff and fix #2 as offset2.diff.

I think the most logical fix is #1, but pg_walfile_name() would need to
be modified.  If the previous file/byte offset are what is desired, fix
#2 will need doc changes for both functions.  This probably needs to be
documented as a backward incompatibility for either fix.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Fix output of zero privileges in psql
Next
From: Andrew Dunstan
Date:
Subject: Re: Failure during Building Postgres in Windows with Meson