Thread: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

Hi,

I was recently asked about converting an offset reported in WAL read
error messages[1] to an LSN with which pg_waldump can be used to
verify the records or WAL file around that LSN (basically one can
filter out the output based on LSN). AFAICS, there's no function that
takes offset as an input and produces an LSN and I ended up figuring
out LSN manually. And, for some of my work too, I was hitting errors
in XLogReaderValidatePageHeader() and adding recptr to those error
messages helped me debug issues faster.

We have a bunch of messages [1] that have an offset, but not LSN in
the error message. Firstly, is there an easiest way to figure out LSN
from offset reported in the error messages? If not, is adding LSN to
these messages along with offset a good idea? Of course, we can't just
convert offset to LSN using XLogSegNoOffsetToRecPtr() and report, but
something meaningful like reporting the LSN of the page that we are
reading-in or writing-out etc.

Thoughts?

[1]
errmsg("could not read from WAL segment %s, offset %u: %m",
errmsg("could not read from WAL segment %s, offset %u: %m",
errmsg("could not write to log file %s "
       "at offset %u, length %zu: %m",
errmsg("unexpected timeline ID %u in WAL segment %s, offset %u",
errmsg("could not read from WAL segment %s, offset %u: read %d of %zu",
pg_log_error("received write-ahead log record for offset %u with no file open",
"invalid magic number %04X in WAL segment %s, offset %u",
"invalid info bits %04X in WAL segment %s, offset %u",
"invalid info bits %04X in WAL segment %s, offset %u",
"unexpected pageaddr %X/%X in WAL segment %s, offset %u",
"out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u",

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Sep 19, 2022 at 07:26:57PM +0530, Bharath Rupireddy wrote:
> We have a bunch of messages [1] that have an offset, but not LSN in
> the error message. Firstly, is there an easiest way to figure out LSN
> from offset reported in the error messages? If not, is adding LSN to
> these messages along with offset a good idea? Of course, we can't just
> convert offset to LSN using XLogSegNoOffsetToRecPtr() and report, but
> something meaningful like reporting the LSN of the page that we are
> reading-in or writing-out etc.

It seems like you want the opposite of pg_walfile_name_offset().  Perhaps
we could add a function like pg_walfile_offset_lsn() that accepts a WAL
file name and byte offset and returns the LSN.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Mon, Sep 19, 2022 at 03:16:42PM -0700, Nathan Bossart wrote:
> It seems like you want the opposite of pg_walfile_name_offset().  Perhaps
> we could add a function like pg_walfile_offset_lsn() that accepts a WAL
> file name and byte offset and returns the LSN.

Like so...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
On 2022-Sep-19, Bharath Rupireddy wrote:

> We have a bunch of messages [1] that have an offset, but not LSN in
> the error message. Firstly, is there an easiest way to figure out LSN
> from offset reported in the error messages? If not, is adding LSN to
> these messages along with offset a good idea? Of course, we can't just
> convert offset to LSN using XLogSegNoOffsetToRecPtr() and report, but
> something meaningful like reporting the LSN of the page that we are
> reading-in or writing-out etc.

Maybe add errcontext() somewhere that reports the LSN would be
appropriate.  For example, the page_read() callbacks have the LSN
readily available, so the ones in backend could install the errcontext
callback; or perhaps ReadPageInternal can do it #ifndef FRONTEND.  Not
sure what is best of those options, but either of those sounds better
than sticking the LSN in a lower-level routine that doesn't necessarily
have the info already.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



On Tue, Sep 20, 2022 at 9:02 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Sep 19, 2022 at 03:16:42PM -0700, Nathan Bossart wrote:
> > It seems like you want the opposite of pg_walfile_name_offset().  Perhaps
> > we could add a function like pg_walfile_offset_lsn() that accepts a WAL
> > file name and byte offset and returns the LSN.
>
> Like so...

Yeah, something like this will be handy for sure, but I'm not sure if
we want this to be in core. Let's hear from others.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Sep 20, 2022 at 12:57 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2022-Sep-19, Bharath Rupireddy wrote:
>
> > We have a bunch of messages [1] that have an offset, but not LSN in
> > the error message. Firstly, is there an easiest way to figure out LSN
> > from offset reported in the error messages? If not, is adding LSN to
> > these messages along with offset a good idea? Of course, we can't just
> > convert offset to LSN using XLogSegNoOffsetToRecPtr() and report, but
> > something meaningful like reporting the LSN of the page that we are
> > reading-in or writing-out etc.
>
> Maybe add errcontext() somewhere that reports the LSN would be
> appropriate.  For example, the page_read() callbacks have the LSN
> readily available, so the ones in backend could install the errcontext
> callback; or perhaps ReadPageInternal can do it #ifndef FRONTEND.  Not
> sure what is best of those options, but either of those sounds better
> than sticking the LSN in a lower-level routine that doesn't necessarily
> have the info already.

All of the error messages [1] have the LSN from which offset was
calculated, I think we can just append that to the error messages
(something like ".... offset %u, LSN %X/%X: %m") and not complicate
it. Thoughts?

[1]
errmsg("could not read from WAL segment %s, offset %u: %m",
errmsg("could not read from WAL segment %s, offset %u: %m",
errmsg("could not write to log file %s "
       "at offset %u, length %zu: %m",
errmsg("unexpected timeline ID %u in WAL segment %s, offset %u",
errmsg("could not read from WAL segment %s, offset %u: read %d of %zu",
pg_log_error("received write-ahead log record for offset %u with no file open",
"invalid magic number %04X in WAL segment %s, offset %u",
"invalid info bits %04X in WAL segment %s, offset %u",
"invalid info bits %04X in WAL segment %s, offset %u",
"unexpected pageaddr %X/%X in WAL segment %s, offset %u",
"out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u",

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



At Tue, 20 Sep 2022 17:40:36 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Tue, Sep 20, 2022 at 12:57 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2022-Sep-19, Bharath Rupireddy wrote:
> >
> > > We have a bunch of messages [1] that have an offset, but not LSN in
> > > the error message. Firstly, is there an easiest way to figure out LSN
> > > from offset reported in the error messages? If not, is adding LSN to
> > > these messages along with offset a good idea? Of course, we can't just
> > > convert offset to LSN using XLogSegNoOffsetToRecPtr() and report, but
> > > something meaningful like reporting the LSN of the page that we are
> > > reading-in or writing-out etc.
> >
> > Maybe add errcontext() somewhere that reports the LSN would be
> > appropriate.  For example, the page_read() callbacks have the LSN
> > readily available, so the ones in backend could install the errcontext
> > callback; or perhaps ReadPageInternal can do it #ifndef FRONTEND.  Not
> > sure what is best of those options, but either of those sounds better
> > than sticking the LSN in a lower-level routine that doesn't necessarily
> > have the info already.
> 
> All of the error messages [1] have the LSN from which offset was
> calculated, I think we can just append that to the error messages
> (something like ".... offset %u, LSN %X/%X: %m") and not complicate
> it. Thoughts?

If all error-emitting site knows the LSN, we don't need the context
message. But *I* would like that the additional message looks like
"while reading record at LSN %X/%X" or slightly shorter version of
it. Because the targetRecPtr is the beginning of the current reading
record, not the LSN for the segment and offset. It may point to past
segments.


> [1]
> errmsg("could not read from WAL segment %s, offset %u: %m",
> errmsg("could not read from WAL segment %s, offset %u: %m",
> errmsg("could not write to log file %s "
>        "at offset %u, length %zu: %m",
> errmsg("unexpected timeline ID %u in WAL segment %s, offset %u",
> errmsg("could not read from WAL segment %s, offset %u: read %d of %zu",
> pg_log_error("received write-ahead log record for offset %u with no file open",
> "invalid magic number %04X in WAL segment %s, offset %u",
> "invalid info bits %04X in WAL segment %s, offset %u",
> "invalid info bits %04X in WAL segment %s, offset %u",
> "unexpected pageaddr %X/%X in WAL segment %s, offset %u",
> "out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u",


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Tue, Sep 27, 2022 at 8:31 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> If all error-emitting site knows the LSN, we don't need the context
> message. But *I* would like that the additional message looks like
> "while reading record at LSN %X/%X" or slightly shorter version of
> it. Because the targetRecPtr is the beginning of the current reading
> record, not the LSN for the segment and offset. It may point to past
> segments.

I think we could just say "LSN %X/%X, offset %u", because the overall
context whether it's being read or written is implicit with the other
part of the message.

Please see the attached v1 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Thu, Sep 29, 2022 at 7:43 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Please see the attached v1 patch.

FWIW, I'm attaching Nathan's patch that introduced the new function
pg_walfile_offset_lsn as 0002 in the v1 patch set. Here's the CF entry
- https://commitfest.postgresql.org/40/3909/.

Please consider this for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Tue, Oct 4, 2022 at 2:58 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Sep 29, 2022 at 7:43 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Please see the attached v1 patch.
>
> FWIW, I'm attaching Nathan's patch that introduced the new function
> pg_walfile_offset_lsn as 0002 in the v1 patch set. Here's the CF entry
> - https://commitfest.postgresql.org/40/3909/.
>
> Please consider this for further review.

I'm attaching the v2 patch set after rebasing on to the latest HEAD.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi!

I've watched over the patch and consider it useful. Applies without conflicts. The functionality of the patch itself is
meets declared functionality.

But, in my view, some improvements may be proposed. We should be more, let's say, cautious (or a paranoid if you wish),
in pg_walfile_offset_lsn while dealing with user input values. What I mean by that is:
 - to init input vars of sscanf, i.e. tli, log andseg;
 - check the return value of sscanf call;
 - check filename max length.

Another question arises for me: is this function can be called during recovery? If not, we should ereport about this, should we?

And one last note here: pg_walfile_offset_lsn is accepting NULL values and return NULL in this case. From a theoretical
point of view, this is perfectly fine. Actually, I think this is exactly how it supposed to be, but I'm not sure if there are no other opinions here.
--
Best regards,
Maxim Orlov.
Hmm, in 0002, why not return the timeline from the LSN too?  It seems a
waste not to have it.

+        ereport(ERROR,
+                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+                 errmsg("\"offset\" must not be negative or greater than or "
+                        "equal to WAL segment size")));

I don't think the word offset should be in quotes; and please don't cut
the line.  So I propose

   errmsg("offset must not be negative or greater than or equal to the WAL segment size")));

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



On Fri, Nov 11, 2022 at 5:52 PM Maxim Orlov <orlovmg@gmail.com> wrote:
>
> Hi!
>
> I've watched over the patch and consider it useful. Applies without conflicts. The functionality of the patch itself
is
> meets declared functionality.

Thanks for reviewing.

> But, in my view, some improvements may be proposed. We should be more, let's say, cautious (or a paranoid if you
wish),
> in pg_walfile_offset_lsn while dealing with user input values. What I mean by that is:
>  - to init input vars of sscanf, i.e. tli, log andseg;
>  - check the return value of sscanf call;
>  - check filename max length.

IsXLogFileName() will take care of this. Also, I've added a new inline
function XLogIdFromFileName() that parses file name and returns log
and seg along with XLogSegNo and timeline id. This new function avoids
an extra sscanf() call as well.

> Another question arises for me: is this function can be called during recovery? If not, we should ereport about this,
shouldwe?
 

It's just a utility function and doesn't depend on any of the server's
current state (unlike pg_walfile_name()), hence it doesn't matter if
this function is called during recovery.

> And one last note here: pg_walfile_offset_lsn is accepting NULL values and return NULL in this case. From a
theoretical
> point of view, this is perfectly fine. Actually, I think this is exactly how it supposed to be, but I'm not sure if
thereare no other opinions here.
 

These functions are marked as 'STRICT', meaning a null is returned,
without even calling the function, if any of the input parameters is
null. I think we can keep the behaviour the same as its friends.

On Fri, Nov 11, 2022 at 11:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>

Thanks for reviewing.

> Hmm, in 0002, why not return the timeline from the LSN too?  It seems a
> waste not to have it.

Yeah, that actually makes sense. We might be tempted to return segno
too, but it's not something that we emit to the user elsewhere,
whereas we emit timeline id.

> +               ereport(ERROR,
> +                               (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> +                                errmsg("\"offset\" must not be negative or greater than or "
> +                                               "equal to WAL segment size")));
>
> I don't think the word offset should be in quotes; and please don't cut
> the line.  So I propose
>
>    errmsg("offset must not be negative or greater than or equal to the WAL segment size")));

Changed.

While on this, I noticed that the pg_walfile_name_offset() isn't
covered in tests. I took an opportunity and added a simple test case
along with pg_walfile_offset_lsn().

I'm attaching the v3 patch set for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment


On Tue, 15 Nov 2022 at 13:02, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Fri, Nov 11, 2022 at 5:52 PM Maxim Orlov <orlovmg@gmail.com> wrote:

> But, in my view, some improvements may be proposed. We should be more, let's say, cautious (or a paranoid if you wish),
> in pg_walfile_offset_lsn while dealing with user input values. What I mean by that is:
>  - to init input vars of sscanf, i.e. tli, log andseg;
>  - check the return value of sscanf call;
>  - check filename max length.

IsXLogFileName() will take care of this. Also, I've added a new inline
function XLogIdFromFileName() that parses file name and returns log
and seg along with XLogSegNo and timeline id. This new function avoids
an extra sscanf() call as well.

> Another question arises for me: is this function can be called during recovery? If not, we should ereport about this, should we?

It's just a utility function and doesn't depend on any of the server's
current state (unlike pg_walfile_name()), hence it doesn't matter if
this function is called during recovery.

> And one last note here: pg_walfile_offset_lsn is accepting NULL values and return NULL in this case. From a theoretical
> point of view, this is perfectly fine. Actually, I think this is exactly how it supposed to be, but I'm not sure if there are no other opinions here.

These functions are marked as 'STRICT', meaning a null is returned,
without even calling the function, if any of the input parameters is
null. I think we can keep the behaviour the same as its friends.
 
Thanks for the explanations. I think you are right.
 
>    errmsg("offset must not be negative or greater than or equal to the WAL segment size")));

Changed.
 
Confirm. And a timeline_id is added.
 
While on this, I noticed that the pg_walfile_name_offset() isn't
covered in tests. I took an opportunity and added a simple test case
along with pg_walfile_offset_lsn().

I'm attaching the v3 patch set for further review.
 
Great job! We should mark this patch as RFC, shall we?

--
Best regards,
Maxim Orlov.
On Wed, Nov 16, 2022 at 6:55 PM Maxim Orlov <orlovmg@gmail.com> wrote:
>
>> These functions are marked as 'STRICT', meaning a null is returned,
>> without even calling the function, if any of the input parameters is
>> null. I think we can keep the behaviour the same as its friends.
>
> Thanks for the explanations. I think you are right.
>
> Confirm. And a timeline_id is added.
>>
>> While on this, I noticed that the pg_walfile_name_offset() isn't
>> covered in tests. I took an opportunity and added a simple test case
>> along with pg_walfile_offset_lsn().
>>
>> I'm attaching the v3 patch set for further review.
>
> Great job! We should mark this patch as RFC, shall we?

Please do, if you feel so. Thanks for your review.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Nov 17, 2022 at 11:53:23AM +0530, Bharath Rupireddy wrote:
> Please do, if you feel so. Thanks for your review.

I don't really mind the addition of the LSN when operating on a given
record where we are reading a location, like in the five error paths
for the header validation or the three ones in ReadRecord()

Now this one looks confusing:
+       XLogSegNoOffsetToRecPtr(openLogSegNo, startoffset,
+                               wal_segment_size, lsn);
        ereport(PANIC,
                (errcode_for_file_access(),
                 errmsg("could not write to log file %s "
-                       "at offset %u, length %zu: %m",
-                       xlogfname, startoffset, nleft)));
+                       "at offset %u, LSN %X/%X, length %zu: %m",
+                       xlogfname, startoffset,
+                       LSN_FORMAT_ARGS(lsn), nleft)));

This does not always refer to an exact LSN of a record as we may be in
the middle of a write, so I would leave it as-is.

Similarly the addition of wre_lsn would be confusing?  The offset
looks kind of enough to me when referring to the middle of a page in
WALReadRaiseError().
--
Michael

Attachment
On Fri, Dec 2, 2022 at 12:50 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Nov 17, 2022 at 11:53:23AM +0530, Bharath Rupireddy wrote:
> > Please do, if you feel so. Thanks for your review.
>
> I don't really mind the addition of the LSN when operating on a given
> record where we are reading a location, like in the five error paths
> for the header validation or the three ones in ReadRecord()

Thanks for reviewing.

> Now this one looks confusing:
> +       XLogSegNoOffsetToRecPtr(openLogSegNo, startoffset,
> +                               wal_segment_size, lsn);
>         ereport(PANIC,
>                 (errcode_for_file_access(),
>                  errmsg("could not write to log file %s "
> -                       "at offset %u, length %zu: %m",
> -                       xlogfname, startoffset, nleft)));
> +                       "at offset %u, LSN %X/%X, length %zu: %m",
> +                       xlogfname, startoffset,
> +                       LSN_FORMAT_ARGS(lsn), nleft)));
>
> This does not always refer to an exact LSN of a record as we may be in
> the middle of a write, so I would leave it as-is.
>
> Similarly the addition of wre_lsn would be confusing?  The offset
> looks kind of enough to me when referring to the middle of a page in
> WALReadRaiseError().

Yes, I removed those changes. Even if someone sees an offset of a
record within a WAL file elsewhere, they have the new utility function
(0002) pg_walfile_offset_lsn().

I'm attaching the v4 patch set for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Sat, Dec 03, 2022 at 09:07:38AM +0530, Bharath Rupireddy wrote:
> Yes, I removed those changes. Even if someone sees an offset of a
> record within a WAL file elsewhere, they have the new utility function
> (0002) pg_walfile_offset_lsn().
>
> I'm attaching the v4 patch set for further review.

+ * Compute an LSN and timline ID given a WAL file name and decimal byte offset.
s/timline/timeline/, exactly two times

+   Datum       values[2] = {0};
+   bool        isnull[2] = {0};
I would not hardcode the number of attributes of the record returned.

Regarding pg_walfile_offset_lsn(), I am not sure that this is the best
move we can do as it is possible to compile a LSN from 0/0 with just a
segment number, say:
select '0/0'::pg_lsn + :segno * setting::int + :offset
  from pg_settings where name = 'wal_segment_size';

+   resultTupleDesc = CreateTemplateTupleDesc(2);
+   TupleDescInitEntry(resultTupleDesc, (AttrNumber) 1, "lsn",
+                      PG_LSNOID, -1, 0);
+   TupleDescInitEntry(resultTupleDesc, (AttrNumber) 2, "timeline_id",
+                      INT4OID, -1, 0);
Let's use get_call_result_type() to get the TupleDesc and to avoid a
duplication between pg_proc.dat and this code.

Hence I would tend to let XLogFromFileName do the job, while having a
SQL function that is just a thin wrapper around it that returns the
segment TLI and its number, leaving the offset out of the equation as
well as this new XLogIdFromFileName().
--
Michael

Attachment
On Mon, Dec 5, 2022 at 6:34 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Regarding pg_walfile_offset_lsn(), I am not sure that this is the best
> move we can do as it is possible to compile a LSN from 0/0 with just a
> segment number, say:
> select '0/0'::pg_lsn + :segno * setting::int + :offset
>   from pg_settings where name = 'wal_segment_size';

Nice.

> +   resultTupleDesc = CreateTemplateTupleDesc(2);
> +   TupleDescInitEntry(resultTupleDesc, (AttrNumber) 1, "lsn",
> +                      PG_LSNOID, -1, 0);
> +   TupleDescInitEntry(resultTupleDesc, (AttrNumber) 2, "timeline_id",
> +                      INT4OID, -1, 0);
> Let's use get_call_result_type() to get the TupleDesc and to avoid a
> duplication between pg_proc.dat and this code.
>
> Hence I would tend to let XLogFromFileName do the job, while having a
> SQL function that is just a thin wrapper around it that returns the
> segment TLI and its number, leaving the offset out of the equation as
> well as this new XLogIdFromFileName().

So, a SQL function pg_dissect_walfile_name (or some other better name)
given a WAL file name returns the tli and seg number. Then the
pg_walfile_offset_lsn can just be a SQL-defined function (in
system_functions.sql) using this new function and pg_settings. If this
understanding is correct, it looks good to me at this point.

That said, let's also hear from others.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Dec 05, 2022 at 08:48:25AM +0530, Bharath Rupireddy wrote:
> So, a SQL function pg_dissect_walfile_name (or some other better name)
> given a WAL file name returns the tli and seg number. Then the
> pg_walfile_offset_lsn can just be a SQL-defined function (in
> system_functions.sql) using this new function and pg_settings. If this
> understanding is correct, it looks good to me at this point.

I would do without the SQL function that looks at pg_settings, FWIW.

> That said, let's also hear from others.

Sure.  Perhaps my set of suggestions will not get the majority,
though..
--
Michael

Attachment
At Mon, 5 Dec 2022 13:13:23 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Mon, Dec 05, 2022 at 08:48:25AM +0530, Bharath Rupireddy wrote:
> > So, a SQL function pg_dissect_walfile_name (or some other better name)
> > given a WAL file name returns the tli and seg number. Then the
> > pg_walfile_offset_lsn can just be a SQL-defined function (in
> > system_functions.sql) using this new function and pg_settings. If this
> > understanding is correct, it looks good to me at this point.
> 
> I would do without the SQL function that looks at pg_settings, FWIW.

If that function may be called at a high frequency, SQL-defined one is
not suitable, but I don't think this function is used that way.  With
that premise, I would prefer SQL-defined as it is far simpler on its
face.

At Mon, 5 Dec 2022 10:03:55 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> Hence I would tend to let XLogFromFileName do the job, while having a
> SQL function that is just a thin wrapper around it that returns the
> segment TLI and its number, leaving the offset out of the equation as
> well as this new XLogIdFromFileName().

I don't think it could be problematic that the SQL-callable function
returns a bogus result for a wrong WAL filename in the correct
format. Specifically, I think that the function may return (0/0,0) for
"000000000000000000000000" since that behavior is completely
harmless. If we don't check logid, XLogFromFileName fits instead.

(If we assume that the file names are typed in letter-by-letter, I
 rather prefer to allow lower-case letters:p)

> > That said, let's also hear from others.
> 
> Sure.  Perhaps my set of suggestions will not get the majority,
> though..


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Tue, Dec 6, 2022 at 12:57 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Mon, 5 Dec 2022 13:13:23 +0900, Michael Paquier <michael@paquier.xyz> wrote in
> > On Mon, Dec 05, 2022 at 08:48:25AM +0530, Bharath Rupireddy wrote:
> > > So, a SQL function pg_dissect_walfile_name (or some other better name)
> > > given a WAL file name returns the tli and seg number. Then the
> > > pg_walfile_offset_lsn can just be a SQL-defined function (in
> > > system_functions.sql) using this new function and pg_settings. If this
> > > understanding is correct, it looks good to me at this point.
> >
> > I would do without the SQL function that looks at pg_settings, FWIW.
>
> If that function may be called at a high frequency, SQL-defined one is
> not suitable, but I don't think this function is used that way.  With
> that premise, I would prefer SQL-defined as it is far simpler on its
> face.
>
> At Mon, 5 Dec 2022 10:03:55 +0900, Michael Paquier <michael@paquier.xyz> wrote in
> > Hence I would tend to let XLogFromFileName do the job, while having a
> > SQL function that is just a thin wrapper around it that returns the
> > segment TLI and its number, leaving the offset out of the equation as
> > well as this new XLogIdFromFileName().
>
> I don't think it could be problematic that the SQL-callable function
> returns a bogus result for a wrong WAL filename in the correct
> format. Specifically, I think that the function may return (0/0,0) for
> "000000000000000000000000" since that behavior is completely
> harmless. If we don't check logid, XLogFromFileName fits instead.

If we were to provide correctness and input invalidations
(specifically, the validations around 'seg', see below) of the WAL
file name typed in by the user, the function pg_walfile_offset_lsn()
wins the race.

+    XLogIdFromFileName(fname, &tli, &segno, &log, &seg, wal_segment_size);
+
+    if (seg >= XLogSegmentsPerXLogId(wal_segment_size) ||
+        (log == 0 && seg == 0) ||
+        segno == 0 ||
+        tli == 0)
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("invalid WAL file name \"%s\"", fname)));

+SELECT * FROM pg_walfile_offset_lsn('0000000100000000FFFFFFFF', 15);
+ERROR:  invalid WAL file name "0000000100000000FFFFFFFF"

That said, I think we can have a single function
pg_dissect_walfile_name(wal_file_name, offset optional) returning
segno (XLogSegNo - physical log file sequence number), tli, lsn (if
offset is given). This way there is no need for another SQL-callable
function using pg_settings. Thoughts?

> (If we assume that the file names are typed in letter-by-letter, I
>  rather prefer to allow lower-case letters:p)

It's easily doable if we convert the entered WAL file name to
uppercase using pg_toupper() and then pass it to IsXLogFileName().

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Dec 6, 2022 at 4:46 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> That said, I think we can have a single function
> pg_dissect_walfile_name(wal_file_name, offset optional) returning
> segno (XLogSegNo - physical log file sequence number), tli, lsn (if
> offset is given). This way there is no need for another SQL-callable
> function using pg_settings. Thoughts?
>
> > (If we assume that the file names are typed in letter-by-letter, I
> >  rather prefer to allow lower-case letters:p)
>
> It's easily doable if we convert the entered WAL file name to
> uppercase using pg_toupper() and then pass it to IsXLogFileName().

Okay, here's the v5 patch that I could come up with. It basically adds
functions for dissecting WAL file names and computing offset from lsn.
Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Tue, Dec 06, 2022 at 04:27:50PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 5 Dec 2022 10:03:55 +0900, Michael Paquier <michael@paquier.xyz> wrote in
>> Hence I would tend to let XLogFromFileName do the job, while having a
>> SQL function that is just a thin wrapper around it that returns the
>> segment TLI and its number, leaving the offset out of the equation as
>> well as this new XLogIdFromFileName().
>
> I don't think it could be problematic that the SQL-callable function
> returns a bogus result for a wrong WAL filename in the correct
> format. Specifically, I think that the function may return (0/0,0) for
> "000000000000000000000000" since that behavior is completely
> harmless. If we don't check logid, XLogFromFileName fits instead.

Yeah, I really don't think that it is a big deal either:
XLogIdFromFileName() just translates what it receives from the
caller for the TLI and the segment number.

> (If we assume that the file names are typed in letter-by-letter, I
>  rather prefer to allow lower-case letters:p)

Yep, makes sense to enforce a compatible WAL segment name if we can.
--
Michael

Attachment
On Tue, Dec 13, 2022 at 09:32:19PM +0530, Bharath Rupireddy wrote:
> Okay, here's the v5 patch that I could come up with. It basically adds
> functions for dissecting WAL file names and computing offset from lsn.
> Thoughts?

I had a second look at that, and I still have mixed feelings about the
addition of the SQL function, no real objection about
pg_dissect_walfile_name().

I don't really think that we need a specific handling with a new
macro from xlog_internal.h that does its own parsing of the segment
number while XLogFromFileName() can do that based on the user input,
so I have simplified that.

A second thing is the TLI that had better be returned as int8 and not
int4 so as we don't have a negative number for a TLI higher than 2B in
a WAL segment name.
--
Michael

Attachment
On Mon, Dec 19, 2022 at 1:37 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Dec 13, 2022 at 09:32:19PM +0530, Bharath Rupireddy wrote:
> > Okay, here's the v5 patch that I could come up with. It basically adds
> > functions for dissecting WAL file names and computing offset from lsn.
> > Thoughts?
>
> I had a second look at that, and I still have mixed feelings about the
> addition of the SQL function, no real objection about
> pg_dissect_walfile_name().
>
> I don't really think that we need a specific handling with a new
> macro from xlog_internal.h that does its own parsing of the segment
> number while XLogFromFileName() can do that based on the user input,
> so I have simplified that.
>
> A second thing is the TLI that had better be returned as int8 and not
> int4 so as we don't have a negative number for a TLI higher than 2B in
> a WAL segment name.

Thanks. The v6 patch LGTM.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Dec 19, 2022 at 5:22 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Dec 19, 2022 at 1:37 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Tue, Dec 13, 2022 at 09:32:19PM +0530, Bharath Rupireddy wrote:
> > > Okay, here's the v5 patch that I could come up with. It basically adds
> > > functions for dissecting WAL file names and computing offset from lsn.
> > > Thoughts?
> >
> > I had a second look at that, and I still have mixed feelings about the
> > addition of the SQL function, no real objection about
> > pg_dissect_walfile_name().
> >
> > I don't really think that we need a specific handling with a new
> > macro from xlog_internal.h that does its own parsing of the segment
> > number while XLogFromFileName() can do that based on the user input,
> > so I have simplified that.
> >
> > A second thing is the TLI that had better be returned as int8 and not
> > int4 so as we don't have a negative number for a TLI higher than 2B in
> > a WAL segment name.
>
> Thanks. The v6 patch LGTM.

A nitpick - can we also specify a use case for the function
pg_dissect_walfile_name(), that is, computing LSN from offset and WAL
file name, something like [1]?

[1]
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2f05b06f14..c36fcb83c8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26110,7 +26110,17 @@ LOG:  Grand total: 1651920 bytes in 201
blocks; 622360 free (88 chunks); 1029560
        </para>
        <para>
         Extract the file sequence number and timeline ID from a WAL file
-        name.
+        name. This function is useful to compute LSN from a given offset
+        and WAL file name, for example:
+<screen>
+postgres=# \set file_name '000000010000000100C000AB'
+postgres=# \set offset 256
+postgres=# SELECT '0/0'::pg_lsn + pd.segno * ps.setting::int +
:offset AS lsn FROM pg_dissect_walfile_name(:'file_name') pd,
pg_show_all_settings() ps WHERE ps.name = 'wal_segment_size';
+      lsn
+---------------
+ C001/AB000100
+(1 row)
+</screen>
        </para></entry>
       </row>

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Dec 19, 2022 at 05:51:19PM +0530, Bharath Rupireddy wrote:
> A nitpick - can we also specify a use case for the function
> pg_dissect_walfile_name(), that is, computing LSN from offset and WAL
> file name, something like [1]?

Yeah, my mind was considering as well yesterday the addition of a note
in the docs about something among these lines, so fine by me.
--
Michael

Attachment
On Tue, Dec 20, 2022 at 09:01:02AM +0900, Michael Paquier wrote:
> Yeah, my mind was considering as well yesterday the addition of a note
> in the docs about something among these lines, so fine by me.

And applied that, after tweaking a few tiny things on a last lookup
with a catversion bump.  Note that the example has been moved at the
bottom of the table for these functions, which is more consistent with
the surroundings.
--
Michael

Attachment


On Tue, Dec 20, 2022 at 5:40 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Dec 20, 2022 at 09:01:02AM +0900, Michael Paquier wrote:
> Yeah, my mind was considering as well yesterday the addition of a note
> in the docs about something among these lines, so fine by me.

And applied that, after tweaking a few tiny things on a last lookup
with a catversion bump.  Note that the example has been moved at the
bottom of the table for these functions, which is more consistent with
the surroundings.


Hi!

Caught this thread late. To me, pg_dissect_walfile_name() is a really strange name for a function. Grepping our I code I see the term dissect s used somewhere inside the regex code and exactly zero instances elsewhere. Which is why I definitely didn't recognize the term...

Wouldn't something like pg_split_walfile_name() be a lot more consistent with the rest of our names?

//Magnus

On Tue, Dec 20, 2022 at 1:27 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> On Tue, Dec 20, 2022 at 5:40 AM Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Tue, Dec 20, 2022 at 09:01:02AM +0900, Michael Paquier wrote:
>> > Yeah, my mind was considering as well yesterday the addition of a note
>> > in the docs about something among these lines, so fine by me.
>>
>> And applied that, after tweaking a few tiny things on a last lookup
>> with a catversion bump.  Note that the example has been moved at the
>> bottom of the table for these functions, which is more consistent with
>> the surroundings.
>>
>
> Hi!
>
> Caught this thread late. To me, pg_dissect_walfile_name() is a really strange name for a function. Grepping our I
codeI see the term dissect s used somewhere inside the regex code and exactly zero instances elsewhere. Which is why I
definitelydidn't recognize the term... 
>
> Wouldn't something like pg_split_walfile_name() be a lot more consistent with the rest of our names?

Hm. FWIW, here's the patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
2022年12月20日(火) 21:35 Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>:
>
> On Tue, Dec 20, 2022 at 1:27 PM Magnus Hagander <magnus@hagander.net> wrote:
> >
> > On Tue, Dec 20, 2022 at 5:40 AM Michael Paquier <michael@paquier.xyz> wrote:
> >>
> >> On Tue, Dec 20, 2022 at 09:01:02AM +0900, Michael Paquier wrote:
> >> > Yeah, my mind was considering as well yesterday the addition of a note
> >> > in the docs about something among these lines, so fine by me.
> >>
> >> And applied that, after tweaking a few tiny things on a last lookup
> >> with a catversion bump.  Note that the example has been moved at the
> >> bottom of the table for these functions, which is more consistent with
> >> the surroundings.
> >>
> >
> > Hi!
> >
> > Caught this thread late. To me, pg_dissect_walfile_name() is a really strange name for a function. Grepping our I
codeI see the term dissect s used somewhere inside the regex code and exactly zero instances elsewhere. Which is why I
definitelydidn't recognize the term... 

Late to the party too, but I did wonder about the name when I saw it.

> > Wouldn't something like pg_split_walfile_name() be a lot more consistent with the rest of our names?
>
> Hm. FWIW, here's the patch.

Hmm, "pg_split_walfile_name()" sounds like it would return three 8
character strings.

Maybe something like "pg_walfile_name_elements()" ?

Regards

Ian Barwick



On Tue, Dec 20, 2022 at 06:04:40PM +0530, Bharath Rupireddy wrote:
> On Tue, Dec 20, 2022 at 1:27 PM Magnus Hagander <magnus@hagander.net> wrote:
>> Caught this thread late. To me, pg_dissect_walfile_name() is a
>> really strange name for a function. Grepping our I code I see the
>> term dissect s used somewhere inside the regex code and exactly
>> zero instances elsewhere. Which is why I definitely didn't
>> recognize the term...
>>
>> Wouldn't something like pg_split_walfile_name() be a lot more
>> consistent with the rest of our names?

Fine by me to change that if there is little support for the current
naming, though the current one does not sound that bad to me either.

> Hm. FWIW, here's the patch.

"split" is used a lot for the picksplit functions, but not in any of
the existing functions as a name.  Some extra options: parse, read,
extract, calculate, deduce, get.  "parse" would be something I would
be OK with.
--
Michael

Attachment
On Wed, Dec 21, 2022 at 5:39 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Dec 20, 2022 at 06:04:40PM +0530, Bharath Rupireddy wrote:
> > On Tue, Dec 20, 2022 at 1:27 PM Magnus Hagander <magnus@hagander.net> wrote:
> >> Caught this thread late. To me, pg_dissect_walfile_name() is a
> >> really strange name for a function. Grepping our I code I see the
> >> term dissect s used somewhere inside the regex code and exactly
> >> zero instances elsewhere. Which is why I definitely didn't
> >> recognize the term...
> >>
> >> Wouldn't something like pg_split_walfile_name() be a lot more
> >> consistent with the rest of our names?
>
> Fine by me to change that if there is little support for the current
> naming, though the current one does not sound that bad to me either.
>
> > Hm. FWIW, here's the patch.
>
> "split" is used a lot for the picksplit functions, but not in any of
> the existing functions as a name.  Some extra options: parse, read,
> extract, calculate, deduce, get.  "parse" would be something I would
> be OK with.

"dissect", "split" and "parse" -  I'm okay with either of these.

Read somewhere - a saying that goes this way "the hardest part of
coding is to name variables and functions" :).

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Dec 21, 2022 at 1:09 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Dec 20, 2022 at 06:04:40PM +0530, Bharath Rupireddy wrote:
> On Tue, Dec 20, 2022 at 1:27 PM Magnus Hagander <magnus@hagander.net> wrote:
>> Caught this thread late. To me, pg_dissect_walfile_name() is a
>> really strange name for a function. Grepping our I code I see the
>> term dissect s used somewhere inside the regex code and exactly
>> zero instances elsewhere. Which is why I definitely didn't
>> recognize the term...
>>
>> Wouldn't something like pg_split_walfile_name() be a lot more
>> consistent with the rest of our names?

Fine by me to change that if there is little support for the current
naming, though the current one does not sound that bad to me either.

> Hm. FWIW, here's the patch.

"split" is used a lot for the picksplit functions, but not in any of
the existing functions as a name.  Some extra options: parse, read,
extract, calculate, deduce, get.  "parse" would be something I would
be OK with.


Not sure what you mean? We certainly have a lot of functions called split that are not the picksplit ones. split_part(). regexp_split_to_array(), regexp_split_to_table()... And ther'es things like tuiple_data_split() in pageinspect.

There are many other examples outside of postgres as well, e.g. python has a split() of pathnames, "almost every language" has a split() on strings etc. I don't think I've ever seen dissect in a place like that either (though Im sure it exists somewhere, it's hardly common)

Basically, we take one thing and turn it into 3. That very naturally rings with "split" to me.

Parse might work as well, certainly better than dissect. I'd still prefer split though.

--
On Wed, Dec 21, 2022 at 10:22:02PM +0100, Magnus Hagander wrote:
> Basically, we take one thing and turn it into 3. That very naturally rings
> with "split" to me.
>
> Parse might work as well, certainly better than dissect. I'd still prefer
> split though.

Honestly, I don't have any counter-arguments, so I am fine to switch
the name as you are suggesting.  And pg_split_walfile_name() it is?
--
Michael

Attachment
On Thu, Dec 22, 2022 at 7:57 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Dec 21, 2022 at 10:22:02PM +0100, Magnus Hagander wrote:
> > Basically, we take one thing and turn it into 3. That very naturally rings
> > with "split" to me.
> >
> > Parse might work as well, certainly better than dissect. I'd still prefer
> > split though.
>
> Honestly, I don't have any counter-arguments, so I am fine to switch
> the name as you are suggesting.  And pg_split_walfile_name() it is?

+1. FWIW, a simple patch is here
https://www.postgresql.org/message-id/CALj2ACXdZ7WGRD-_jPPeZugvWLN%2Bgxo3QtV-eZPRicUwjesM%3Dg%40mail.gmail.com.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



At Thu, 22 Dec 2022 10:09:23 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Thu, Dec 22, 2022 at 7:57 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Wed, Dec 21, 2022 at 10:22:02PM +0100, Magnus Hagander wrote:
> > > Basically, we take one thing and turn it into 3. That very naturally rings
> > > with "split" to me.
> > >
> > > Parse might work as well, certainly better than dissect. I'd still prefer
> > > split though.
> >
> > Honestly, I don't have any counter-arguments, so I am fine to switch
> > the name as you are suggesting.  And pg_split_walfile_name() it is?
> 
> +1. FWIW, a simple patch is here
> https://www.postgresql.org/message-id/CALj2ACXdZ7WGRD-_jPPeZugvWLN%2Bgxo3QtV-eZPRicUwjesM%3Dg%40mail.gmail.com.

By the way the function is documented as the follows.

>  Extracts the file sequence number and timeline ID from a WAL file name.

I didn't find the definition for the workd "file sequence number" in
the doc. Instead I find "segment number" (a bit doubtful, though..).

In the first place "file sequence number" and "segno" can hardly be
associated by appearance by readers, I think.  (Yeah, we can identify
that since the another parameter is identifiable alone.) Why don't we
spell out the parameter simply as "segment number"?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Thu, Dec 22, 2022 at 05:19:24PM +0900, Kyotaro Horiguchi wrote:
> In the first place "file sequence number" and "segno" can hardly be
> associated by appearance by readers, I think.  (Yeah, we can identify
> that since the another parameter is identifiable alone.) Why don't we
> spell out the parameter simply as "segment number"?

As in using "sequence number" removing "file" from the docs and
changing the OUT parameter name to segment_number rather than segno?
Fine by me.
--
Michael

Attachment
On Thu, Dec 22, 2022 at 4:57 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Dec 22, 2022 at 05:19:24PM +0900, Kyotaro Horiguchi wrote:
> > In the first place "file sequence number" and "segno" can hardly be
> > associated by appearance by readers, I think.  (Yeah, we can identify
> > that since the another parameter is identifiable alone.) Why don't we
> > spell out the parameter simply as "segment number"?
>
> As in using "sequence number" removing "file" from the docs and
> changing the OUT parameter name to segment_number rather than segno?
> Fine by me.

+1.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Dec 22, 2022 at 05:03:35PM +0530, Bharath Rupireddy wrote:
> On Thu, Dec 22, 2022 at 4:57 PM Michael Paquier <michael@paquier.xyz> wrote:
>> As in using "sequence number" removing "file" from the docs and
>> changing the OUT parameter name to segment_number rather than segno?
>> Fine by me.
>
> +1.

Okay, done this way.
--
Michael

Attachment
On Fri, Dec 23, 2022 at 2:06 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Dec 22, 2022 at 05:03:35PM +0530, Bharath Rupireddy wrote:
> On Thu, Dec 22, 2022 at 4:57 PM Michael Paquier <michael@paquier.xyz> wrote:
>> As in using "sequence number" removing "file" from the docs and
>> changing the OUT parameter name to segment_number rather than segno?
>> Fine by me.
>
> +1.

Okay, done this way.


Thanks!

//Magnus