Thread: Use XLogFromFileName() in pg_resetwal to parse position from WAL file

Use XLogFromFileName() in pg_resetwal to parse position from WAL file

From
Bharath Rupireddy
Date:
Hi,

It looks like there's an opportunity to replace explicit WAL file
parsing code with XLogFromFileName() in pg_resetwal.c. This was not
done then (in PG 10) because the XLogFromFileName() wasn't accepting
file size as an input parameter (see [1]) and pg_resetwal needed to
use WAL file size from the controlfile. Thanks to the commit
fc49e24fa69a15efacd5b8958115ed9c43c48f9a which added the
wal_segsz_bytes parameter to XLogFromFileName().

I'm attaching a small patch herewith. This removes using extra
variables in pg_resetwal.c and a bit of duplicate code too (5 LOC).

Thoughts?

[1] https://github.com/postgres/postgres/blob/REL_10_STABLE/src/include/access/xlog_internal.h

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Use XLogFromFileName() in pg_resetwal to parse position from WAL file

From
Kyotaro Horiguchi
Date:
At Tue, 4 Oct 2022 11:06:15 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> It looks like there's an opportunity to replace explicit WAL file
> parsing code with XLogFromFileName() in pg_resetwal.c. This was not
> done then (in PG 10) because the XLogFromFileName() wasn't accepting
> file size as an input parameter (see [1]) and pg_resetwal needed to
> use WAL file size from the controlfile. Thanks to the commit
> fc49e24fa69a15efacd5b8958115ed9c43c48f9a which added the
> wal_segsz_bytes parameter to XLogFromFileName().

Nice finding. I found a few '%08X%08X's but they don't seem to fit
similar fix.

> I'm attaching a small patch herewith. This removes using extra
> variables in pg_resetwal.c and a bit of duplicate code too (5 LOC).
> 
> Thoughts?

> -    segs_per_xlogid = (UINT64CONST(0x0000000100000000) / ControlFile.xlog_seg_size);
>      newXlogSegNo = ControlFile.checkPointCopy.redo / ControlFile.xlog_seg_size;

Couldn't we use XLByteToSeg() here?

Other than that, it looks good to me.

> [1] https://github.com/postgres/postgres/blob/REL_10_STABLE/src/include/access/xlog_internal.h

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Use XLogFromFileName() in pg_resetwal to parse position from WAL file

From
Bharath Rupireddy
Date:
On Tue, Oct 4, 2022 at 11:47 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> > -     segs_per_xlogid = (UINT64CONST(0x0000000100000000) / ControlFile.xlog_seg_size);
> >       newXlogSegNo = ControlFile.checkPointCopy.redo / ControlFile.xlog_seg_size;
>
> Couldn't we use XLByteToSeg() here?

Yes, we could.

> Other than that, it looks good to me.

Thanks. There are a few more assorted WAL file related things I found,
I will be sending all of them in this thread itself in a while.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Use XLogFromFileName() in pg_resetwal to parse position from WAL file

From
Kyotaro Horiguchi
Date:
At Tue, 04 Oct 2022 15:17:06 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> Other than that, it looks good to me.

Sorry I have another comment.

> -            unsigned int tli,
> -                        log,
> -                        seg;
> +            unsigned int tli;
>              XLogSegNo    segno;

TLI should be of type TimeLineID.

This is not directly related to this patch, pg_resetwal.c has the
following line..

> static uint32 minXlogTli = 0;

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Use XLogFromFileName() in pg_resetwal to parse position from WAL file

From
Michael Paquier
Date:
On Tue, Oct 04, 2022 at 03:17:06PM +0900, Kyotaro Horiguchi wrote:
> Nice finding. I found a few '%08X%08X's but they don't seem to fit
> similar fix.

Nice cleanup.

> Couldn't we use XLByteToSeg() here?
>
> Other than that, it looks good to me.

Yep.  It looks that you're right here.
--
Michael

Attachment

Re: Use XLogFromFileName() in pg_resetwal to parse position from WAL file

From
Kyotaro Horiguchi
Date:
At Tue, 04 Oct 2022 15:23:48 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> This is not directly related to this patch, pg_resetwal.c has the
> following line..
> 
> > static uint32 minXlogTli = 0;

I have found other three instances of this in xlog.c and
pg_receivewal.c.  Do they worth fixing?

(pg_upgarade.c has "uint32 tli/logid/segno but I'm not sure they need
 to be "fixed". At least the segno is not a XLogSegNo.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Tue, Oct 4, 2022 at 12:11 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> >
> > > static uint32 minXlogTli = 0;
>
> I have found other three instances of this in xlog.c and
> pg_receivewal.c.  Do they worth fixing?
>
> (pg_upgarade.c has "uint32 tli/logid/segno but I'm not sure they need
>  to be "fixed". At least the segno is not a XLogSegNo.)

There are quite a number of places where data types need to be fixed,
see XLogFileNameById() callers. They are all being parsed as uint32
and then used. I'm not sure if we want to fix all of them.

I think I found that we can fix/refactor few WAL file related things:

1. 0001 replaces explicit WAL file parsing code with
XLogFromFileName() and uses XLByteToSeg() in pg_resetwal.c. This was
not done then (in PG 10) because the XLogFromFileName() wasn't
accepting file size as an input parameter and pg_resetwal needed to
use WAL file size from the controlfile. Thanks to the commit
fc49e24fa69a15efacd5b8958115ed9c43c48f9a which added the
wal_segsz_bytes parameter to XLogFromFileName(). This removes using
extra variables in pg_resetwal.c and a bit of duplicate code too. It
also replaces the explicit code with the XLByteToSeg() macro.

2. 0002 replaces MAXPGPATH with MAXFNAMELEN for WAL file names.
MAXFNAMELEN (64 bytes) is typically meant to be used for all WAL file
names across the code base. Because the WAL file names in postgres
can't be bigger than 64 bytes, in fact, not more than XLOG_FNAME_LEN
(24 bytes) but there are suffixes, timeline history files etc. To
accommodate all of that MAXFNAMELEN is introduced. There are some
places in the code base that still use MAXPGPATH (1024 bytes) for WAL
file names which is an unnecessary wastage of stack memory. This makes
code consistent across and saves a bit of space.

3. 0003 replaces WAL file name calculation with XLogFileNameById() in
pg_upgrade/controldata.c to be consistent across the code base. Note
that this requires us to change the nextxlogfile size from hard-coded
25 bytes to MAXFNAMELEN (64 bytes).

I'm attaching the v2 patch set.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
At Tue, 4 Oct 2022 13:20:54 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Tue, Oct 4, 2022 at 12:11 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > >
> > > > static uint32 minXlogTli = 0;
> >
> > I have found other three instances of this in xlog.c and
> > pg_receivewal.c.  Do they worth fixing?
> >
> > (pg_upgarade.c has "uint32 tli/logid/segno but I'm not sure they need
> >  to be "fixed". At least the segno is not a XLogSegNo.)
> 
> There are quite a number of places where data types need to be fixed,
> see XLogFileNameById() callers. They are all being parsed as uint32
> and then used. I'm not sure if we want to fix all of them.
> 
> I think I found that we can fix/refactor few WAL file related things:
> 
> 1. 0001 replaces explicit WAL file parsing code with
> XLogFromFileName() and uses XLByteToSeg() in pg_resetwal.c. This was
> not done then (in PG 10) because the XLogFromFileName() wasn't
> accepting file size as an input parameter and pg_resetwal needed to
> use WAL file size from the controlfile. Thanks to the commit
> fc49e24fa69a15efacd5b8958115ed9c43c48f9a which added the
> wal_segsz_bytes parameter to XLogFromFileName(). This removes using
> extra variables in pg_resetwal.c and a bit of duplicate code too. It
> also replaces the explicit code with the XLByteToSeg() macro.

Looks good to me.

> 2. 0002 replaces MAXPGPATH with MAXFNAMELEN for WAL file names.
> MAXFNAMELEN (64 bytes) is typically meant to be used for all WAL file
> names across the code base. Because the WAL file names in postgres
> can't be bigger than 64 bytes, in fact, not more than XLOG_FNAME_LEN
> (24 bytes) but there are suffixes, timeline history files etc. To
> accommodate all of that MAXFNAMELEN is introduced. There are some
> places in the code base that still use MAXPGPATH (1024 bytes) for WAL
> file names which is an unnecessary wastage of stack memory. This makes
> code consistent across and saves a bit of space.

Looks reasonable, too.  I don't find other instances of the same mistake.

> 3. 0003 replaces WAL file name calculation with XLogFileNameById() in
> pg_upgrade/controldata.c to be consistent across the code base. Note
> that this requires us to change the nextxlogfile size from hard-coded
> 25 bytes to MAXFNAMELEN (64 bytes).

I'm not sure I like this. In other places where XLogFileNameById() is
used, the buffer is known to store longer strings so MAXFNAMELEN is
reasonable.  But we don't need to add useless 39 bytes here.
Therefore, even if I wanted to change it, I would replace it with
"XLOG_FNAME_LEN + 1".

> I'm attaching the v2 patch set.
> 
> Thoughts?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Tue, Oct 4, 2022 at 2:01 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> > 1. 0001 replaces explicit WAL file parsing code with
>
> Looks good to me.
>
> > 2. 0002 replaces MAXPGPATH with MAXFNAMELEN for WAL file names.
>
> Looks reasonable, too.  I don't find other instances of the same mistake.

Thanks for reviewing.

> > 3. 0003 replaces WAL file name calculation with XLogFileNameById() in
> > pg_upgrade/controldata.c to be consistent across the code base. Note
> > that this requires us to change the nextxlogfile size from hard-coded
> > 25 bytes to MAXFNAMELEN (64 bytes).
>
> I'm not sure I like this. In other places where XLogFileNameById() is
> used, the buffer is known to store longer strings so MAXFNAMELEN is
> reasonable.  But we don't need to add useless 39 bytes here.
> Therefore, even if I wanted to change it, I would replace it with
> "XLOG_FNAME_LEN + 1".

I'm fine with doing either of these things. Let's hear from others.

I've added a CF entry - https://commitfest.postgresql.org/40/3927/

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Oct 04, 2022 at 06:24:18PM +0530, Bharath Rupireddy wrote:
> I'm fine with doing either of these things. Let's hear from others.
>
> I've added a CF entry - https://commitfest.postgresql.org/40/3927/

About 0002, I am not sure that it is worth bothering.  Sure, this
wastes a few bytes, but I recall that there are quite a few places in
the code where we imply a WAL segment but append a full path to it,
and this creates a few bumps with back-patches.

--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -10,6 +10,7 @@
 #include <sys/stat.h>
 #include <sys/time.h>

+#include "access/xlog_internal.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"
Well, xlog_internal.h includes a few backend-only definitions.  I'd
rather have us untangle more the backend/frontend dependencies before
adding more includes of this type, even if I agree that nextxlogfile
would be better with the implied name size limit.

Saying that, 0001 is a nice catch, so applied it.  I have switched the
two TLI variables to use TimeLineID, while touching the area.
--
Michael

Attachment