Thread: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions

binary version of pg_current_wal_insert_lsn and pg_walfile_name functions

From
Ashutosh Sharma
Date:
Hi All,

Currently, we have pg_current_wal_insert_lsn and pg_walfile_name sql
functions which gives us information about the next wal insert
location and the WAL file that the next wal insert location belongs
to. Can we have a binary version of these sql functions? It would be
like any other binaries we have for e.g. pg_waldump to which we can
pass the location of the pg_wal directory. This binary would scan
through the directory to return the next wal insert location and the
wal file the next wal insert pointer belongs to.

The binary version of these sql functions can be used when the server
is offline. This can help us to know the overall WAL data that needs
to be replayed when the server is in recovery. In the control file we
do have the redo pointer. Knowing the end pointer would definitely be
helpful.

If you are ok then I will prepare a patch for it and share it. Please
let me know your thoughts/comments. thank you.!

--
With Regards,
Ashutosh Sharma.



Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions

From
Bharath Rupireddy
Date:
On Mon, Sep 19, 2022 at 8:19 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Hi All,
>
> Currently, we have pg_current_wal_insert_lsn and pg_walfile_name sql
> functions which gives us information about the next wal insert
> location and the WAL file that the next wal insert location belongs
> to. Can we have a binary version of these sql functions?

+1 for the idea in general.

As said, pg_waldump seems to be the right candidate. I think we want
the lsn of the last WAL record and its info and the WAL file name
given an input data directory or just the pg_wal directory or any
directory where WAL files are located. For instance, one can use this
on an archive location containing archived WAL files or on a node
where pg_receivewal is receiving WAL files. Am I missing any other
use-cases?

pg_waldump currently can't understand compressed and partial files. I
think that we need to fix this as well.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions

From
Ashutosh Sharma
Date:
On Tue, Sep 20, 2022 at 5:13 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Sep 19, 2022 at 8:19 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > Hi All,
> >
> > Currently, we have pg_current_wal_insert_lsn and pg_walfile_name sql
> > functions which gives us information about the next wal insert
> > location and the WAL file that the next wal insert location belongs
> > to. Can we have a binary version of these sql functions?
>
> +1 for the idea in general.
>
> As said, pg_waldump seems to be the right candidate. I think we want
> the lsn of the last WAL record and its info and the WAL file name
> given an input data directory or just the pg_wal directory or any
> directory where WAL files are located. For instance, one can use this
> on an archive location containing archived WAL files or on a node
> where pg_receivewal is receiving WAL files. Am I missing any other
> use-cases?
>

Yeah, we can either add this functionality to pg_waldump or maybe add
a new binary itself that would return this information.

--
With Regards,
Ashutosh Sharma.



Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions

From
Bharath Rupireddy
Date:
On Wed, Sep 21, 2022 at 9:53 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Yeah, we can either add this functionality to pg_waldump or maybe add
> a new binary itself that would return this information.

IMV, a separate tool isn't the way, since pg_waldump already reads WAL
files and decodes WAL records, what's proposed here can be an
additional functionality of pg_waldump.

It will be great if an initial patch is posted here.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions

From
Ashutosh Sharma
Date:
On Thu, Sep 22, 2022 at 7:41 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Sep 21, 2022 at 9:53 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > Yeah, we can either add this functionality to pg_waldump or maybe add
> > a new binary itself that would return this information.
>
> IMV, a separate tool isn't the way, since pg_waldump already reads WAL
> files and decodes WAL records, what's proposed here can be an
> additional functionality of pg_waldump.
>
> It will be great if an initial patch is posted here.
>

PFA that enhances pg_waldump to show the latest LSN and the
corresponding WAL file when the -l or --lastLSN option is passed an
argument to pg_waldump. Below is an example:

ashu@92893de650ed:~/pgsql$ pg_waldump -l -D ./data-dir
Latest LSN: 0/148A45F8
Latest WAL filename: 000000010000000000000014

How has it been coded?

When the user passes the '-l' command line option along with the data
directory path to pg_waldump, it reads the control file from the data
directory. From the control file, it gets information like redo
pointer and current timeline id. The redo pointer is considered to be
the start pointer from where the pg_waldump starts reading wal data
until end-of-wal to find the last LSN. For details please check the
attached patch.

Please note that for compressed and .partial wal files this doesn't work.

--
With Regards,
Ashutosh Sharma.

Attachment

Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions

From
Bharath Rupireddy
Date:
On Thu, Sep 22, 2022 at 10:25 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> PFA that enhances pg_waldump to show the latest LSN and the
> corresponding WAL file when the -l or --lastLSN option is passed an
> argument to pg_waldump. Below is an example:

Thanks for the patch. I have some quick thoughts about it.

> When the user passes the '-l' command line option along with the data
> directory path to pg_waldump, it reads the control file from the data
> directory.

I don't think we need a new option for data directory -D. pg_waldump's
option 'p' can be used, please see the comments around
identify_target_directory().

> From the control file, it gets information like redo
> pointer and current timeline id.

Is there any reason for not using get_control_file() from
src/common/controldata_utils.c, but defining the exact same function
in pg_waldump.c?

> The redo pointer is considered to be
> the start pointer from where the pg_waldump starts reading wal data
> until end-of-wal to find the last LSN. For details please check the
> attached patch.

Making it dependent on the controlfile limits the usability of this
feature. Imagine, using this feature on an archive location or
pg_receivewal target directory where there are WAL files but no
controlfile. I think we can choose the appropriate combinations of
existing pg_waldump options, for instance, let users enter the start
WAL segment via startseg and/or start LSN via --start and the new
option for end WAL segment and end LSN.

> Please note that for compressed and .partial wal files this doesn't work.

Looking forward to the above capability because it expands the
usability of this feature.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions

From
Ashutosh Sharma
Date:
On Fri, Sep 23, 2022 at 6:05 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Sep 22, 2022 at 10:25 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > PFA that enhances pg_waldump to show the latest LSN and the
> > corresponding WAL file when the -l or --lastLSN option is passed an
> > argument to pg_waldump. Below is an example:
>
> Thanks for the patch. I have some quick thoughts about it.
>
> > When the user passes the '-l' command line option along with the data
> > directory path to pg_waldump, it reads the control file from the data
> > directory.
>
> I don't think we need a new option for data directory -D. pg_waldump's
> option 'p' can be used, please see the comments around
> identify_target_directory().
>

-p is the path to the WAL directory. It doesn't necessarily have to be
a data directory, however the user can specify the data directory path
here as well using which the path to the WAL directory can be
recognized, but as I said it doesn't mean -p will always represent the
data directory.

> > From the control file, it gets information like redo
> > pointer and current timeline id.
>
> Is there any reason for not using get_control_file() from
> src/common/controldata_utils.c, but defining the exact same function
> in pg_waldump.c?
>

Will give it a thought on it later. If possible, will try to reuse it.

> > The redo pointer is considered to be
> > the start pointer from where the pg_waldump starts reading wal data
> > until end-of-wal to find the last LSN. For details please check the
> > attached patch.
>
> Making it dependent on the controlfile limits the usability of this
> feature. Imagine, using this feature on an archive location or
> pg_receivewal target directory where there are WAL files but no
> controlfile. I think we can choose the appropriate combinations of
> existing pg_waldump options, for instance, let users enter the start
> WAL segment via startseg and/or start LSN via --start and the new
> option for end WAL segment and end LSN.
>

I have written this patch assuming that the end user is not aware of
any LSN or any other WAL data and wants to know the last LSN. So all
he can do is take the help of the control data to find the redo LSN
and use that as a reference point (start pointer) to find the last
LSN. And whatever is the WAL directory (be it archive location or wall
collected via pg_receivewal or pg_wal directory), we will consider the
redo pointer as the start pointer. Now, it's possible that the WAL
corresponding to the start pointer is not at all available in the WAL
directory like archive location or pg_receivewal directory in which
this cannot be used, but this is very unlikely.

> > Please note that for compressed and .partial wal files this doesn't work.
>
> Looking forward to the above capability because it expands the
> usability of this feature.
>

This is a different task altogether. We will probably need to work on
it separately.

--
With Regards,
Ashutosh Sharma.



Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions

From
Ashutosh Sharma
Date:
PFA v2 patch.

Changes in the v2 patch:

- Reuse the existing get_controlfile function in
src/common/controldata_utils.c instead of adding a new one.

- Set env variable PGDATA with the data directory specified by the user.

--
With Regards,
Ashutosh Sharma.



Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions

From
Ashutosh Sharma
Date:
On Fri, Sep 23, 2022 at 12:24 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> PFA v2 patch.
>
> Changes in the v2 patch:
>
> - Reuse the existing get_controlfile function in
> src/common/controldata_utils.c instead of adding a new one.
>
> - Set env variable PGDATA with the data directory specified by the user.
>

Forgot to attach the patch with above changes. Here it is.

--
With Regards,
Ashutosh Sharma.

Attachment