Thread: Add a new pg_walinspect function to extract FPIs from WAL records

Add a new pg_walinspect function to extract FPIs from WAL records

From
Bharath Rupireddy
Date:
Hi,

Here's a patch that implements the idea of extracting full page images
from WAL records [1] [2] with a function in pg_walinspect. This new
function accepts start and end lsn and returns full page image info
such as WAL record lsn, tablespace oid, database oid, relfile number,
block number, fork name and the raw full page (as bytea). I'll
register this in the next commitfest.

Thoughts?

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8
[2] https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Add a new pg_walinspect function to extract FPIs from WAL records

From
"Drouvot, Bertrand"
Date:
Hi,

On 12/27/22 12:48 PM, Bharath Rupireddy wrote:
> Hi,
> 
> Here's a patch that implements the idea of extracting full page images
> from WAL records [1] [2] with a function in pg_walinspect. This new
> function accepts start and end lsn and returns full page image info
> such as WAL record lsn, tablespace oid, database oid, relfile number,
> block number, fork name and the raw full page (as bytea). I'll
> register this in the next commitfest.
> 
> Thoughts?
> 

I think it makes sense to somehow align the pg_walinspect functions with the pg_waldump "features".
And since [1] added FPI "extraction" then +1 for the proposed patch in this thread.

> [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8
> [2] https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com

I just have a few comments:

+
+/*
+ * Get full page images and their info associated with a given WAL record.
+ */

What about adding a few words about compression? (like "Decompression is applied if necessary"?)


+               /* Full page exists, so let's output it. */
+               if (!RestoreBlockImage(record, block_id, page))

"Full page exists, so let's output its info and content." instead?


+     <para>
+      Gets raw full page images and their information associated with all the
+      valid WAL records between <replaceable>start_lsn</replaceable> and
+      <replaceable>end_lsn</replaceable>. Returns one row per full page image.

Worth to add a few words about decompression too?


I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both
extracted images are the same and matches the one modified right after the checkpoint.

What do you think? (could be done later in another patch though).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add a new pg_walinspect function to extract FPIs from WAL records

From
Bharath Rupireddy
Date:
On Wed, Jan 4, 2023 at 8:19 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
> I think it makes sense to somehow align the pg_walinspect functions with the pg_waldump "features".
> And since [1] added FPI "extraction" then +1 for the proposed patch in this thread.
>
> > [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8
> > [2] https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com
>
> I just have a few comments:

Thanks for reviewing.

> +
> +/*
> + * Get full page images and their info associated with a given WAL record.
> + */
>
>
> +     <para>
> +      Gets raw full page images and their information associated with all the
> +      valid WAL records between <replaceable>start_lsn</replaceable> and
> +      <replaceable>end_lsn</replaceable>. Returns one row per full page image.
>
> Worth to add a few words about decompression too?

Done.

> What about adding a few words about compression? (like "Decompression is applied if necessary"?)
>
>
> +               /* Full page exists, so let's output it. */
> +               if (!RestoreBlockImage(record, block_id, page))
>
> "Full page exists, so let's output its info and content." instead?

Done.

> I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both
> extracted images are the same and matches the one modified right after the checkpoint.
>
> What do you think? (could be done later in another patch though).

I think pageinspect can be used here. We can fetch the raw page from
the table after the checkpoint and raw FPI from the WAL record logged
as part of the update. I've tried to do so [1], but I see a slight
difference in the raw output. The expectation is that they both be the
same. It might be that the update operation logs the FPI with some
more info set (prune_xid). I'll try to see why it is so.

I'm attaching the v2 patch for further review.

[1]
SELECT * FROM page_header(:'page_from_table');
    lsn    | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
 0/1891D78 |        0 |     0 |    40 |  8064 |    8192 |     8192 |
    4 |         0
(1 row)

SELECT * FROM page_header(:'page_from_wal');
    lsn    | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
 0/1891D78 |        0 |     0 |    44 |  8032 |    8192 |     8192 |
    4 |       735
(1 row)

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Add a new pg_walinspect function to extract FPIs from WAL records

From
vignesh C
Date:
On Thu, 5 Jan 2023 at 18:52, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Jan 4, 2023 at 8:19 PM Drouvot, Bertrand
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > I think it makes sense to somehow align the pg_walinspect functions with the pg_waldump "features".
> > And since [1] added FPI "extraction" then +1 for the proposed patch in this thread.
> >
> > > [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8
> > > [2] https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com
> >
> > I just have a few comments:
>
> Thanks for reviewing.
>
> > +
> > +/*
> > + * Get full page images and their info associated with a given WAL record.
> > + */
> >
> >
> > +     <para>
> > +      Gets raw full page images and their information associated with all the
> > +      valid WAL records between <replaceable>start_lsn</replaceable> and
> > +      <replaceable>end_lsn</replaceable>. Returns one row per full page image.
> >
> > Worth to add a few words about decompression too?
>
> Done.
>
> > What about adding a few words about compression? (like "Decompression is applied if necessary"?)
> >
> >
> > +               /* Full page exists, so let's output it. */
> > +               if (!RestoreBlockImage(record, block_id, page))
> >
> > "Full page exists, so let's output its info and content." instead?
>
> Done.
>
> > I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that
both
> > extracted images are the same and matches the one modified right after the checkpoint.
> >
> > What do you think? (could be done later in another patch though).
>
> I think pageinspect can be used here. We can fetch the raw page from
> the table after the checkpoint and raw FPI from the WAL record logged
> as part of the update. I've tried to do so [1], but I see a slight
> difference in the raw output. The expectation is that they both be the
> same. It might be that the update operation logs the FPI with some
> more info set (prune_xid). I'll try to see why it is so.
>
> I'm attaching the v2 patch for further review.

I felt one of the files was missing in the patch:
[13:39:03.534] contrib/pg_walinspect/meson.build:19:0: ERROR: File
pg_walinspect--1.0--1.1.sql does not exist.

Please post an updated version for the same.

Regards,
Vignesh



Re: Add a new pg_walinspect function to extract FPIs from WAL records

From
Bharath Rupireddy
Date:
On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that
both
> > extracted images are the same and matches the one modified right after the checkpoint.
> >
> > What do you think? (could be done later in another patch though).
>
> I think pageinspect can be used here. We can fetch the raw page from
> the table after the checkpoint and raw FPI from the WAL record logged
> as part of the update. I've tried to do so [1], but I see a slight
> difference in the raw output. The expectation is that they both be the
> same. It might be that the update operation logs the FPI with some
> more info set (prune_xid). I'll try to see why it is so.
>
> I'm attaching the v2 patch for further review.
>
> [1]
> SELECT * FROM page_header(:'page_from_table');
>     lsn    | checksum | flags | lower | upper | special | pagesize |
> version | prune_xid
> -----------+----------+-------+-------+-------+---------+----------+---------+-----------
>  0/1891D78 |        0 |     0 |    40 |  8064 |    8192 |     8192 |
>     4 |         0
> (1 row)
>
> SELECT * FROM page_header(:'page_from_wal');
>     lsn    | checksum | flags | lower | upper | special | pagesize |
> version | prune_xid
> -----------+----------+-------+-------+-------+---------+----------+---------+-----------
>  0/1891D78 |        0 |     0 |    44 |  8032 |    8192 |     8192 |
>     4 |       735
> (1 row)

Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
further review. Sorry for the noise.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Add a new pg_walinspect function to extract FPIs from WAL records

From
Bharath Rupireddy
Date:
On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > > I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that
both
> > > extracted images are the same and matches the one modified right after the checkpoint.
> > >
> > > What do you think? (could be done later in another patch though).
> >
> > I think pageinspect can be used here. We can fetch the raw page from
> > the table after the checkpoint and raw FPI from the WAL record logged
> > as part of the update. I've tried to do so [1], but I see a slight
> > difference in the raw output. The expectation is that they both be the
> > same. It might be that the update operation logs the FPI with some
> > more info set (prune_xid). I'll try to see why it is so.
> >
> > I'm attaching the v2 patch for further review.
> >
> > [1]
> > SELECT * FROM page_header(:'page_from_table');
> >     lsn    | checksum | flags | lower | upper | special | pagesize |
> > version | prune_xid
> > -----------+----------+-------+-------+-------+---------+----------+---------+-----------
> >  0/1891D78 |        0 |     0 |    40 |  8064 |    8192 |     8192 |
> >     4 |         0
> > (1 row)
> >
> > SELECT * FROM page_header(:'page_from_wal');
> >     lsn    | checksum | flags | lower | upper | special | pagesize |
> > version | prune_xid
> > -----------+----------+-------+-------+-------+---------+----------+---------+-----------
> >  0/1891D78 |        0 |     0 |    44 |  8032 |    8192 |     8192 |
> >     4 |       735
> > (1 row)
>
> Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
> further review. Sorry for the noise.

I took a stab at how and what gets logged as FPI in WAL records:

Option 1:
WAL record with FPI contains both the unmodified table page from the
disk after checkpoint and new tuple (not applied to the unmodified
page) and the recovery (redo) applies the new tuple to the unmodified
page as part of recovery. A bit more WAL is needed to store both
unmodified page and new tuple data in the WAL record and recovery can
get slower a bit too as it needs to stitch the modified page.

Option 2:
WAL record with FPI contains only the modified page (new tuple applied
to the unmodified page from the disk after checkpoint) and the
recovery (redo)  just returns the applied block as BLK_RESTORED.
Recovery can get faster with this approach and less WAL is needed to
store just the modified page.

My earlier understanding was that postgres does option (1), however, I
was wrong, option (2) is what actually postgres has implemented for
the obvious advantages specified.

I now made the tests a bit stricter in checking the FPI contents
(tuple values) pulled from the WAL record with raw page contents
pulled from the table using the pageinspect extension. Please see the
attached v4 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Add a new pg_walinspect function to extract FPIs from WAL records

From
"Drouvot, Bertrand"
Date:
Hi,

On 1/6/23 6:41 PM, Bharath Rupireddy wrote:
> On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>>
>>>> I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that
both
>>>> extracted images are the same and matches the one modified right after the checkpoint.
>>>>
>>>> What do you think? (could be done later in another patch though).
>>>
>>> I think pageinspect can be used here. We can fetch the raw page from
>>> the table after the checkpoint and raw FPI from the WAL record logged
>>> as part of the update. I've tried to do so [1], but I see a slight
>>> difference in the raw output. The expectation is that they both be the
>>> same. It might be that the update operation logs the FPI with some
>>> more info set (prune_xid). I'll try to see why it is so.
>>>
>>> I'm attaching the v2 patch for further review.
>>>
>>> [1]
>>> SELECT * FROM page_header(:'page_from_table');
>>>      lsn    | checksum | flags | lower | upper | special | pagesize |
>>> version | prune_xid
>>> -----------+----------+-------+-------+-------+---------+----------+---------+-----------
>>>   0/1891D78 |        0 |     0 |    40 |  8064 |    8192 |     8192 |
>>>      4 |         0
>>> (1 row)
>>>
>>> SELECT * FROM page_header(:'page_from_wal');
>>>      lsn    | checksum | flags | lower | upper | special | pagesize |
>>> version | prune_xid
>>> -----------+----------+-------+-------+-------+---------+----------+---------+-----------
>>>   0/1891D78 |        0 |     0 |    44 |  8032 |    8192 |     8192 |
>>>      4 |       735
>>> (1 row)
>>
>> Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
>> further review. Sorry for the noise.
> 
> I took a stab at how and what gets logged as FPI in WAL records:
> 
> Option 1:
> WAL record with FPI contains both the unmodified table page from the
> disk after checkpoint and new tuple (not applied to the unmodified
> page) and the recovery (redo) applies the new tuple to the unmodified
> page as part of recovery. A bit more WAL is needed to store both
> unmodified page and new tuple data in the WAL record and recovery can
> get slower a bit too as it needs to stitch the modified page.
> 
> Option 2:
> WAL record with FPI contains only the modified page (new tuple applied
> to the unmodified page from the disk after checkpoint) and the
> recovery (redo)  just returns the applied block as BLK_RESTORED.
> Recovery can get faster with this approach and less WAL is needed to
> store just the modified page.
> 
> My earlier understanding was that postgres does option (1), however, I
> was wrong, option (2) is what actually postgres has implemented for
> the obvious advantages specified.
> 
> I now made the tests a bit stricter in checking the FPI contents
> (tuple values) pulled from the WAL record with raw page contents
> pulled from the table using the pageinspect extension. Please see the
> attached v4 patch.
> 

Thanks for updating the patch!

+-- Compare FPI from WAL record and page from table, they must be same

I think "must be the same" or "must be identical" sounds better (but not 100% sure).

Except this nit, V4 looks good to me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add a new pg_walinspect function to extract FPIs from WAL records

From
Michael Paquier
Date:
On Tue, Jan 10, 2023 at 09:29:03AM +0100, Drouvot, Bertrand wrote:
> Thanks for updating the patch!
>
> +-- Compare FPI from WAL record and page from table, they must be same
>
> I think "must be the same" or "must be identical" sounds better (but not 100% sure).
>
> Except this nit, V4 looks good to me.

+postgres=# SELECT lsn, tablespace_oid, database_oid, relfile_number,
block_number, fork_name, length(fpi) > 0 as fpi_ok FROM
pg_get_wal_fpi_info('0/7418E60', '0/7518218');

This query in the docs is too long IMO.  Could you split that across
multiple lines for readability?

+      pg_get_wal_fpi_info(start_lsn pg_lsn,
+                          end_lsn pg_lsn,
+                          lsn OUT pg_lsn,
+                          tablespace_oid OUT oid,
+                          database_oid OUT oid,
+                          relfile_number OUT oid,
+                          block_number OUT int8,
+                          fork_name OUT text,
+                          fpi OUT bytea)
I am a bit surprised by this format, used to define the functions part
of the module in the docs, while we have examples that actually show
what's printed out.  I understand that this comes from the original
commit of the module, but the rendered docs are really hard to parse
as well, no?  FWIW, I think that this had better be fixed as well in
the docs of v15..  Showing a full set of attributes for the returned
record is fine by me, still if these are too long we could just use
\x.  For this one, I think that there is little point in showing 14
records, so I would stick with a style similar to pageinspect.

+CREATE FUNCTION pg_get_wal_fpi_info(IN start_lsn pg_lsn,
+    IN end_lsn pg_lsn,
+    OUT lsn pg_lsn,
+   OUT tablespace_oid oid,
Slight indentation issue here.

Using "relfile_number" would be a first, for what is defined in the
code and the docs as a filenode.

+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+-- Get FPI from WAL record
+SELECT fpi AS page_from_wal FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+  WHERE relfile_number = :'sample_tbl_oid' \gset
I would be tempted to keep the checks run here minimal with only a
basic set of checks on the LSN, without the dependencies to
pageinspect (tuple_data_split and get_raw_page), which would be fine
enough to check the execution of the function.

FWIW, I am surprised by the design choice behind ValidateInputLSNs()
to allow data to be gathered until the end of WAL in some cases, but
to not allow it in others.  It is likely too late to come back to this
choice for the existing functions in v15 (quoique?), but couldn't it
be useful to make this new FPI function work at least with an insanely
high LSN value to make sure that we fetch all the FPIs from a given
start position, up to the end of WAL?  That looks like a pretty good
default behavior to me, rather than issuing an error when a LSN is
defined as in the future..  I am really wondering why we have
ValidateInputLSNs(till_end_of_wal=false) to begin with, while we could
just allow any LSN value in the future automatically, as we can know
the current insert or replay LSNs (depending on the recovery state).
--
Michael

Attachment

Re: Add a new pg_walinspect function to extract FPIs from WAL records

From
Bharath Rupireddy
Date:
On Wed, Jan 11, 2023 at 10:07 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> +postgres=# SELECT lsn, tablespace_oid, database_oid, relfile_number,
> block_number, fork_name, length(fpi) > 0 as fpi_ok FROM
> pg_get_wal_fpi_info('0/7418E60', '0/7518218');
>
> This query in the docs is too long IMO.  Could you split that across
> multiple lines for readability?

Done.

> +      pg_get_wal_fpi_info(start_lsn pg_lsn,
> +                          end_lsn pg_lsn,
> +                          lsn OUT pg_lsn,
> +                          tablespace_oid OUT oid,
> +                          database_oid OUT oid,
> +                          relfile_number OUT oid,
> +                          block_number OUT int8,
> +                          fork_name OUT text,
> +                          fpi OUT bytea)
> I am a bit surprised by this format, used to define the functions part
> of the module in the docs, while we have examples that actually show
> what's printed out.  I understand that this comes from the original
> commit of the module, but the rendered docs are really hard to parse
> as well, no?  FWIW, I think that this had better be fixed as well in
> the docs of v15..  Showing a full set of attributes for the returned
> record is fine by me, still if these are too long we could just use
> \x.

Thanks. I'll work on that separately.

> For this one, I think that there is little point in showing 14
> records, so I would stick with a style similar to pageinspect.

I've done it that way for pg_get_wal_fpi_info. If this format looks
okay, I can propose to do the same for other functions (for
backpatching too) in a separate thread though.

> +CREATE FUNCTION pg_get_wal_fpi_info(IN start_lsn pg_lsn,
> +    IN end_lsn pg_lsn,
> +    OUT lsn pg_lsn,
> +   OUT tablespace_oid oid,
> Slight indentation issue here.

Done.

> Using "relfile_number" would be a first, for what is defined in the
> code and the docs as a filenode.

Yes, I've changed the column names to be consistent (like in pg_buffercache).

> +SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
> +-- Get FPI from WAL record
> +SELECT fpi AS page_from_wal FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
> +  WHERE relfile_number = :'sample_tbl_oid' \gset
> I would be tempted to keep the checks run here minimal with only a
> basic set of checks on the LSN, without the dependencies to
> pageinspect (tuple_data_split and get_raw_page), which would be fine
> enough to check the execution of the function.

I understand the concern here that creating dependency between
extensions just for testing isn't good.

I'm okay to just read the LSN (lsn1) from raw FPI (bytea stream) and
the WAL record's LSN (lsn2) and compare them to be lsn2 > lsn1. I'm
looking for a way to convert the first 8 bytes from bytea stream to
pg_lsn type, on a quick look I couldn't find direct conversion
functions, however, I'll try to figure out a way.

> FWIW, I am surprised by the design choice behind ValidateInputLSNs()
> to allow data to be gathered until the end of WAL in some cases, but
> to not allow it in others.  It is likely too late to come back to this
> choice for the existing functions in v15 (quoique?), but couldn't it

Separate functions for users passing end_lsn by themselves and users
letting functions decide the end_lsn (current flush LSN or replay LSN)
were chosen for better and easier usability and easier validation of
user-entered input lsns.

We deliberated to have something like below:
pg_get_wal_stats(start_lsn, end_lsn, till_end_of_wal default false);
pg_get_wal_records_info(start_lsn, end_lsn, till_end_of_wal default false);

We wanted to have better validation of the start_lsn and end_lsn, that
is, start_lsn < end_lsn and end_lsn mustn't fall into the future when
users specify it by themselves (otherwise, one can easily trick the
server by passing in the extreme end of the LSN - 0xFFFFFFFFFFFFFFFF).
And, we couldn't find a better way to deal with when till_end_of_wal
is passed as true (in the above version of the functions).

Another idea was to have something like below:
pg_get_wal_stats(start_lsn, end_lsn default '0/0');
pg_get_wal_records_info(start_lsn, end_lsn default '0/0');

When end_lsn is not entered or entered as invalid lsn, then return the
stats/info till end of the WAL. Again, we wanted to have some
validation of the user-entered end_lsn.

Instead of cooking multiple behaviours into a single function we opted
for till_end_of_wal versions. I still feel this is better unless
there's a strong reason against till_end_of_wal versions.

> be useful to make this new FPI function work at least with an insanely
> high LSN value to make sure that we fetch all the FPIs from a given
> start position, up to the end of WAL?  That looks like a pretty good
> default behavior to me, rather than issuing an error when a LSN is
> defined as in the future..  I am really wondering why we have
> ValidateInputLSNs(till_end_of_wal=false) to begin with, while we could
> just allow any LSN value in the future automatically, as we can know
> the current insert or replay LSNs (depending on the recovery state).

Hm. How about having pg_get_wal_fpi_info_till_end_of_wal() then?

With some of the review comments addressed, I'm attaching v5 patch
herewith. I would like to hear thoughts on the above open points
before writing the v6 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Add a new pg_walinspect function to extract FPIs from WAL records

From
Michael Paquier
Date:
On Wed, Jan 11, 2023 at 06:59:18PM +0530, Bharath Rupireddy wrote:
> I've done it that way for pg_get_wal_fpi_info. If this format looks
> okay, I can propose to do the same for other functions (for
> backpatching too) in a separate thread though.

My vote would be to make that happen first, to have in place cleaner
basics for the docs.  I could just do it and move on..

> We deliberated to have something like below:
> pg_get_wal_stats(start_lsn, end_lsn, till_end_of_wal default false);
> pg_get_wal_records_info(start_lsn, end_lsn, till_end_of_wal default false);
>
> We wanted to have better validation of the start_lsn and end_lsn, that
> is, start_lsn < end_lsn and end_lsn mustn't fall into the future when
> users specify it by themselves (otherwise, one can easily trick the
> server by passing in the extreme end of the LSN - 0xFFFFFFFFFFFFFFFF).
> And, we couldn't find a better way to deal with when till_end_of_wal
> is passed as true (in the above version of the functions).
>
> Another idea was to have something like below:
> pg_get_wal_stats(start_lsn, end_lsn default '0/0');
> pg_get_wal_records_info(start_lsn, end_lsn default '0/0');
>
> When end_lsn is not entered or entered as invalid lsn, then return the
> stats/info till end of the WAL. Again, we wanted to have some
> validation of the user-entered end_lsn.

This reminds me of the slot advancing, where we discarded this case
just because it is useful to enforce a LSN far in the future.
Honestly, I cannot think of any case where I would use this level of
validation, especially having *two* functions to decide one behavior
or the other: this stuff could just use one function and use for
example NULL as a setup to enforce the end of WAL, on top of a LSN far
ahead..  But well..

>> be useful to make this new FPI function work at least with an insanely
>> high LSN value to make sure that we fetch all the FPIs from a given
>> start position, up to the end of WAL?  That looks like a pretty good
>> default behavior to me, rather than issuing an error when a LSN is
>> defined as in the future..  I am really wondering why we have
>> ValidateInputLSNs(till_end_of_wal=false) to begin with, while we could
>> just allow any LSN value in the future automatically, as we can know
>> the current insert or replay LSNs (depending on the recovery state).
>
> Hm. How about having pg_get_wal_fpi_info_till_end_of_wal() then?

I don't really want to make the interface more bloated with more
functions than necessary, TBH, though I get your point to push for
consistency in these functions.  This makes me really wonder whether
we'd better make relax all the existing functions, but I may get
outvoted.
--
Michael

Attachment

Re: Add a new pg_walinspect function to extract FPIs from WAL records

From
Bharath Rupireddy
Date:
On Thu, Jan 12, 2023 at 11:23 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jan 11, 2023 at 06:59:18PM +0530, Bharath Rupireddy wrote:
> > I've done it that way for pg_get_wal_fpi_info. If this format looks
> > okay, I can propose to do the same for other functions (for
> > backpatching too) in a separate thread though.
>
> My vote would be to make that happen first, to have in place cleaner
> basics for the docs.  I could just do it and move on..

Sure. Posted a separate patch here -
https://www.postgresql.org/message-id/CALj2ACVGcUpziGgQrcT-1G3dHWQQfWjYBu1YQ2ypv9y86dgogg%40mail.gmail.com

> This reminds me of the slot advancing, where we discarded this case
> just because it is useful to enforce a LSN far in the future.
> Honestly, I cannot think of any case where I would use this level of
> validation, especially having *two* functions to decide one behavior
> or the other: this stuff could just use one function and use for
> example NULL as a setup to enforce the end of WAL, on top of a LSN far
> ahead..  But well..

I understand. I don't mind discussing something like [1] with the
following behaviour and discarding till_end_of_wal functions
altogether:
If start_lsn is NULL, error out/return NULL.
If end_lsn isn't specified, default to NULL, then determine the end_lsn.
If end_lsn is specified as NULL, then determine the end_lsn.
If end_lsn is specified as non-NULL, then determine if it is greater
than start_lsn if yes, go ahead do the job, otherwise error out.

I'll think a bit more on this and perhaps discuss it separately.

> > Hm. How about having pg_get_wal_fpi_info_till_end_of_wal() then?
>
> I don't really want to make the interface more bloated with more
> functions than necessary, TBH, though I get your point to push for
> consistency in these functions.  This makes me really wonder whether
> we'd better make relax all the existing functions, but I may get
> outvoted.

I'll keep the FPI extract function simple as proposed in the patch and
I'll not go write till_end_of_wal version. If needed to get all the
FPIs till the end of WAL, one can always determine the end_lsn with
pg_current_wal_flush_lsn()/pg_last_wal_replay_lsn() when in recovery,
and use the proposed function.

Note that I kept the FPI extract function test simple - ensure FPI
gets generated and check if the new function can fetch it, discarding
lsn or other FPI sanity checks. Whole FPI sanity check needs
pageinspect and we don't want to create the dependency just for tests.
And checking for FPI lsn with WAL record lsn requires us to fetch lsn
from raw bytea stream. As there's no straightforward way to convert
raw bytes from bytea to pg_lsn, doing that in a platform-agnostic
manner (little-endian and big-endian comes into play here) actually is
a no-go IMO. FWIW, for a little-endian system I had to do [2].

Therefore I stick to the simple test unless there's a better way.

I'm attaching the v6 patch for further review.

[1]
CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
    IN end_lsn pg_lsn DEFAULT NULL,
    OUT start_lsn pg_lsn,
    OUT end_lsn pg_lsn,
    OUT prev_lsn pg_lsn,
    OUT xid xid,
    OUT resource_manager text,
    OUT record_type text,
    OUT record_length int4,
    OUT main_data_length int4,
    OUT fpi_length int4,
    OUT description text,
    OUT block_ref text
)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'pg_get_wal_records_info'
LANGUAGE C CALLED ON NULL INPUT IMMUTABLE PARALLEL SAFE STRICT PARALLEL SAFE;

CREATE FUNCTION pg_get_wal_stats(IN start_lsn pg_lsn,
    IN end_lsn pg_lsn DEFAULT NULL,
    IN  per_record boolean DEFAULT false,
    OUT "resource_manager/record_type" text,
    OUT count int8,
    OUT count_percentage float8,
    OUT record_size int8,
    OUT record_size_percentage float8,
    OUT fpi_size int8,
    OUT fpi_size_percentage float8,
    OUT combined_size int8,
    OUT combined_size_percentage float8
)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'pg_get_wal_stats'
LANGUAGE C CALLED ON NULL INPUT IMMUTABLE PARALLEL SAFE STRICT PARALLEL SAFE;

[2]

select '\x00000000b8078901000000002c00601f00200420df020000e09f3800c09f3800a09f380080'::bytea
AS fpi \gset
select substring(:'fpi' from 3 for 16) AS rlsn \gset
select substring(:'rlsn' from 7 for 2) || substring(:'rlsn' from 5 for
2) || substring(:'rlsn' from 3 for 2) || substring(:'rlsn' from 1 for
2) AS hi \gset
select substring(:'rlsn' from 15 for 2) || substring(:'rlsn' from 13
for 2) || substring(:'rlsn' from 11 for 2) || substring(:'rlsn' from 9
for 2) AS lo \gset
select (:'hi' || '/' || :'lo')::pg_lsn;

postgres=# select (:'hi' || '/' || :'lo')::pg_lsn;
  pg_lsn
-----------
 0/18907B8
(1 row)

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Add a new pg_walinspect function to extract FPIs from WAL records

From
Michael Paquier
Date:
On Thu, Jan 12, 2023 at 05:37:40PM +0530, Bharath Rupireddy wrote:
> I understand. I don't mind discussing something like [1] with the
> following behaviour and discarding till_end_of_wal functions
> altogether:
> If start_lsn is NULL, error out/return NULL.
> If end_lsn isn't specified, default to NULL, then determine the end_lsn.
> If end_lsn is specified as NULL, then determine the end_lsn.
> If end_lsn is specified as non-NULL, then determine if it is greater
> than start_lsn if yes, go ahead do the job, otherwise error out.
>
> I'll think a bit more on this and perhaps discuss it separately.

FWIW, I still find the current interface of the module bloated.  So,
while it is possible to stick some pg_current_wal_lsn() calls to
bypass the error in most cases, enforcing the end of WAL with a NULL
or larger value would still be something I would push for based on my
own experience as there would be no need to worry about the latest LSN
as being two different values in two function contexts.  You could
keep the functions as STRICT for consistency, and just allow larger
values as a synonym for the end of WAL.

Saying that, the version of pg_get_wal_fpi_info() committed respects
the current behavior of the module, with an error on an incorrect end
LSN.

> I'll keep the FPI extract function simple as proposed in the patch and
> I'll not go write till_end_of_wal version. If needed to get all the
> FPIs till the end of WAL, one can always determine the end_lsn with
> pg_current_wal_flush_lsn()/pg_last_wal_replay_lsn() when in recovery,
> and use the proposed function.

I was reading the patch this morning, and that's pretty much what I
would have done in terms of simplicity with a test checking that at
least one FPI has been generated.  I have shortened a bit the
documentation, tweaked a few comments and applied the whole after
seeing the result.

One thing that I have been wondering about is whether it is worth
adding the block_id from the record in the output, but discarded this
idea as it could be confused with the physical block number, even if
this function is for advanced users.
--
Michael

Attachment