Re: Add a new pg_walinspect function to extract FPIs from WAL records - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Add a new pg_walinspect function to extract FPIs from WAL records
Date
Msg-id CALj2ACXesN9DTjgsekM8fig7CxhhxQfQP4fCiSJgcmp9wrZOvA@mail.gmail.com
Whole thread Raw
In response to Re: Add a new pg_walinspect function to extract FPIs from WAL records  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Add a new pg_walinspect function to extract FPIs from WAL records
List pgsql-hackers
On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > > I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that
both
> > > extracted images are the same and matches the one modified right after the checkpoint.
> > >
> > > What do you think? (could be done later in another patch though).
> >
> > I think pageinspect can be used here. We can fetch the raw page from
> > the table after the checkpoint and raw FPI from the WAL record logged
> > as part of the update. I've tried to do so [1], but I see a slight
> > difference in the raw output. The expectation is that they both be the
> > same. It might be that the update operation logs the FPI with some
> > more info set (prune_xid). I'll try to see why it is so.
> >
> > I'm attaching the v2 patch for further review.
> >
> > [1]
> > SELECT * FROM page_header(:'page_from_table');
> >     lsn    | checksum | flags | lower | upper | special | pagesize |
> > version | prune_xid
> > -----------+----------+-------+-------+-------+---------+----------+---------+-----------
> >  0/1891D78 |        0 |     0 |    40 |  8064 |    8192 |     8192 |
> >     4 |         0
> > (1 row)
> >
> > SELECT * FROM page_header(:'page_from_wal');
> >     lsn    | checksum | flags | lower | upper | special | pagesize |
> > version | prune_xid
> > -----------+----------+-------+-------+-------+---------+----------+---------+-----------
> >  0/1891D78 |        0 |     0 |    44 |  8032 |    8192 |     8192 |
> >     4 |       735
> > (1 row)
>
> Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
> further review. Sorry for the noise.

I took a stab at how and what gets logged as FPI in WAL records:

Option 1:
WAL record with FPI contains both the unmodified table page from the
disk after checkpoint and new tuple (not applied to the unmodified
page) and the recovery (redo) applies the new tuple to the unmodified
page as part of recovery. A bit more WAL is needed to store both
unmodified page and new tuple data in the WAL record and recovery can
get slower a bit too as it needs to stitch the modified page.

Option 2:
WAL record with FPI contains only the modified page (new tuple applied
to the unmodified page from the disk after checkpoint) and the
recovery (redo)  just returns the applied block as BLK_RESTORED.
Recovery can get faster with this approach and less WAL is needed to
store just the modified page.

My earlier understanding was that postgres does option (1), however, I
was wrong, option (2) is what actually postgres has implemented for
the obvious advantages specified.

I now made the tests a bit stricter in checking the FPI contents
(tuple values) pulled from the WAL record with raw page contents
pulled from the table using the pageinspect extension. Please see the
attached v4 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: ATTACH PARTITION seems to ignore column generation status
Next
From: Tom Lane
Date:
Subject: Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)