Thread: Strengthen pg_waldump's --save-fullpage tests

Strengthen pg_waldump's --save-fullpage tests

From
Bharath Rupireddy
Date:
Hi,

A recent commit [1] added --save-fullpage option to pg_waldump to
extract full page images (FPI) from WAL records and save them into
files (one file per FPI) under a specified directory. While it added
tests to check the LSN from the FPI file name and the FPI file
contents, it missed to further check the FPI contents like the tuples
on the page. I'm attaching a patch that basically reads the FPI file
(saved by pg_waldump) contents and raw page from the table file (using
pageinspect extension) and compares the tuples from both of them. This
test ensures that the pg_waldump outputs the correct FPI. This idea is
also discussed elsewhere [2].

Thoughts?

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8
[2] https://www.postgresql.org/message-id/CALj2ACXesN9DTjgsekM8fig7CxhhxQfQP4fCiSJgcmp9wrZOvA@mail.gmail.com

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Strengthen pg_waldump's --save-fullpage tests

From
Michael Paquier
Date:
On Mon, Jan 09, 2023 at 08:30:00AM +0530, Bharath Rupireddy wrote:
> A recent commit [1] added --save-fullpage option to pg_waldump to
> extract full page images (FPI) from WAL records and save them into
> files (one file per FPI) under a specified directory. While it added
> tests to check the LSN from the FPI file name and the FPI file
> contents, it missed to further check the FPI contents like the tuples
> on the page. I'm attaching a patch that basically reads the FPI file
> (saved by pg_waldump) contents and raw page from the table file (using
> pageinspect extension) and compares the tuples from both of them. This
> test ensures that the pg_waldump outputs the correct FPI. This idea is
> also discussed elsewhere [2].
>
> Thoughts?

I am not sure that it is necessary to expand this set of tests to have
dependencies on heap and pageinspect (if we do so, what of index AMs)
and spend more cycles on that, while we already have something in
place to cross-check ReadRecPtr with what's stored in the page header
written on top of the block size.
--
Michael

Attachment

Re: Strengthen pg_waldump's --save-fullpage tests

From
Bharath Rupireddy
Date:
On Tue, Jan 10, 2023 at 6:52 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Jan 09, 2023 at 08:30:00AM +0530, Bharath Rupireddy wrote:
> > A recent commit [1] added --save-fullpage option to pg_waldump to
> > extract full page images (FPI) from WAL records and save them into
> > files (one file per FPI) under a specified directory. While it added
> > tests to check the LSN from the FPI file name and the FPI file
> > contents, it missed to further check the FPI contents like the tuples
> > on the page. I'm attaching a patch that basically reads the FPI file
> > (saved by pg_waldump) contents and raw page from the table file (using
> > pageinspect extension) and compares the tuples from both of them. This
> > test ensures that the pg_waldump outputs the correct FPI. This idea is
> > also discussed elsewhere [2].
> >
> > Thoughts?
>
> I am not sure that it is necessary to expand this set of tests to have
> dependencies on heap and pageinspect (if we do so, what of index AMs)
> and spend more cycles on that, while we already have something in
> place to cross-check ReadRecPtr with what's stored in the page header
> written on top of the block size.

While checking for a page LSN is enough here, there's no harm in
verifying the whole FPI fetched from WAL record with that of the raw
page data. Also, this test illustrates how one can make use of the
fetched FPI - like reading the contents using pg_read_binary_file()
(of course on can also use COPY command to load the FPI data to
postgres) and using pageinspect functions to make sense of the raw
data.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Strengthen pg_waldump's --save-fullpage tests

From
"Drouvot, Bertrand"
Date:
Hi,

On 1/10/23 2:22 AM, Michael Paquier wrote:
> On Mon, Jan 09, 2023 at 08:30:00AM +0530, Bharath Rupireddy wrote:
>> A recent commit [1] added --save-fullpage option to pg_waldump to
>> extract full page images (FPI) from WAL records and save them into
>> files (one file per FPI) under a specified directory. While it added
>> tests to check the LSN from the FPI file name and the FPI file
>> contents, it missed to further check the FPI contents like the tuples
>> on the page. I'm attaching a patch that basically reads the FPI file
>> (saved by pg_waldump) contents and raw page from the table file (using
>> pageinspect extension) and compares the tuples from both of them. This
>> test ensures that the pg_waldump outputs the correct FPI. This idea is
>> also discussed elsewhere [2].
>>
>> Thoughts?
> 
> I am not sure that it is necessary to expand this set of tests to have
> dependencies on heap and pageinspect (if we do so, what of index AMs)
> and spend more cycles on that, while we already have something in
> place to cross-check ReadRecPtr with what's stored in the page header
> written on top of the block size.

I like the idea of comparing the full page (and not just the LSN) but
I'm not sure that adding the pageinspect dependency is a good thing.

What about extracting the block directly from the relation file and
comparing it with the one extracted from the WAL? (We'd need to skip the
first 8 bytes to skip the LSN though).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Strengthen pg_waldump's --save-fullpage tests

From
Michael Paquier
Date:
On Tue, Jan 10, 2023 at 05:25:44PM +0100, Drouvot, Bertrand wrote:
> I like the idea of comparing the full page (and not just the LSN) but
> I'm not sure that adding the pageinspect dependency is a good thing.
>
> What about extracting the block directly from the relation file and
> comparing it with the one extracted from the WAL? (We'd need to skip the
> first 8 bytes to skip the LSN though).

Byte-by-byte counting for the page hole?  The page checksum would
matter as well, FWIW, as it is not set in WAL and a FPW logged in WAL
means that the page got modified.  It means that it could have been
flushed, updating its pd_lsn and its pd_checksum on the way.
--
Michael

Attachment

Re: Strengthen pg_waldump's --save-fullpage tests

From
Bharath Rupireddy
Date:
On Wed, Jan 11, 2023 at 6:32 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jan 10, 2023 at 05:25:44PM +0100, Drouvot, Bertrand wrote:
> > I like the idea of comparing the full page (and not just the LSN) but
> > I'm not sure that adding the pageinspect dependency is a good thing.
> >
> > What about extracting the block directly from the relation file and
> > comparing it with the one extracted from the WAL? (We'd need to skip the
> > first 8 bytes to skip the LSN though).
>
> Byte-by-byte counting for the page hole?  The page checksum would
> matter as well, FWIW, as it is not set in WAL and a FPW logged in WAL
> means that the page got modified.  It means that it could have been
> flushed, updating its pd_lsn and its pd_checksum on the way.

Right. LSN of FPI from the WAL record and page from the table won't be
the same, essentially FPI LSN <= table page. Since the LSNs are
different, checksums too. This is the reason we have masking functions
common/bufmask.c and rm_mask functions defined for some of the
resource managers while verifying FPI consistency in
verifyBackupPageConsistency(). Note that pageinspect can give only
unmasked/raw page data, which means, byte-by-byte comparison isn't
possible with pageinspect too, hence I was comparing only the rows
with tuple_data_split().

Therefore, reading bytes from the table page and comparing
byte-by-byte with FPI requires us to invent new masking functions in
the tests - simply a no-go IMO.

As the concern here is to not establish pageinspect dependency with
pg_waldump, I'm fine to withdraw this patch and be happy with what we
have today.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Strengthen pg_waldump's --save-fullpage tests

From
"Drouvot, Bertrand"
Date:
Hi,

On 1/11/23 5:17 AM, Bharath Rupireddy wrote:
> On Wed, Jan 11, 2023 at 6:32 AM Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Tue, Jan 10, 2023 at 05:25:44PM +0100, Drouvot, Bertrand wrote:
>>> I like the idea of comparing the full page (and not just the LSN) but
>>> I'm not sure that adding the pageinspect dependency is a good thing.
>>>
>>> What about extracting the block directly from the relation file and
>>> comparing it with the one extracted from the WAL? (We'd need to skip the
>>> first 8 bytes to skip the LSN though).
>>
>> Byte-by-byte counting for the page hole?  

I've in mind to use diff on the whole page (minus the LSN).

>> The page checksum would
>> matter as well,

Right, but the TAP test is done without checksum and we could also
skip the checksum from the page if we really want to.

> Right. LSN of FPI from the WAL record and page from the table won't be
> the same, essentially FPI LSN <= table page. 

Right, that's why I proposed to exclude it for the comparison.

What about something like the attached?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment

Re: Strengthen pg_waldump's --save-fullpage tests

From
Bharath Rupireddy
Date:
On Wed, Jan 11, 2023 at 3:28 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On 1/11/23 5:17 AM, Bharath Rupireddy wrote:
> > On Wed, Jan 11, 2023 at 6:32 AM Michael Paquier <michael@paquier.xyz> wrote:
> >>
> >> On Tue, Jan 10, 2023 at 05:25:44PM +0100, Drouvot, Bertrand wrote:
> >>> I like the idea of comparing the full page (and not just the LSN) but
> >>> I'm not sure that adding the pageinspect dependency is a good thing.
> >>>
> >>> What about extracting the block directly from the relation file and
> >>> comparing it with the one extracted from the WAL? (We'd need to skip the
> >>> first 8 bytes to skip the LSN though).
> >>
> >> Byte-by-byte counting for the page hole?
>
> I've in mind to use diff on the whole page (minus the LSN).
>
> >> The page checksum would
> >> matter as well,
>
> Right, but the TAP test is done without checksum and we could also
> skip the checksum from the page if we really want to.
>
> > Right. LSN of FPI from the WAL record and page from the table won't be
> > the same, essentially FPI LSN <= table page.
>
> Right, that's why I proposed to exclude it for the comparison.
>
> What about something like the attached?

Note that the raw page on the table might differ not just in page LSN
but also in other fields, for instance see heap_mask for instance. It
masks lsn, checksum, hint bits, unused space etc. before verifying FPI
consistency during recovery in
verifyBackupPageConsistency().

I think the job of verifying FPI from WAL record with the page LSN is
better left to the core - via verifyBackupPageConsistency(). Honestly,
pg_waldump is good with what it has currently - LSN checks.

+# Extract the binary data without the LSN from the relation's block
+sysseek($frel, 8, 0); #bypass the LSN
+sysread($frel, $blk, 8184) or die "sysread failed: $!";
+syswrite($blkfrel, $blk) or die "syswrite failed: $!";

I suspect that these tests are portable with the hardcoded values such as above.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Strengthen pg_waldump's --save-fullpage tests

From
Michael Paquier
Date:
On Wed, Jan 11, 2023 at 07:17:47PM +0530, Bharath Rupireddy wrote:
> Note that the raw page on the table might differ not just in page LSN
> but also in other fields, for instance see heap_mask for instance. It
> masks lsn, checksum, hint bits, unused space etc. before verifying FPI
> consistency during recovery in
> verifyBackupPageConsistency().

FWIW, I don't really want to enter in this business here.  That feels
like a good addition of technical debt compared to the potential
gain.
--
Michael

Attachment

Re: Strengthen pg_waldump's --save-fullpage tests

From
Bharath Rupireddy
Date:
On Thu, Jan 12, 2023 at 10:14 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jan 11, 2023 at 07:17:47PM +0530, Bharath Rupireddy wrote:
> > Note that the raw page on the table might differ not just in page LSN
> > but also in other fields, for instance see heap_mask for instance. It
> > masks lsn, checksum, hint bits, unused space etc. before verifying FPI
> > consistency during recovery in
> > verifyBackupPageConsistency().
>
> FWIW, I don't really want to enter in this business here.  That feels
> like a good addition of technical debt compared to the potential
> gain.

I couldn't agree more.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Strengthen pg_waldump's --save-fullpage tests

From
"Drouvot, Bertrand"
Date:
Hi,

On 1/12/23 5:44 AM, Michael Paquier wrote:
> On Wed, Jan 11, 2023 at 07:17:47PM +0530, Bharath Rupireddy wrote:
>> Note that the raw page on the table might differ not just in page LSN
>> but also in other fields, for instance see heap_mask for instance. It
>> masks lsn, checksum, hint bits, unused space etc. before verifying FPI
>> consistency during recovery in
>> verifyBackupPageConsistency().
> 
> FWIW, I don't really want to enter in this business here.  That feels
> like a good addition of technical debt compared to the potential
> gain.

Agree, let's forget about it.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com