Thread: Strengthen pg_waldump's --save-fullpage tests
Hi, A recent commit [1] added --save-fullpage option to pg_waldump to extract full page images (FPI) from WAL records and save them into files (one file per FPI) under a specified directory. While it added tests to check the LSN from the FPI file name and the FPI file contents, it missed to further check the FPI contents like the tuples on the page. I'm attaching a patch that basically reads the FPI file (saved by pg_waldump) contents and raw page from the table file (using pageinspect extension) and compares the tuples from both of them. This test ensures that the pg_waldump outputs the correct FPI. This idea is also discussed elsewhere [2]. Thoughts? [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8 [2] https://www.postgresql.org/message-id/CALj2ACXesN9DTjgsekM8fig7CxhhxQfQP4fCiSJgcmp9wrZOvA@mail.gmail.com -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Jan 09, 2023 at 08:30:00AM +0530, Bharath Rupireddy wrote: > A recent commit [1] added --save-fullpage option to pg_waldump to > extract full page images (FPI) from WAL records and save them into > files (one file per FPI) under a specified directory. While it added > tests to check the LSN from the FPI file name and the FPI file > contents, it missed to further check the FPI contents like the tuples > on the page. I'm attaching a patch that basically reads the FPI file > (saved by pg_waldump) contents and raw page from the table file (using > pageinspect extension) and compares the tuples from both of them. This > test ensures that the pg_waldump outputs the correct FPI. This idea is > also discussed elsewhere [2]. > > Thoughts? I am not sure that it is necessary to expand this set of tests to have dependencies on heap and pageinspect (if we do so, what of index AMs) and spend more cycles on that, while we already have something in place to cross-check ReadRecPtr with what's stored in the page header written on top of the block size. -- Michael
Attachment
On Tue, Jan 10, 2023 at 6:52 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Jan 09, 2023 at 08:30:00AM +0530, Bharath Rupireddy wrote: > > A recent commit [1] added --save-fullpage option to pg_waldump to > > extract full page images (FPI) from WAL records and save them into > > files (one file per FPI) under a specified directory. While it added > > tests to check the LSN from the FPI file name and the FPI file > > contents, it missed to further check the FPI contents like the tuples > > on the page. I'm attaching a patch that basically reads the FPI file > > (saved by pg_waldump) contents and raw page from the table file (using > > pageinspect extension) and compares the tuples from both of them. This > > test ensures that the pg_waldump outputs the correct FPI. This idea is > > also discussed elsewhere [2]. > > > > Thoughts? > > I am not sure that it is necessary to expand this set of tests to have > dependencies on heap and pageinspect (if we do so, what of index AMs) > and spend more cycles on that, while we already have something in > place to cross-check ReadRecPtr with what's stored in the page header > written on top of the block size. While checking for a page LSN is enough here, there's no harm in verifying the whole FPI fetched from WAL record with that of the raw page data. Also, this test illustrates how one can make use of the fetched FPI - like reading the contents using pg_read_binary_file() (of course on can also use COPY command to load the FPI data to postgres) and using pageinspect functions to make sense of the raw data. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 1/10/23 2:22 AM, Michael Paquier wrote: > On Mon, Jan 09, 2023 at 08:30:00AM +0530, Bharath Rupireddy wrote: >> A recent commit [1] added --save-fullpage option to pg_waldump to >> extract full page images (FPI) from WAL records and save them into >> files (one file per FPI) under a specified directory. While it added >> tests to check the LSN from the FPI file name and the FPI file >> contents, it missed to further check the FPI contents like the tuples >> on the page. I'm attaching a patch that basically reads the FPI file >> (saved by pg_waldump) contents and raw page from the table file (using >> pageinspect extension) and compares the tuples from both of them. This >> test ensures that the pg_waldump outputs the correct FPI. This idea is >> also discussed elsewhere [2]. >> >> Thoughts? > > I am not sure that it is necessary to expand this set of tests to have > dependencies on heap and pageinspect (if we do so, what of index AMs) > and spend more cycles on that, while we already have something in > place to cross-check ReadRecPtr with what's stored in the page header > written on top of the block size. I like the idea of comparing the full page (and not just the LSN) but I'm not sure that adding the pageinspect dependency is a good thing. What about extracting the block directly from the relation file and comparing it with the one extracted from the WAL? (We'd need to skip the first 8 bytes to skip the LSN though). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Jan 10, 2023 at 05:25:44PM +0100, Drouvot, Bertrand wrote: > I like the idea of comparing the full page (and not just the LSN) but > I'm not sure that adding the pageinspect dependency is a good thing. > > What about extracting the block directly from the relation file and > comparing it with the one extracted from the WAL? (We'd need to skip the > first 8 bytes to skip the LSN though). Byte-by-byte counting for the page hole? The page checksum would matter as well, FWIW, as it is not set in WAL and a FPW logged in WAL means that the page got modified. It means that it could have been flushed, updating its pd_lsn and its pd_checksum on the way. -- Michael
Attachment
On Wed, Jan 11, 2023 at 6:32 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Jan 10, 2023 at 05:25:44PM +0100, Drouvot, Bertrand wrote: > > I like the idea of comparing the full page (and not just the LSN) but > > I'm not sure that adding the pageinspect dependency is a good thing. > > > > What about extracting the block directly from the relation file and > > comparing it with the one extracted from the WAL? (We'd need to skip the > > first 8 bytes to skip the LSN though). > > Byte-by-byte counting for the page hole? The page checksum would > matter as well, FWIW, as it is not set in WAL and a FPW logged in WAL > means that the page got modified. It means that it could have been > flushed, updating its pd_lsn and its pd_checksum on the way. Right. LSN of FPI from the WAL record and page from the table won't be the same, essentially FPI LSN <= table page. Since the LSNs are different, checksums too. This is the reason we have masking functions common/bufmask.c and rm_mask functions defined for some of the resource managers while verifying FPI consistency in verifyBackupPageConsistency(). Note that pageinspect can give only unmasked/raw page data, which means, byte-by-byte comparison isn't possible with pageinspect too, hence I was comparing only the rows with tuple_data_split(). Therefore, reading bytes from the table page and comparing byte-by-byte with FPI requires us to invent new masking functions in the tests - simply a no-go IMO. As the concern here is to not establish pageinspect dependency with pg_waldump, I'm fine to withdraw this patch and be happy with what we have today. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 1/11/23 5:17 AM, Bharath Rupireddy wrote: > On Wed, Jan 11, 2023 at 6:32 AM Michael Paquier <michael@paquier.xyz> wrote: >> >> On Tue, Jan 10, 2023 at 05:25:44PM +0100, Drouvot, Bertrand wrote: >>> I like the idea of comparing the full page (and not just the LSN) but >>> I'm not sure that adding the pageinspect dependency is a good thing. >>> >>> What about extracting the block directly from the relation file and >>> comparing it with the one extracted from the WAL? (We'd need to skip the >>> first 8 bytes to skip the LSN though). >> >> Byte-by-byte counting for the page hole? I've in mind to use diff on the whole page (minus the LSN). >> The page checksum would >> matter as well, Right, but the TAP test is done without checksum and we could also skip the checksum from the page if we really want to. > Right. LSN of FPI from the WAL record and page from the table won't be > the same, essentially FPI LSN <= table page. Right, that's why I proposed to exclude it for the comparison. What about something like the attached? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Wed, Jan 11, 2023 at 3:28 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 1/11/23 5:17 AM, Bharath Rupireddy wrote: > > On Wed, Jan 11, 2023 at 6:32 AM Michael Paquier <michael@paquier.xyz> wrote: > >> > >> On Tue, Jan 10, 2023 at 05:25:44PM +0100, Drouvot, Bertrand wrote: > >>> I like the idea of comparing the full page (and not just the LSN) but > >>> I'm not sure that adding the pageinspect dependency is a good thing. > >>> > >>> What about extracting the block directly from the relation file and > >>> comparing it with the one extracted from the WAL? (We'd need to skip the > >>> first 8 bytes to skip the LSN though). > >> > >> Byte-by-byte counting for the page hole? > > I've in mind to use diff on the whole page (minus the LSN). > > >> The page checksum would > >> matter as well, > > Right, but the TAP test is done without checksum and we could also > skip the checksum from the page if we really want to. > > > Right. LSN of FPI from the WAL record and page from the table won't be > > the same, essentially FPI LSN <= table page. > > Right, that's why I proposed to exclude it for the comparison. > > What about something like the attached? Note that the raw page on the table might differ not just in page LSN but also in other fields, for instance see heap_mask for instance. It masks lsn, checksum, hint bits, unused space etc. before verifying FPI consistency during recovery in verifyBackupPageConsistency(). I think the job of verifying FPI from WAL record with the page LSN is better left to the core - via verifyBackupPageConsistency(). Honestly, pg_waldump is good with what it has currently - LSN checks. +# Extract the binary data without the LSN from the relation's block +sysseek($frel, 8, 0); #bypass the LSN +sysread($frel, $blk, 8184) or die "sysread failed: $!"; +syswrite($blkfrel, $blk) or die "syswrite failed: $!"; I suspect that these tests are portable with the hardcoded values such as above. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Jan 11, 2023 at 07:17:47PM +0530, Bharath Rupireddy wrote: > Note that the raw page on the table might differ not just in page LSN > but also in other fields, for instance see heap_mask for instance. It > masks lsn, checksum, hint bits, unused space etc. before verifying FPI > consistency during recovery in > verifyBackupPageConsistency(). FWIW, I don't really want to enter in this business here. That feels like a good addition of technical debt compared to the potential gain. -- Michael
Attachment
On Thu, Jan 12, 2023 at 10:14 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jan 11, 2023 at 07:17:47PM +0530, Bharath Rupireddy wrote: > > Note that the raw page on the table might differ not just in page LSN > > but also in other fields, for instance see heap_mask for instance. It > > masks lsn, checksum, hint bits, unused space etc. before verifying FPI > > consistency during recovery in > > verifyBackupPageConsistency(). > > FWIW, I don't really want to enter in this business here. That feels > like a good addition of technical debt compared to the potential > gain. I couldn't agree more. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 1/12/23 5:44 AM, Michael Paquier wrote: > On Wed, Jan 11, 2023 at 07:17:47PM +0530, Bharath Rupireddy wrote: >> Note that the raw page on the table might differ not just in page LSN >> but also in other fields, for instance see heap_mask for instance. It >> masks lsn, checksum, hint bits, unused space etc. before verifying FPI >> consistency during recovery in >> verifyBackupPageConsistency(). > > FWIW, I don't really want to enter in this business here. That feels > like a good addition of technical debt compared to the potential > gain. Agree, let's forget about it. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com