Re: [patch] Fix checksum verification in base backups for zero page headers - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [patch] Fix checksum verification in base backups for zero page headers
Date
Msg-id 20201020062432.GA30362@paquier.xyz
Whole thread Raw
In response to Re: [patch] Fix checksum verification in base backups for zero page headers  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses Re: [patch] Fix checksum verification in base backups for zero page headers  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
List pgsql-hackers
On Fri, Oct 16, 2020 at 06:02:52PM +0300, Anastasia Lubennikova wrote:
> In the current patch, PageIsVerifed called from pgbasebackup simply doesn't
> Should we change this too? I don't see any difference.

I considered that, but now that does not seem worth bothering here.

> Done.

Thanks for the updated patch.  I have played with dd and some fake
files to check the validity of the zero-case (as long as pd_lsn does
not go wild).  Here are some comments, with an updated patch attached:
- Added a PageIsVerifiedExtended to make this patch back-patchable,
with the original routine keeping its original shape.
- I did not like much to show the WARNING from PageIsVerified() for
the report, and we'd lose some context related to the base backups.
The important part is to know which blocks from which files are found
as problematic.
- Switched the checks to have PageIsVerified() placed first in the
hierarchy, so as we do all the basic validity checks before a look at
the LSN.  This is not really change things logically.
- The meaning of block_retry is also the opposite of what it should be
in the original code?  IMO, the flag should be set to true if we still
are fine to retry a block, and set it to false once the one-time retry
has failed.
- Moved the hardcoded failure report threshold of 5 into its own
variable.
- The error strings are not really consistent with the project style
in this area.  These are usually not spawned across multiple lines to
ease searches with grep or such.

Anastasia, Michael B, does that look OK to you?

NB: This patch fixes only one problem, the zero-page case, as it was
the intention of Michael B to split this part into its own thread.  We
still have, of course, a second problem when it comes to a random LSN
put into the page header which could mask an actual checksum failure
so this only takes care of half the issues.  Having a correct LSN
approximation is a tricky problem IMO, but we could improve things by
having some delta with an extra WAL page on top of GetInsertRecPtr().
And this function cannot be used for base backups taken from
standbys.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Track statistics for streaming of in-progress transactions
Next
From: Masahiko Sawada
Date:
Subject: Re: Transactions involving multiple postgres foreign servers, take 2