Re: [PATCH] Verify Checksums during Basebackups - Mailing list pgsql-hackers

From David Steele
Subject Re: [PATCH] Verify Checksums during Basebackups
Msg-id 66ee3722-54bb-6c63-46b5-7acb326b876d@pgmasters.net
In response to Re: [PATCH] Verify Checksums during Basebackups  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
Hi Michael,

On 3/5/18 6:36 AM, Stephen Frost wrote:
> * Michael Banck (michael.banck@credativ.de) wrote:
>
>> So I guess this would have to be sent back via the replication protocol,
>> but I don't see an off-hand way to do this easily?
>
> The final ordinary result set could be extended to include the
> information about checksum failures..?  I'm a bit concerned about what
> to do when there are a lot of checksum failures though..  Ideally, you'd
> identify all of the pages in all of the files where a checksum failed
> (just throwing an error such as the one proposed above is really rather
> terrible since you have no idea what block, or even what table, failed
> the checksum...).

I agree that knowing the name of the file that failed validation is
really important.  A list of the pages that failed validation would be
nice to have as well, though I would be fine with adding that in a
future version.

For instance, in pgBackRest we output validation failures this way:

[from a regression test]
WARN: invalid page checksums found in file
[TEST_PATH]/db-primary/db/base/base/32768/33001 at pages 0, 3-5, 7

Note that we collate ranges of errors to keep the output from being too
overwhelming.
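
To illustrate the collation, here is a minimal sketch in Python.  It is
not the pgBackRest code; the function name collate_pages is made up for
this example, and printing a run of two pages as "1-2" is an assumption.

```python
def collate_pages(pages):
    """Collapse page numbers into a string such as '0, 3-5, 7'.

    Hypothetical helper for illustration only, not pgBackRest code.
    """
    ranges = []  # list of [first, last] runs of consecutive pages
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1][1] + 1:
            # Page directly follows the current run, so extend the run.
            ranges[-1][1] = page
        else:
            # Gap found (or first page), so start a new run.
            ranges.append([page, page])
    return ", ".join(
        str(first) if first == last else f"{first}-{last}"
        for first, last in ranges
    )


print(collate_pages([0, 3, 4, 5, 7]))
```

With the pages from the warning above as input, this prints
"0, 3-5, 7".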

I think the file names are very important because there's a rather large
chance that corruption may happen in an index, unlogged table, or
something else that can be rebuilt or reloaded.  Knowing where the
corruption is can save a lot of headaches.

> Reviewing the original patch and considering this issue, I believe there
> may be a larger problem- while very unlikely, there's been concern that
> it's possible to read a half-written page (and possibly only the second
> half) and end up with a checksum failure due to that.  In pgBackRest, we
> address that by doing another read of the page and by checking the LSN
> vs. where we started the backup (if the LSN is more recent than when the
> backup started then we don't have to care about the page- it'll be in
> the WAL).

The need to reread pages can be drastically reduced by skipping
validation of any page that has an LSN >= the backup start LSN, because
such pages will be replayed from WAL during recovery.

The rereads are still necessary because a page read can overlap with a
concurrent page write and return a torn page, as Stephen notes above.
We have not been able to reproduce this case but can't discount it.
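
Roughly, the per-page logic could look like the sketch below.  This is
only an illustration under stated assumptions, not the pgBackRest or
PostgreSQL implementation: check_page and the read_page, page_lsn and
checksum_ok callbacks are hypothetical names, LSNs are treated as plain
integers, and a single reread is assumed to be enough.

```python
def check_page(read_page, page_lsn, checksum_ok, backup_start_lsn):
    """Return 'ok', 'skipped' or 'failed' for one page.

    All three callbacks are hypothetical stand-ins:
      read_page()       reads the page from disk and returns it
      page_lsn(page)    returns the LSN stored in the page header
      checksum_ok(page) returns True if the page checksum matches
    """
    page = read_page()

    # A page changed since the backup started will be replayed from
    # WAL during recovery, so its checksum does not need validating.
    if page_lsn(page) >= backup_start_lsn:
        return "skipped"

    if checksum_ok(page):
        return "ok"

    # The read may have raced with a write of the same page, so read
    # the page once more before reporting a failure.
    page = read_page()
    if page_lsn(page) >= backup_start_lsn:
        return "skipped"
    return "ok" if checksum_ok(page) else "failed"
```

The LSN test comes first so that most pages touched during the backup
never reach the reread path at all.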

Regards,
--
-David
david@pgmasters.net

