Re: pg_combinebackup does not detect missing files - Mailing list pgsql-hackers

From David Steele
Subject Re: pg_combinebackup does not detect missing files
Date
Msg-id 8892a443-9158-443c-bbad-6047b6153b47@pgmasters.net
In response to Re: pg_combinebackup does not detect missing files  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On 5/18/24 21:06, Tomas Vondra wrote:
> 
> On 5/17/24 14:20, Robert Haas wrote:
>> On Fri, May 17, 2024 at 1:18 AM David Steele <david@pgmasters.net> wrote:
>>> However, I think allowing the user to optionally validate the input
>>> would be a good feature. Running pg_verifybackup as a separate step is
>>> going to be more expensive than verifying/copying at the same time.
>>> Even with storage tricks to copy ranges of data, pg_combinebackup is
>>> going to be aware of files that do not need to be verified for the current
>>> operation, e.g. old copies of free space maps.
>>
>> In cases where pg_combinebackup reuses a checksum from the input
>> manifest rather than recomputing it, this could accomplish something.
>> However, for any file that's actually reconstructed, pg_combinebackup
>> computes the checksum as it's writing the output file. I don't see how
>> it's sensible to then turn around and verify that the checksum that we
>> just computed is the same one that we now get. It makes sense to run
>> pg_verifybackup on the output of pg_combinebackup at a later time,
>> because that can catch bits that have been flipped on disk in the
>> meanwhile. But running the equivalent of pg_verifybackup during
>> pg_combinebackup would amount to doing the exact same checksum
>> calculation twice and checking that it gets the same answer both
>> times.
>>
>>> One more thing occurs to me -- if data checksums are enabled then a
>>> rough and ready output verification would be to test the checksums
>>> during combine. Data checksums aren't very good but something should be
>>> triggered if a bunch of pages go wrong, especially since the block
>>> offset is part of the checksum. This would be helpful for catching
>>> combine bugs.
>>
>> I don't know, I'm not very enthused about this. I bet pg_combinebackup
>> has some bugs, and it's possible that one of them could involve
>> putting blocks in the wrong places, but it doesn't seem especially
>> likely. Even if it happens, it's more likely to be that
>> pg_combinebackup thinks it's putting them in the right places but is
>> actually writing them to the wrong offset in the file, in which case a
>> block-checksum calculation inside pg_combinebackup is going to think
>> everything's fine, but a standalone tool that isn't confused will be
>> able to spot the damage.
> 
> Perhaps more importantly, can you even verify data checksums before the
> recovery is completed? I don't think you can (pg_checksums certainly
> does not allow doing that). Because who knows in what shape you copied
> the block?

Yeah, you'd definitely need a list of blocks you knew to be valid at 
backup time, which sounds like a lot more work than just some overall 
checksumming scheme.

Regards,
-David
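
As a rough illustration of the point quoted above about reconstructed files, here is a minimal Python sketch. It is not pg_combinebackup's code: the function names, the chunk size, and the choice of SHA-256 are all assumptions made for the example. It hashes the bytes while writing the output file, so re-hashing immediately afterwards would mostly repeat the same calculation, while a later re-read from disk can catch bits that have flipped in the meantime.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 8192  # arbitrary read size for this sketch


def write_with_checksum(chunks, path):
    """Write chunks to path, hashing the bytes as they are written.

    Hypothetical helper: stands in for a tool that computes the
    manifest checksum while producing the output file.
    """
    digest = hashlib.sha256()
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Re-read path from disk and compare its hash with expected_hex.

    Hypothetical helper: stands in for a separate verification pass
    run at some later time.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as src:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


def flip_first_bit(path):
    """Simulate on-disk damage by flipping one bit of the first byte."""
    with open(path, "r+b") as f:
        first = f.read(1)
        f.seek(0)
        f.write(bytes([first[0] ^ 0x01]))
```

Calling `verify_file` right after `write_with_checksum` should agree by construction; it only becomes informative once something like `flip_first_bit` has changed the file after the write.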


