Re: backup manifests - Mailing list pgsql-hackers

From David Steele
Subject Re: backup manifests
Msg-id 16538d02-fd4b-6c4f-81a5-132c8fe8c3e9@pgmasters.net
In response to Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 3/26/20 11:37 AM, Robert Haas wrote:
>> On Wed, Mar 25, 2020 at 4:54 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> This is where I feel like I'm trying to make decisions in a vacuum. If
> we had a few more people weighing in on the thread on this point, I'd
> be happy to go with whatever the consensus was. If most people think
> having both --no-manifest (suppressing the manifest completely) and
> --manifest-checksums=none (suppressing only the checksums) is useless
> and confusing, then sure, let's rip the latter one out. If most people
> like the flexibility, let's keep it: it's already implemented and
> tested. But I hate to base the decision on what one or two people
> think.

I'm not sure I see a lot of value in being able to build a manifest with 
no checksums, especially if the overhead of the default checksum algorithm 
is negligible.

However, I'd still prefer that the default be something more robust and 
allow users to tune it down rather than the other way around.  But I've 
made that pretty clear up-thread and I consider that argument lost at 
this point.

>> As for folks who are that close to the edge on their backup timing that
>> they can't have it slow down- chances are pretty darn good that they're
>> not far from ending up needing to find a better solution than
>> pg_basebackup anyway.  Or they don't need to generate a manifest (or, I
>> suppose, they could have one but not have checksums..).
> 
> 40-50% is a lot more than "if you were on the edge."

For the record, I think this is a very misleading number.  Sure, if you 
are doing your backup to a local SSD on a powerful development laptop, it 
makes sense.

But backups are generally placed on slower storage, remotely, with 
compression.  Even without compression the first two are going to bring 
this percentage down by a lot.
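
As a rough illustration of why slower storage shrinks that percentage, 
here is a minimal Python sketch.  It assumes checksumming runs serially 
with I/O, and the throughput figures are made up for the example; neither 
the model nor the numbers come from measurements in this thread.

```python
def checksum_overhead(io_mb_per_s: float, checksum_mb_per_s: float) -> float:
    """Fractional slowdown from checksumming, under a simple serial model.

    Assumption: each megabyte costs (I/O time + checksum time), with no
    overlap between the two. Real backups may overlap them, which would
    make the overhead smaller still.
    """
    io_time = 1.0 / io_mb_per_s              # seconds per MB spent on I/O
    checksum_time = 1.0 / checksum_mb_per_s  # seconds per MB spent hashing
    return checksum_time / io_time


# Hypothetical throughputs, chosen only to show the trend.
CHECKSUM_MB_PER_S = 2000.0
for label, io_rate in [("fast local SSD", 1000.0), ("slower remote storage", 100.0)]:
    overhead = checksum_overhead(io_rate, CHECKSUM_MB_PER_S)
    print(f"{label}: {overhead:.0%} slower with checksums")
```

With the same checksum cost per byte, the relative overhead falls in 
proportion to the storage throughput.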

When you get to page-level incremental backups, which is where this all 
started, I'd still recommend using a stronger checksum algorithm to 
verify that the file was reconstructed correctly on restore.  That much 
I believe we have agreed on.
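
To make the restore-time check concrete, here is a minimal Python sketch 
of verifying a reconstructed file against a checksum recorded at backup 
time.  SHA-256 is only an assumption standing in for "a stronger 
algorithm", and the function names are hypothetical; this is not code 
from the patch.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_checksum(path: str, expected_hex: str) -> bool:
    """True if the file on disk hashes to the checksum recorded earlier."""
    return file_sha256(path) == expected_hex.lower()
```

A restore tool could call matches_recorded_checksum() on each 
reconstructed file and refuse to proceed on the first mismatch.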

>> Even pg_basebackup (in both fetch and stream modes...) checks that we at
>> least got all the WAL that's needed for the backup from the server
>> before considering the backup to be valid and telling the user that
>> there was a successful backup.  With what you're proposing here, we
>> could have someone do a pg_basebackup, get back an ERROR saying the
>> backup wasn't valid, and then run pg_validatebackup and be told that the
>> backup is valid.  I don't get how that's sensible.
> 
> I'm sorry that you can't see how that's sensible, but it doesn't mean
> that it isn't sensible. It is totally unrealistic to expect that any
> backup verification tool can verify that you won't get an error when
> trying to use the backup. That would require that everything that the
> validation tool try to do everything that PostgreSQL will try to do
> when the backup is used, including running recovery and updating the
> data files. Anything less than that creates a real possibility that
> the backup will verify good but fail when used. This tool has a much
> narrower purpose, which is to try to verify that we (still) have the
> files the server sent as part of the backup and that, to the best of
> our ability to detect such things, they have not been modified. As you
> know, or should know, the WAL files are not sent as part of the
> backup, and so are not verified. Other things that would also be
> useful to check are also not verified. It would be fantastic to have
> more verification tools in the future, but it is difficult to see why
> anyone would bother trying if an attempt to get the first one
> committed gets blocked because it does not yet do everything. Very few
> patches try to do everything, and those that do usually get blocked
> because, by trying to do too much, they get some of it badly wrong.

I agree with Stephen that WAL validation should be done, but I agree with 
you that it can wait for a future commit. However, I do think:

1) It should be called out rather plainly in the documentation.
2) If there are files in pg_wal then pg_validatebackup should inform the 
user that those files have not been validated.
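
A minimal Python sketch of what point 2 could look like.  The function 
name and the message wording are hypothetical, and this is not how 
pg_validatebackup is implemented; it only illustrates the check being 
suggested.

```python
from pathlib import Path
from typing import Optional


def unvalidated_wal_notice(backup_dir: str) -> Optional[str]:
    """Return a notice if the backup contains files in pg_wal, else None.

    Assumption: WAL files live directly under <backup_dir>/pg_wal and are
    not covered by the manifest, so they cannot be validated against it.
    """
    wal_dir = Path(backup_dir) / "pg_wal"
    if not wal_dir.is_dir():
        return None
    wal_files = [p for p in wal_dir.iterdir() if p.is_file()]
    if not wal_files:
        return None
    return f"NOTICE: {len(wal_files)} file(s) in pg_wal were not validated"
```

The caller would print the returned notice after the manifest checks, so 
a clean result is not mistaken for a statement about the WAL.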

I know you and Stephen have agreed on a number of doc changes.  Would it 
be possible to get a new patch with those included? I finally have time 
to do a review of this tomorrow.  I saw some mistakes in the docs in the 
current patch, but I know those patches are not current.

Regards,
-- 
-David
david@pgmasters.net


