Re: backup manifests - Mailing list pgsql-hackers

From David Steele
Subject Re: backup manifests
Date
Msg-id 557f7af7-af3e-5fb8-1c7a-9fdd5c488ebc@pgmasters.net
In response to Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 3/27/20 3:29 PM, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 11:26 AM Stephen Frost <sfrost@snowman.net> wrote:
>>> Seems better to (later?) add support for generating manifests for WAL
>>> files, and then have a tool that can verify all the manifests required
>>> to restore a base backup.
>>
>> I'm not trying to expand on the feature set here or move the goalposts
>> way down the road, which is what seems to be what's being suggested
>> here.  To be clear, I don't have any objection to adding a generic tool
>> for validating WAL as you're talking about here, but I also don't think
>> that's required for pg_validatebackup.  What I do think we need is a
>> check of the WAL that's fetched when people use pg_basebackup -Xstream
>> or -Xfetch.  pg_basebackup itself has that check because it's critical
>> to the backup being successful and valid.  Not having that basic
>> validation of a backup really just isn't ok- there's a reason
>> pg_basebackup has that check.
> 
> I don't understand how this could be done without significantly
> complicating the architecture. As I said before, -Xstream sends WAL
> over a separate connection that is unrelated to the one running
> BASE_BACKUP, so the base-backup connection doesn't know what to
> include in the manifest. Now you could do something like: once all of
> the WAL files have been fetched, the client checksums all of those and
> sends their names and checksums to the server, which turns around and
> puts them into the manifest, which it then sends back to the client.
> But that is actually quite a bit of additional complexity, and it's
> pretty strange, too, because now you have the client checksumming some
> files and the server checksumming others. I know you mentioned a few
> different ideas before, but I think they all kinda have some problem
> along these lines.
> 
> I also kinda disagree with the idea that the WAL should be considered
> an integral part of the backup. I don't know how pgbackrest does
> things, 

We checksum each WAL file while it is read and transmitted to the repo 
by the archive_command.  Then at the end of the backup we ensure that 
all the WAL required to make the backup consistent has made it to the repo.
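
As a rough illustration of the "checksum while it is read and transmitted" idea, here is a minimal Python sketch. It is not pgBackRest code: the function names, the SHA-256 choice, and the dict-based view of the repo are all assumptions made for the example.

```python
import hashlib


def copy_with_checksum(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy a file in chunks, hashing each chunk as it passes through.

    Hypothetical helper: the hash algorithm and chunk size are
    illustrative choices, not what any particular tool uses.
    """
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)  # checksum the bytes we just read
            dst.write(chunk)      # and forward the same bytes
    return digest.hexdigest()


def missing_segments(required, archived):
    """Return the required WAL segment names that are not in the archive.

    `archived` is assumed to be a mapping of segment name -> checksum.
    """
    return sorted(set(required) - set(archived))
```

The file is read only once, so the checksum comes for free with the copy, and the end-of-backup check reduces to a set difference over segment names.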

> but BART stores each backup in a separate directory without any
> associated WAL, and then keeps all the WAL together in a different
> directory. I imagine that people who are using continuous archiving
> also tend to use -Xnone, or if they do backups by copying the files
> rather than using pg_backrest, they exclude pg_wal. In fact, for
> people with big, important databases, I'd assume that would be the
> normal pattern. You presumably wouldn't want to keep one copy of the
> WAL files taken during the backup with the backup itself, and a
> separate copy in the archive.

pgBackRest does provide the option to copy WAL into the backup directory 
for the super-paranoid, though it is not the default. It is also pretty 
handy for moving individual backups to some other medium, such as tape.

If -Xnone is specified then it seems like pg_validatebackup is 
completely off the hook.  But in the case of -Xstream or -Xfetch 
couldn't we at least verify that the expected WAL segments are present 
and the correct size?

Storing the start/stop LSN in the manifest would be a nice thing to have 
anyway, and that would make this feature pretty trivial. Yeah, that's in 
the backup_label file as well, but the manifest is so much easier to read.
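
To show why the start/stop LSN would be enough, here is a minimal Python sketch of the presence-and-size check. It assumes the usual 24-hex-digit WAL file naming and the default 16MB segment size, and it treats the stop LSN as an end pointer (so a stop LSN exactly on a segment boundary does not require the following segment). The function names and the idea of reading the timeline and LSNs from a manifest are hypothetical, not anything in the current patch.

```python
import os

# PostgreSQL's default WAL segment size; it can be changed at initdb time,
# so a real tool would need to learn the actual value somehow.
DEFAULT_WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Convert an LSN written as 'X/Y' (two hex numbers) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline, segno, seg_size=DEFAULT_WAL_SEGMENT_SIZE):
    """Build the 24-hex-digit WAL file name for a segment number."""
    segs_per_id = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)


def expected_wal_files(timeline, start_lsn, stop_lsn,
                       seg_size=DEFAULT_WAL_SEGMENT_SIZE):
    """List the segments covering [start_lsn, stop_lsn)."""
    first = parse_lsn(start_lsn) // seg_size
    # Assumption: stop_lsn points just past the last needed byte.
    last = (parse_lsn(stop_lsn) - 1) // seg_size
    return [wal_file_name(timeline, s, seg_size) for s in range(first, last + 1)]


def check_wal(wal_dir, timeline, start_lsn, stop_lsn,
              seg_size=DEFAULT_WAL_SEGMENT_SIZE):
    """Return (file name, problem) pairs for missing or mis-sized segments."""
    problems = []
    for name in expected_wal_files(timeline, start_lsn, stop_lsn, seg_size):
        path = os.path.join(wal_dir, name)
        if not os.path.isfile(path):
            problems.append((name, "missing"))
        elif os.path.getsize(path) != seg_size:
            problems.append((name, "wrong size"))
    return problems
```

An empty list from check_wal() means every expected segment is present and full-sized. This says nothing about the contents of the segments, only that nothing is obviously missing or truncated.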

Regards,
-- 
-David
david@pgmasters.net


