Re: pg_combinebackup fails on file named INCREMENTAL.* - Mailing list pgsql-hackers

From Robert Haas
Subject Re: pg_combinebackup fails on file named INCREMENTAL.*
Msg-id CA+TgmoYS6+j9R-Ah5MG=XVCxvYoB+kjY8VYUH6E5q8_UnDzv8g@mail.gmail.com
In response to Re: pg_combinebackup fails on file named INCREMENTAL.*  (Stefan Fercot <stefan.fercot@protonmail.com>)
Responses Re: pg_combinebackup fails on file named INCREMENTAL.*
List pgsql-hackers
On Tue, Apr 16, 2024 at 3:10 AM Stefan Fercot
<stefan.fercot@protonmail.com> wrote:
> > > But ... I didn't really end up feeling very comfortable with it. Right
> > > now, the backup manifest is something we only use to verify the
> > > integrity of the backup. If we were to do this, it would become a
> > > critical part of the backup.
>
> Isn't it already the case? I mean, you need the manifest of the previous backup to take an incremental one, right?
> And shouldn't we encourage users to verify the backup sets before (at least) trying to combine them?
> Just because a file was only used for one specific purpose until now doesn't mean we can't improve it later.
> Splitting the meaningful information across multiple places would be more error-prone (for both devs and users) imo.

Well, right now, if you just take a full backup, and you throw away
the backup manifest because you don't care, you have a working full
backup. Furthermore, if you took any incremental backups based on that
full backup before discarding the manifest, you can still restore
them. Now, it is possible that nobody in the world cares about those
properties other than me; I have been known to (a) care about weird
stuff and (b) be a pedant. However, we've already shipped a couple of
releases where backup manifests were thoroughly non-critical: you
needed them to run pg_verifybackup, and for nothing else. I think it's
quite likely that there are users out there who are used to things
working in that way, and I'm not sure that those users will adjust
their expectations when a new feature comes out. I also feel that if I
were a user, I would think of something called a "manifest" as just a
table of contents for whatever the thing was. I still remember
downloading tar files from the Internet in the 1990s and there'd be a
file in the tarball sometimes called MANIFEST which was, you know, a
list of what was in the tarball. You didn't need that file for
anything functional; it was just so you could check if anything was
missing.

What I fear is that this will turn into another situation like we had
with pg_xlog, where people saw "log" in the name and just blew it
away. As a matter of fact, I recently ran into one of the few cases
I've seen since the pg_wal renaming of someone doing exactly that.
Some users don't take much convincing to remove anything
that looks inessential. And what I'm particularly worried about with
this feature is tar-format backups. If you have a directory format
backup and you do an "ls", you're going to see a whole bunch of files
in there of which backup_manifest will be one. How you treat that file
is just going to depend on what you know about its purpose. But if you
have a tar-format backup, possibly compressed, the backup_manifest
file stands out a lot more. You may have something like this:

backup_manifest root.tar.gz 16384.tar.gz

Well, at this point, it becomes much more likely that you're going to
think that there are special rules for the backup_manifest file.

The kicker for me is that I can't see any reason to do any of this
stuff. Recording the information that we need to elide incremental
stubs in some other way, say with one stub-list per directory, will be
easier to implement and will probably perform better. Like, I'm not saying
we can't find a way to jam this into the manifest. But I'm fairly sure
it's just making life difficult for ourselves.
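
To make the stub-list idea a bit more concrete, here is a minimal
sketch of what reading such a list could look like. Everything in it
is an assumption made for illustration: the file name "stub_list", the
one-name-per-line format, and the "present"/"elided" labels are
hypothetical and are not an existing PostgreSQL format or anything
pg_combinebackup does today. The only point is that a file's status
comes from the per-directory list, not from its name and not from the
backup manifest.

```python
from pathlib import Path

# Hypothetical name for the per-directory list; not a PostgreSQL convention.
STUB_LIST_NAME = "stub_list"


def read_stub_list(directory: Path) -> set:
    """Return the file names recorded in this directory's stub list.

    Assumed format: one file name per line, blank lines ignored.
    A directory without a stub list simply has no elided files.
    """
    stub_file = directory / STUB_LIST_NAME
    if not stub_file.exists():
        return set()
    lines = stub_file.read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def classify(directory: Path) -> dict:
    """Map each file name to "present" or "elided".

    "elided" means the name appears only in the stub list, so its
    contents would have to come from an earlier backup. The decision
    never looks at the file name itself, so a user file that happens
    to be called INCREMENTAL.something is treated like any other file.
    """
    result = {name: "elided" for name in read_stub_list(directory)}
    for entry in directory.iterdir():
        if entry.is_file() and entry.name != STUB_LIST_NAME:
            # A file physically present in this backup wins over the list.
            result[entry.name] = "present"
    return result
```

With a directory holding the files 16384 and INCREMENTAL.CONFIG plus a
stub_list naming 16385, classify() reports the first two as "present"
and 16385 as "elided", without consulting the manifest.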

I may ultimately lose this argument, as I did the one about whether
the backup_manifest should be JSON or some bespoke format. And that's
fine. I respect your opinion, and David's. But I also reserve the
right to feel differently, and I do. And I would also just gently
point out that my level of motivation to work on a particular feature
can depend quite a bit on whether I'm going to be forced to implement
it in a way that I disagree with.

--
Robert Haas
EDB: http://www.enterprisedb.com


