Re: pg_combinebackup fails on file named INCREMENTAL.* - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: pg_combinebackup fails on file named INCREMENTAL.* |
Date | |
Msg-id | CA+TgmoYS6+j9R-Ah5MG=XVCxvYoB+kjY8VYUH6E5q8_UnDzv8g@mail.gmail.com |
In response to | Re: pg_combinebackup fails on file named INCREMENTAL.* (Stefan Fercot <stefan.fercot@protonmail.com>) |
Responses | Re: pg_combinebackup fails on file named INCREMENTAL.* |
List | pgsql-hackers |
On Tue, Apr 16, 2024 at 3:10 AM Stefan Fercot <stefan.fercot@protonmail.com> wrote:
> > > But ... I didn't really end up feeling very comfortable with it. Right
> > > now, the backup manifest is something we only use to verify the
> > > integrity of the backup. If we were to do this, it would become a
> > > critical part of the backup.
> > Isn't it already the case? I mean, you need the manifest of the previous backup to take an incremental one, right?
> And shouldn't we encourage to verify the backup sets before (at least) trying to combine them?
> It's not because a file was only use for one specific purpose until now that we can't improve it later.
> Splitting the meaningful information across multiple places would be more error-prone (for both devs and users) imo.

Well, right now, if you just take a full backup, and you throw away the backup manifest because you don't care, you have a working full backup. Furthermore, if you took any incremental backups based on that full backup before discarding the manifest, you can still restore them.

Now, it is possible that nobody in the world cares about those properties other than me; I have been known to (a) care about weird stuff and (b) be a pedant. However, we've already shipped a couple of releases where backup manifests were thoroughly non-critical: you needed them to run pg_verifybackup, and for nothing else. I think it's quite likely that there are users out there who are used to things working in that way, and I'm not sure that those users will adjust their expectations when a new feature comes out.

I also feel that if I were a user, I would think of something called a "manifest" as just a table of contents for whatever the thing was. I still remember downloading tar files from the Internet in the 1990s and there'd be a file in the tarball sometimes called MANIFEST which was, you know, a list of what was in the tarball. You didn't need that file for anything functional; it was just so you could check if anything was missing. What I fear is that this will turn into another situation like we had with pg_xlog, where people saw "log" in the name and just blew it away. Matter of fact, I recently encountered one of my few recent examples of someone doing that thing since the pg_wal renaming happened. Some users don't take much convincing to remove anything that looks inessential.

And what I'm particularly worried about with this feature is tar-format backups. If you have a directory format backup and you do an "ls", you're going to see a whole bunch of files in there of which backup_manifest will be one. How you treat that file is just going to depend on what you know about its purpose. But if you have a tar-format backup, possibly compressed, the backup_manifest file stands out a lot more. You may have something like this:

backup_manifest
root.tar.gz
16384.tar.gz

Well, at this point, it becomes much more likely that you're going to think that there are special rules for the backup_manifest file.

The kicker for me is that I can't see any reason to do any of this stuff. Including the information that we need to elide incremental stubs in some other way, say with one stub-list per directory, will be easier to implement and probably perform better. Like, I'm not saying we can't find a way to jam this into the manifest. But I'm fairly sure it's just making life difficult for ourselves.

I may ultimately lose this argument, as I did the one about whether the backup_manifest should be JSON or some bespoke format. And that's fine. I respect your opinion, and David's. But I also reserve the right to feel differently, and I do. And I would also just gently point out that my level of motivation to work on a particular feature can depend quite a bit on whether I'm going to be forced to implement it in a way that I disagree with.

--
Robert Haas
EDB: http://www.enterprisedb.com