Re: trying again to get incremental backup - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: trying again to get incremental backup
Msg-id: CA+TgmobFDWjEBCDYFJQjxiB=QEeMhGVagk7naeAVjLkvs2EhFA@mail.gmail.com
In response to: Re: trying again to get incremental backup (Jakub Wartak <jakub.wartak@enterprisedb.com>)
List: pgsql-hackers
On Thu, Sep 28, 2023 at 6:22 AM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
> If that is still an area open for discussion: wouldn't it be better to
> just specify LSN as it would allow resyncing standby across major lag
> where the WAL to replay would be enormous? Given that we had
> primary->standby where standby would be stuck on some LSN, right now
> it would be:
> 1) calculate backup manifest of desynced 10TB standby (how? using
> which tool?) - even if possible, that means reading 10TB of data
> instead of just putting a number, isn't it?
> 2) backup primary with such incremental backup >= LSN
> 3) copy the incremental backup to standby
> 4) apply it to the impaired standby
> 5) restart the WAL replay

As you may be able to tell from the flurry of posts and new patch sets, I'm trying hard to sort out the remaining open items that pertain to this patch set, and I'm now back to thinking about this one.

TL;DR: I think the idea has some potential, but there are some pitfalls that I'm not sure how to address.

I spent some time looking at how we currently use the data from the backup manifest. Currently, we do two things with it. First, when we're backing up each file, we check whether it's present in the backup manifest and, if not, we back it up in full. That check actually feels fairly weak: if it ever makes any difference, then presumably the underlying algorithm is buggy and needs to be fixed. Maybe it should be ripped out altogether, or turned into some kind of sanity check that causes a big explosion if it fails. Second, we check whether the WAL ranges reported by the client match up with the timeline history of the server (see PrepareForIncrementalBackup). This set of sanity checks seems fairly important to me, and I'd regret discarding them.

I think there's some possibility that those checks might catch user error, like where somebody promotes multiple standbys and maybe they even get the same timeline on more than one of them, and then confusion might ensue. I also think there's a real possibility that they might make it easier to track down bugs in my code, even if those bugs aren't necessarily timeline-related. If (or, more realistically, when) somebody ends up with a corrupted cluster after running pg_combinebackup, we're going to need to figure out whether that corruption is the result of bugs (and if so where they are) or user error (and if so what it was). The most obvious ways of ending up with a corrupted cluster are (1) taking an incremental backup against a prior backup that is not in the history of the server from which the backup is taken, or (2) combining an incremental backup with the wrong prior backup, so whatever sanity checks we can have that tend to prevent those kinds of mistakes seem like a really good idea.

And those kinds of checks seem relevant here, too. Consider that it wouldn't be valid to use pg_combinebackup to fast-forward a standby server if the incremental backup's backup-end-LSN preceded the standby server's minimum recovery point.
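To make that rule concrete, here is a minimal standalone sketch of the check (illustrative only, not code from the patch set; the typedefs are simplified stand-ins for XLogRecPtr and TimeLineID, and a real check would also have to consult the timeline history rather than just compare TLIs):

#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's XLogRecPtr and TimeLineID. */
typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

/*
 * Would it be safe to fast-forward a standby by applying this incremental
 * backup on top of it?  The backup's end LSN must not precede the standby's
 * minimum recovery point, and the two must be on the same timeline.
 */
bool
can_fast_forward_standby(XLogRecPtr backup_end_lsn, TimeLineID backup_tli,
                         XLogRecPtr min_recovery_point, TimeLineID standby_tli)
{
    if (backup_tli != standby_tli)
        return false;    /* e.g. the standby was promoted in the meantime */
    if (backup_end_lsn < min_recovery_point)
        return false;    /* the standby already replayed past the backup */
    return true;
}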
Imagine that you have a standby whose last checkpoint's redo location was at LSN 2/48. Being the enterprising DBA that you are, you make a note of that LSN and go take an incremental backup based on it. You then stop the standby server and try to apply the incremental backup to fast-forward the standby. Well, it's possible that in the meanwhile the standby actually caught up, and now has a minimum recovery point that follows the backup-end-LSN of your incremental backup. In that case, you can't legally use that incremental backup to fast-forward that standby, but no code I've yet written would be smart enough to figure that out. Or maybe you (or some other DBA on your team) got really excited and actually promoted that standby in the meantime, and now it's not even on the same timeline any more. In the "normal" case where you take an incremental backup based on an earlier base backup, these kinds of problems are detectable, and it seems to me that if we want to enable this kind of use case, it would be pretty smart to have a plan to detect similar mistakes here. I don't have one currently, but maybe there is one.

Another practical problem here is that, right now, pg_combinebackup doesn't have an in-place mode. It knows how to take a bunch of input backups and write out an output backup, but that output backup needs to go into a new, fresh directory (or directories plural, if there are user-defined tablespaces). I had previously considered adding such a mode, but the idea I had at the time wouldn't have worked for this case. I imagined that someone might want to run "pg_combinebackup --in-place full incr" and clobber the contents of the incr directory with the output, basically discarding the incremental backup you took in favor of a full backup that could have been taken at the same point in time. But here, you'd want to clobber the *first* input to pg_combinebackup, not the last one, so if we want to add something like this, the UI needs some thought.

One thing that I find quite scary about such a mode is that if you crash mid-way through, you're in a lot of trouble. In the case that I had previously contemplated -- overwrite the last incremental with the reconstructed full backup -- you *might* be able to make it crash-safe by writing out the full files for each incremental file, fsyncing everything, then removing all of the incremental files and fsyncing again. The idea would be that if you crash midway through, it's OK to just repeat whatever you were trying to do before the crash, and if it succeeds the second time then all is well. If, for a given file, there are both incremental and non-incremental versions, then the second attempt should remove and recreate the non-incremental version from the incremental version. If there's only a non-incremental version, it could be that the previous attempt got far enough to remove the incremental file, but in that case the full file that we now have should be the same thing that we would produce if we did the operation now. It all sounds a little scary, but maybe it's OK. And as long as you don't remove the this-is-an-incremental-backup markers from the backup_label file until you've done everything else, you can tell whether you've ever successfully completed the reassembly or not.
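Spelled out, the ordering for that previously-contemplated mode might look something like the sketch below. All of the helper functions are hypothetical and nothing here is code from the patch set; the point is only the ordering of the phases, which is what makes a second run after a crash repeat work harmlessly instead of corrupting the directory:

/* Hypothetical helpers -- these do not exist in the patch set. */
extern void reconstruct_full_files(const char *backup_dir);
extern void remove_incremental_files(const char *backup_dir);
extern void fsync_dir_recursively(const char *backup_dir);
extern void clear_incremental_marker_in_backup_label(const char *backup_dir);

void
overwrite_incremental_in_place(const char *backup_dir)
{
    /* 1. Write a reconstructed full file next to every incremental input file. */
    reconstruct_full_files(backup_dir);

    /* 2. Make step 1 durable before destroying any inputs. */
    fsync_dir_recursively(backup_dir);

    /* 3. Remove the incremental inputs, and make that durable too. */
    remove_incremental_files(backup_dir);
    fsync_dir_recursively(backup_dir);

    /*
     * 4. Only now drop the this-is-an-incremental-backup marker from
     * backup_label.  If we crashed at any earlier point, the marker is
     * still present, so we know reassembly never completed and re-running
     * the whole procedure is the right recovery action.
     */
    clear_incremental_marker_in_backup_label(backup_dir);
}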
But if you're using a hypothetical overwrite mode to overwrite the first input rather than the last one, the target already looks like a valid data directory, and if you replace a bunch of files and then crash, it still looks like one, but it really isn't any more. I'm not sure I've really wrapped my head around all of the cases here, but it does feel like there are some new ways to go wrong.

One thing I also realized when thinking about this is that you could probably hose yourself with the patch set as it stands today by taking a full backup, downgrading to wal_level=minimal for a while, doing some WAL-skipping operations, upgrading to a higher wal_level again, and then taking an incremental backup. I think the solution to that is probably for the WAL summarizer to refuse to run if wal_level=minimal. Then there would be a gap in the summary files, which an incremental backup attempt would detect.
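For illustration, the guard could be as simple as the sketch below, using existing backend facilities (wal_level, WAL_LEVEL_MINIMAL, ereport(), proc_exit()). Where exactly it would hook into the summarizer is an assumption on my part, and this is not code from the patch set:

#include "postgres.h"

#include "access/xlog.h"
#include "storage/ipc.h"

/*
 * Illustrative only: bail out instead of writing summary files when
 * wal_level is minimal.  The resulting gap in the summaries is what would
 * make a later incremental backup attempt fail cleanly instead of silently
 * missing WAL-skipped changes.
 */
void
refuse_to_summarize_under_minimal_wal_level(void)
{
    if (wal_level == WAL_LEVEL_MINIMAL)
    {
        ereport(LOG,
                (errmsg("WAL summarization is not possible when wal_level is \"minimal\"")));
        proc_exit(0);
    }
}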
--
Robert Haas
EDB: http://www.enterprisedb.com