Re: v12 and TimeLine switches and backups/restores - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: v12 and TimeLine switches and backups/restores |
Date | |
Msg-id | 20200701200218.GV3125@tamriel.snowman.net |
In response to | Re: v12 and TimeLine switches and backups/restores (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: v12 and TimeLine switches and backups/restores |
List | pgsql-hackers |
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Jul 1, 2020 at 12:12 AM Stephen Frost <sfrost@snowman.net> wrote:
> > Among the changes made to PG's recovery in v12 was to set
> > recovery_target_timeline to be 'latest' by default. That's handy when
> > you're flipping back and forth between replicas and want to have
> > everyone follow that game, but it's made doing some basic things like
> > restoring from a backup problematic.
> >
> > Specifically, if you take a backup off a primary and, while that backup
> > is going on, some replica is promoted and drops a .history file into the
> > WAL repo, that backup is no longer able to be restored with the new
> > recovery_target_timeline default. What happens is that the restore
> > process will happily follow the timeline change- even though it happened
> > before we reached consistency, and then it'll never find the needed
> > end-of-backup WAL point that would allow us to reach consistency.
>
> Ouch. Should we revert that change rather than doing this? Seems like
> this might create a lot of problems for people, and they might be
> problems that happen rarely enough that it looks like it's working
> until it doesn't. What's the fix, if you hit the error? Add
> recovery_target_timeline=<the correct timeline> to
> postgresql.auto.conf?

I don't really think reverting the change that made following the latest
timeline the default would end up being terribly helpful- an awful lot
of systems are going to be running with that anyway for HA and such, so
it seems like something we just need to deal with.

As such, it seems like this is also something that would need to be
back-patched, though I've not looked at how much effort that'll be
(yet), since it probably makes sense to first get agreement on whether
this approach is the best one.

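To make the failure condition in the quoted text concrete, here is a minimal sketch (not PostgreSQL code) of the check it implies: a restore that follows a timeline switch can only reach consistency if the switch point is not earlier than the backup's end-of-backup point. The function names are hypothetical, and treating a switch exactly at the stop LSN as safe is an assumption made for illustration.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in PostgreSQL's 'hi/lo' hex notation to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def reaches_consistency(backup_stop_lsn: str, switch_lsn: str) -> bool:
    """Return True if a restore that follows the timeline switch would
    still replay the end-of-backup point before leaving the old timeline.

    Assumption: the backup was taken on the timeline being switched away
    from, and only a single timeline switch is considered.
    """
    return parse_lsn(switch_lsn) >= parse_lsn(backup_stop_lsn)
```

With a backup that ends at `0/5000000`, a promotion at `0/4000000` gives `False` (the case described above), while a promotion at `0/6000000` gives `True`.
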
There are two solutions, really- first would be, as you suggest, to
configure PG to stay on the timeline that the backup was taken on, but I
suspect that's often *not* what the user actually wants- what they
really want is to restore an earlier backup (one taken before the TL
switch) and then have PG follow the timeline switch when it comes across
it. We're looking at having pgbackrest automatically pick the correct
backup to be able to make that happen when someone requests
timeline-latest (pretty handy having a repo full of backups that allow
us to pick the right one based on what the user's request is).

There's another option here, though I rejected it, which is that we
could possibly force the restore to ignore a TL switch before reaching
consistency, but if we do that then, sure, we'll finish the restore but
we won't be on the TL that the user asked us to be on, and we wouldn't
be able to follow a primary that's on that TL, so ultimately the restore
wouldn't actually be what the user wanted. There's really not an option
to do what the user wanted except to find an earlier backup to restore,
so that's why I'm proposing that if we hit this situation we just PANIC.

> Typo: similairly.

Fixed locally.

Thanks!

Stephen
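
The idea of picking an earlier backup, one that finished before the TL switch, can be sketched roughly as follows. This is an illustration only and not pgbackrest's implementation: the `Backup` record, its fields, and `pick_backup` are hypothetical names, and the sketch assumes a single timeline switch away from one parent timeline.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


def parse_lsn(text: str) -> int:
    """Convert an LSN in PostgreSQL's 'hi/lo' hex notation to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass(frozen=True)
class Backup:
    label: str     # hypothetical backup identifier
    timeline: int  # timeline the backup was taken on
    stop_lsn: str  # end-of-backup LSN, e.g. "0/5000000"


def pick_backup(
    backups: Iterable[Backup], parent_tli: int, switch_lsn: str
) -> Optional[Backup]:
    """Pick the newest backup on the parent timeline that finished at or
    before the timeline switch, or None if no such backup exists."""
    switch = parse_lsn(switch_lsn)
    candidates = [
        b
        for b in backups
        if b.timeline == parent_tli and parse_lsn(b.stop_lsn) <= switch
    ]
    # Newest usable backup = the one with the highest stop LSN.
    return max(candidates, key=lambda b: parse_lsn(b.stop_lsn), default=None)
```

A `None` result corresponds to the situation where no backup in the repo can be restored onto the requested timeline.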