Re: v12 and TimeLine switches and backups/restores - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: v12 and TimeLine switches and backups/restores
Msg-id 20200701201927.GW3125@tamriel.snowman.net
In response to Re: v12 and TimeLine switches and backups/restores  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Jul 1, 2020 at 4:02 PM Stephen Frost <sfrost@snowman.net> wrote:
> > There's two solutions, really- first would be, as you suggest, configure
> > PG to stay on the timeline that the backup was taken on, but I suspect
> > that's often *not* what the user actually wants- what they really want
> > is to restore an earlier backup (one taken before the TL switch) and
> > then have PG follow the timeline switch when it comes across it.
>
> It seems, though, that if it IS what the user actually wants, they're
> now going to get the wrong behavior by default, and that seems pretty
> undesirable.

Well, even if we revert the change to the default of
recovery_target_timeline, it seems like we should still add the check
that I'm proposing, to address the case where someone explicitly asks
for the latest timeline.

> > There's another option here, though I rejected it, which is that we
> > could possibly force the restore to ignore a TL switch before reaching
> > consistency, but if we do that then, sure, we'll finish the restore but
> > we won't be on the TL that the user asked us to be, and we wouldn't be
> > able to follow a primary that's on that TL, so ultimately the restore
> > wouldn't actually be what the user wanted.  There's really not an option
> > to do what the user wanted except to find an earlier backup to restore,
> > so that's why I'm proposing that if we hit this situation we just PANIC.
>
> I'm not sure I really believe this. If someone tries to configure a
> backup without inserting a non-default setting of
> recovery_target_timeline, is it more likely that they want backup
> restoration to fail, or that they want to recover from the timeline
> that will let backup restoration succeed? You're arguing for the
> former, but my instinct was the latter. Perhaps we need to hear some
> other opinions.

That ultimately depends on whether the user knows what the default is.
I'm going off the expectation that they do.  The other argument is that
they have no idea what the default is and just expect the restore to
work, which isn't a wrong position to take.  But this situation only
arises if a replica has been promoted in the first place, and that
newly-promoted replica pushed a .history file into the same WAL repo
that this server is following WAL from.  If you're running with
replicas and you promote them, you probably do want a target timeline
of 'latest', or your replicas won't follow those timeline switches.

Changing the default now in a back-patch would also actively break
setups like that which are working today, and in a very non-obvious
way: the breakage would only be discovered when a replica is promoted
and another replica stops keeping up because it stays on its current
timeline.

In the above situation, the restore will fail either way from what I've
seen- if we hit end-of-WAL before reaching consistency then we'll PANIC,
or if we come across a SHUTDOWN record, we'll also PANIC, so it's not
like the user is going to get a successful restore that's just
corrupted, thankfully.  Catching this earlier with a clearer error
message, as I'm proposing here, seems like it would generally be helpful
though (perhaps with an added HINT: use an earlier backup to restore
from...).
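
To make the idea concrete, here is a rough sketch of the kind of check I
have in mind, written in Python rather than as a patch.  It is not
PostgreSQL source: the function names, the error wording, and the choice
to compare against the backup's consistency point are all assumptions
for illustration.  The only things taken as given are the timeline
history file layout (parent TLI, switch LSN, reason) and the usual
hi/lo hex notation for LSNs.

```python
# Illustrative sketch only; all names here are hypothetical, not PostgreSQL APIs.

def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' (two hex numbers) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn):
    """Format an integer LSN back into 'hi/lo' hex notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def parse_history(contents):
    """Parse timeline history text into (parent_tli, switch_lsn) pairs.

    Each non-comment line looks like: parentTLI <tab> switchLSN <tab> reason
    """
    entries = []
    for line in contents.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 2)
        entries.append((int(fields[0]), parse_lsn(fields[1])))
    return entries


def check_backup_against_target(backup_tli, consistency_lsn, history):
    """Fail early if the target timeline forked off before consistency.

    Assumption: if the switch away from backup_tli happened before the
    point where this backup becomes consistent, the backup cannot be
    used to reach the requested timeline.  A backup_tli that does not
    appear in the history at all is not handled by this sketch.
    """
    for parent_tli, switch_lsn in history:
        if parent_tli == backup_tli and switch_lsn < consistency_lsn:
            raise RuntimeError(
                f"backup on timeline {backup_tli} reaches consistency at "
                f"{format_lsn(consistency_lsn)}, but the requested timeline "
                f"forked off at {format_lsn(switch_lsn)}\n"
                "HINT: use an earlier backup to restore from"
            )


if __name__ == "__main__":
    history = parse_history("1\t0/5000000\tno recovery target specified\n")
    # Backup became consistent before the switch: the check passes silently.
    check_backup_against_target(1, parse_lsn("0/4000000"), history)
    # Backup became consistent after the switch: the check raises.
    try:
        check_backup_against_target(1, parse_lsn("0/7000028"), history)
    except RuntimeError as err:
        print(err)
```

The point is just that the history file already carries enough
information to report the problem up front, with a HINT, instead of
leaving the user to decode a later PANIC.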

Thanks,

Stephen
