v12 and TimeLine switches and backups/restores - Mailing list pgsql-hackers

From Stephen Frost
Subject v12 and TimeLine switches and backups/restores
Msg-id 20200701041214.GM3125@tamriel.snowman.net
Responses Re: v12 and TimeLine switches and backups/restores  (Robert Haas <robertmhaas@gmail.com>)
Re: v12 and TimeLine switches and backups/restores  (Michael Banck <michael.banck@credativ.de>)
List pgsql-hackers

Greetings,

Among the changes made to PG's recovery in v12 was to set
recovery_target_timeline to be 'latest' by default.  That's handy when
you're flipping back and forth between replicas and want to have
everyone follow that game, but it's made doing some basic things like
restoring from a backup problematic.

Specifically, if you take a backup off a primary and, while that backup
is going on, some replica is promoted and drops a .history file into the
WAL repo, that backup is no longer able to be restored with the new
recovery_target_timeline default.  What happens is that the restore
process will happily follow the timeline change, even though it happened
before we reached consistency, and then it'll never find the needed
end-of-backup WAL point that would allow us to reach consistency.
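
To make the failure concrete, here is a minimal Python sketch of the
condition described above. It is only a model, not PostgreSQL code:
parse_lsn and reaches_consistency are hypothetical helper names, and the
LSN values in the usage lines are made up for illustration. The one
assumption it encodes is that the end-of-backup record exists only on the
timeline the backup was taken on, so replay that leaves that timeline
before the record's LSN can never see it.

```python
from typing import Optional


def parse_lsn(text: str) -> int:
    """Convert an LSN in PostgreSQL's 'X/Y' hex notation to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def reaches_consistency(backup_end_lsn: str,
                        switch_lsn: Optional[str],
                        follow_switch: bool) -> bool:
    """Model whether replay can reach the end-of-backup record.

    backup_end_lsn: where the end-of-backup record sits on the original
                    timeline.
    switch_lsn: where a newer timeline branched off, or None if no
                .history file is present in the WAL archive.
    follow_switch: True models recovery_target_timeline = 'latest',
                   False models staying on the backup's own timeline.
    """
    if switch_lsn is None or not follow_switch:
        return True
    # Following the switch is only harmless if it happens at or after the
    # point where the backup became consistent.
    return parse_lsn(switch_lsn) >= parse_lsn(backup_end_lsn)


# Illustrative values only: the replica was promoted mid-backup.
print(reaches_consistency("0/5000100", "0/4000028", follow_switch=True))   # False
print(reaches_consistency("0/5000100", "0/4000028", follow_switch=False))  # True
```

The sketch deliberately ignores numeric timeline targets and chains of
several timeline switches; it only captures the single-switch case
discussed here.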

Naturally, a primary isn't ever going to do a TL switch, and we already
throw an error during an online backup from a replica if that replica
did a TL switch during the backup, to indicate that the backup isn't
valid.

Attached is an initial draft of a patch to at least give a somewhat
clearer error message when we detect that the user has asked us to
follow a timeline switch to a new timeline before we've reached
consistency (though I had to hack in a check to see if pg_rewind is
being used, since apparently it actually depends on PG following a
timeline switch before reaching consistency...).
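
The patch itself is C code in the recovery path; as a rough illustration
of the kind of check it is meant to add, the following Python sketch
raises a clear error when replay is asked to switch timelines before the
consistency point. It is not the attached patch. All names here
(check_timeline_switch, allow_early_switch, RecoveryError) and the message
text are hypothetical, LSNs are plain integers for brevity, and
allow_early_switch merely stands in for the pg_rewind special case
mentioned above.

```python
class RecoveryError(Exception):
    """Hypothetical error type standing in for a FATAL recovery error."""


def check_timeline_switch(switch_lsn: int,
                          consistency_lsn: int,
                          allow_early_switch: bool = False) -> None:
    """Refuse a timeline switch that happens before consistency is reached.

    switch_lsn: LSN at which the new timeline branched off.
    consistency_lsn: LSN that must be replayed to reach consistency.
    allow_early_switch: models the pg_rewind exception (an assumption).
    """
    if allow_early_switch:
        return
    if switch_lsn < consistency_lsn:
        raise RecoveryError(
            f"requested timeline switch at LSN {switch_lsn:#x} is before "
            f"the consistency point {consistency_lsn:#x}"
        )


# Illustrative values only: the switch precedes the consistency point.
try:
    check_timeline_switch(0x4000028, 0x5000100)
except RecoveryError as err:
    print(f"FATAL: {err}")
```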

Thoughts?

Thanks,

Stephen
