Re: Promoting a standby during base backup (was Re: Switching timeline over streaming replication) - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Promoting a standby during base backup (was Re: Switching timeline over streaming replication) |
Date | |
Msg-id | CA+U5nMKE6cS9tNz6GOTX5x6w263tYAJ6BmPzW=BnXu0srt=DqA@mail.gmail.com |
In response to | Re: Promoting a standby during base backup (was Re: Switching timeline over streaming replication) (Fujii Masao <masao.fujii@gmail.com>) |
List | pgsql-hackers |
On 4 October 2012 18:07, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Oct 4, 2012 at 4:59 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 03.10.2012 18:15, Amit Kapila wrote:
>>>
>>> On Tuesday, October 02, 2012 4:21 PM Heikki Linnakangas wrote:
>>>>
>>>> Hmm, should a base backup be aborted when the standby is promoted? Does
>>>> the promotion render the backup corrupt?
>>>
>>>
>>> I think currently it does so. Pls refer
>>> 1.
>>> do_pg_stop_backup(char *labelfile, bool waitforarchive)
>>> {
>>> ..
>>> if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
>>>     ereport(ERROR,
>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>              errmsg("the standby was promoted during online backup"),
>>>              errhint("This means that the backup being taken is corrupt "
>>>                      "and should not be used. "
>>>                      "Try taking another online backup.")));
>>> ..
>>>
>>> }
>>
>>
>> Okay. I think that check in do_pg_stop_backup() actually already ensures
>> that you don't end up with a corrupt backup, even if the standby is promoted
>> while a backup is being taken. Admittedly it would be nicer to abort it
>> immediately rather than error out at the end.
>>
>> But I wonder why promoting a standby renders the backup invalid in the first
>> place? Fujii, Simon, can you explain that?
>
> Simon had the same question and I answered it before.
>
> http://archives.postgresql.org/message-id/CAHGQGwFU04oO8YL5SUcdjVq3BRNi7WtfzTy9wA2kXtZNHicTeA@mail.gmail.com
> ---------------------------------------
>> You say
>> "If the standby is promoted to the master during online backup, the
>> backup fails."
>> but no explanation of why?
>>
>> I could work those things out, but I don't want to have to, plus we
>> may disagree if I did.
>
> If the backup succeeds in that case, when we start an archive recovery from that
> backup, the recovery needs to cross between two timelines. Which means that
> we need to set recovery_target_timeline before starting recovery. Whether
> recovery_target_timeline needs to be set or not depends on whether the standby
> was promoted during taking the backup. Leaving such a decision to a user seems
> fragile.

I accepted your answer before, but I think it should be challenged
now. This is definitely a time when you really want that backup, so
invalidating it for such a weak reason is not useful, even if I
understand your original thought.

Something that has concerned me is that we don't have an explicit
"timeline change record". We *say* we do that at shutdown checkpoints,
but that is recorded in the new timeline. So we have the strange
situation of changing timeline at two separate places. When we change
timeline we really should generate one last WAL on the old timeline
that marks an explicit change of timeline and a single exact moment
when the timeline change takes place. With PITR we are unable to do
that, because any timeline can fork at any point. With smooth
switchover we have a special case that is not "anything goes" and
there is a good case for not incrementing the timeline at all.

This is still a half-formed thought, but at least you should know I'm
in the debate.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
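
For readers following the thread, the check quoted from do_pg_stop_backup() earlier in this message can be modeled in isolation. The following is only a minimal sketch, not the PostgreSQL implementation: the function name check_backup_at_stop and the enum BackupStopResult are hypothetical, and it assumes that the flag in the quoted condition reflects whether the server is still in recovery at the moment the backup is stopped. It shows that the backup is rejected only when the backup label says it was started on a standby and the server has since left recovery, which is the promotion case debated above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type, for illustration only. */
typedef enum
{
    BACKUP_OK,
    BACKUP_INVALID_PROMOTED
} BackupStopResult;

/*
 * Models the condition quoted in the thread:
 *   strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery
 *
 * backupfrom:         where the backup label says the backup was started
 * still_in_recovery:  assumed to mean "server is still in recovery when
 *                     the backup is stopped" (an assumption of this sketch)
 */
BackupStopResult
check_backup_at_stop(const char *backupfrom, bool still_in_recovery)
{
    /* Started on a standby, but no longer in recovery: it was promoted. */
    if (strcmp(backupfrom, "standby") == 0 && !still_in_recovery)
        return BACKUP_INVALID_PROMOTED;

    /* Any other combination passes this particular check. */
    return BACKUP_OK;
}
```

Under this model, calling check_backup_at_stop("standby", false) reports the promoted case, while a backup that starts and ends on a standby, or one taken on the master, passes. The objection raised in this message concerns whether that first case should be an error at all, not how the condition is evaluated.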