Re: Promoting a standby during base backup (was Re: Switching timeline over streaming replication) - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Promoting a standby during base backup (was Re: Switching timeline over streaming replication)
Msg-id CA+U5nMKE6cS9tNz6GOTX5x6w263tYAJ6BmPzW=BnXu0srt=DqA@mail.gmail.com
In response to Re: Promoting a standby during base backup (was Re: Switching timeline over streaming replication)  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On 4 October 2012 18:07, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Oct 4, 2012 at 4:59 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 03.10.2012 18:15, Amit Kapila wrote:
>>>
>>> On Tuesday, October 02, 2012 4:21 PM Heikki Linnakangas wrote:
>>>>
>>>> Hmm, should a base backup be aborted when the standby is promoted? Does
>>>> the promotion render the backup corrupt?
>>>
>>>
>>> I think currently it does so. Pls refer
>>> 1.
>>> do_pg_stop_backup(char *labelfile, bool waitforarchive)
>>> {
>>> ..
>>>     if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
>>>         ereport(ERROR,
>>>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>                  errmsg("the standby was promoted during online backup"),
>>>                  errhint("This means that the backup being taken is corrupt "
>>>                          "and should not be used. "
>>>                          "Try taking another online backup.")));
>>> ..
>>>
>>> }
>>
>>
>> Okay. I think that check in do_pg_stop_backup() actually already ensures
>> that you don't end up with a corrupt backup, even if the standby is promoted
>> while a backup is being taken. Admittedly it would be nicer to abort it
>> immediately rather than error out at the end.
>>
>> But I wonder why promoting a standby renders the backup invalid in the first
>> place? Fujii, Simon, can you explain that?
>
> Simon had the same question and I answered it before.
>
> http://archives.postgresql.org/message-id/CAHGQGwFU04oO8YL5SUcdjVq3BRNi7WtfzTy9wA2kXtZNHicTeA@mail.gmail.com
> ---------------------------------------
>> You say
>> "If the standby is promoted to the master during online backup, the
>> backup fails."
>> but no explanation of why?
>>
>> I could work those things out, but I don't want to have to, plus we
>> may disagree if I did.
>
> If the backup succeeds in that case, when we start an archive recovery from that
> backup, the recovery needs to cross between two timelines. Which means that
> we need to set recovery_target_timeline before starting recovery. Whether
> recovery_target_timeline needs to be set or not depends on whether the standby
> was promoted during taking the backup. Leaving such a decision to a user seems
> fragile.

I accepted your answer before, but I think it should be challenged
now. This is definitely a time when you really want that backup, so
invalidating it for such a weak reason is not useful, even if I
understand your original thought.

Something that has concerned me is that we don't have an explicit
"timeline change record". We *say* we do that at shutdown checkpoints,
but that is recorded in the new timeline. So we have the strange
situation of changing timeline at two separate places.

When we change timeline we really should generate one last WAL record on
the old timeline that explicitly marks the change and the single exact
point at which it takes place. With PITR we are
unable to do that, because any timeline can fork at any point. With
smooth switchover we have a special case that is not "anything goes"
and there is a good case for not incrementing the timeline at all.

This is still a half-formed thought, but at least you should know I'm
in the debate.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


