Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master - Mailing list pgsql-bugs

From Fujii Masao
Subject Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
Msg-id CAHGQGwFFn_xmvP5bXpVYU363a=wG2GRt5o25VQ5AbiHqnPJrdw@mail.gmail.com
In response to Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
List pgsql-bugs
On Mon, Jun 1, 2015 at 5:19 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, May 28, 2015 at 7:07 PM, <feikesteenbergen@gmail.com> wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      13368
>> Logged by:          Feike Steenbergen
>> Email address:      feikesteenbergen@gmail.com
>> PostgreSQL version: 9.4.2
>> Operating system:   Debian 8.0 x86_64
>> Description:
>>
>> We sometimes see a standby server promoting itself to master immediately.
>>
>> Analysis shows us that the master still has a promote file in the PGDATA
>> directory. We assume the presence of the promote file (which is copied
>> by pg_basebackup) is triggering the promotion.
>
> If there is a promote file in PGDATA when a standby starts up,
> promotion will be triggered.
>
>> The master itself was previously a standby server. The promotion was done
>> using pg_ctl promote. Analysis of our logs shows that we sent pg_ctl promote
>> twice to this cluster; this is also reflected in the server log, where
>> "server promoting" shows up twice.
>
> In this case promotion is triggered by CheckForStandbyTrigger(), where
> the promote file is unlinked.
>
>> Some testing shows us that in some cases, when pg_ctl promote is called
>> multiple times, a promote file is left in the PGDATA directory, even though
>> the cluster has been successfully promoted and is accepting read/write
>> queries.
>
> This is not surprising. pg_ctl decides whether a node needs to be
> promoted based only on whether recovery.conf exists, and there is a
> window between the moment promotion is actually triggered and the moment
> recovery.conf is removed. So a promote file can be created again even
> after SIGUSR1 has already been sent to the standby's postmaster.
>
>> We will try to work around this issue by ensuring we do not send multiple
>> promote requests using pg_ctl to the same cluster.
>
> Well, we could for example have the server rename promote to
> promote_done in CheckForStandbyTrigger() and then unlink it when
> recovery.conf is renamed to recovery.done. Opinions on the matter
> are welcome.

Or we could just always remove the signal file at the end of recovery.
That filename switch seems unnecessary.

In addition to that change, should we make pg_basebackup skip
the signal file?
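The race described above, and the two proposed changes, can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not PostgreSQL code: `StandbyModel`, `pg_ctl_promote`, `double_promote` and `base_backup` are hypothetical names, and only the file names recovery.conf, recovery.done and promote come from the thread. It assumes that pg_ctl writes the promote file whenever recovery.conf still exists, that the startup process consumes the file only once (when promotion is triggered), and that recovery.conf is renamed only later, at the end of recovery.

```python
import os
import shutil
import tempfile

# File names taken from the thread (9.4-era data directory layout).
RECOVERY_CONF = "recovery.conf"
RECOVERY_DONE = "recovery.done"
PROMOTE_SIGNAL = "promote"


def touch(path):
    """Create an empty file, like pg_ctl writing the promote signal file."""
    with open(path, "w"):
        pass


class StandbyModel:
    """Hypothetical toy model of a standby's startup process (not real PostgreSQL code)."""

    def __init__(self, datadir):
        self.datadir = datadir
        self.promotion_triggered = False

    def on_sigusr1(self):
        # Loosely mirrors CheckForStandbyTrigger(): consume the promote file once.
        if self.promotion_triggered:
            return  # already promoting: the trigger check is no longer consulted
        path = os.path.join(self.datadir, PROMOTE_SIGNAL)
        if os.path.exists(path):
            os.unlink(path)
            self.promotion_triggered = True

    def end_of_recovery(self, remove_signal_file):
        # recovery.conf only disappears here, after the trigger has already fired.
        os.rename(os.path.join(self.datadir, RECOVERY_CONF),
                  os.path.join(self.datadir, RECOVERY_DONE))
        if remove_signal_file:
            # Proposed change: always drop a leftover promote file at end of recovery.
            path = os.path.join(self.datadir, PROMOTE_SIGNAL)
            if os.path.exists(path):
                os.unlink(path)


def pg_ctl_promote(datadir, server):
    """Hypothetical pg_ctl promote: only checks that recovery.conf exists."""
    if not os.path.exists(os.path.join(datadir, RECOVERY_CONF)):
        return False  # not (or no longer) a standby: refuse
    touch(os.path.join(datadir, PROMOTE_SIGNAL))
    server.on_sigusr1()  # stands in for signalling the postmaster with SIGUSR1
    return True


def double_promote(remove_signal_file):
    """Promote twice inside the window; return True if a promote file survives."""
    with tempfile.TemporaryDirectory() as datadir:
        touch(os.path.join(datadir, RECOVERY_CONF))
        server = StandbyModel(datadir)
        pg_ctl_promote(datadir, server)  # first request triggers promotion
        pg_ctl_promote(datadir, server)  # second request: recovery.conf still exists
        server.end_of_recovery(remove_signal_file)
        return os.path.exists(os.path.join(datadir, PROMOTE_SIGNAL))


def base_backup(src, dst, skip=()):
    """Hypothetical flat-directory copy standing in for pg_basebackup."""
    os.makedirs(dst, exist_ok=True)
    for name in os.listdir(src):
        if name not in skip:
            shutil.copy(os.path.join(src, name), os.path.join(dst, name))
    return sorted(os.listdir(dst))


if __name__ == "__main__":
    print("leftover without fix:", double_promote(remove_signal_file=False))  # True
    print("leftover with fix:", double_promote(remove_signal_file=True))      # False
    with tempfile.TemporaryDirectory() as master, \
            tempfile.TemporaryDirectory() as plain, \
            tempfile.TemporaryDirectory() as filtered:
        touch(os.path.join(master, PROMOTE_SIGNAL))  # stale file on the new master
        touch(os.path.join(master, "PG_VERSION"))
        print(base_backup(master, plain))                            # ['PG_VERSION', 'promote']
        print(base_backup(master, filtered, skip={PROMOTE_SIGNAL}))  # ['PG_VERSION']
```

In this model the second `pg_ctl_promote()` call lands inside the window and leaves a promote file behind, which an unfiltered copy then carries over to the new standby. Either change on its own removes the symptom here; doing both, as suggested above, also protects backups taken from a master that already has a stale file. Any real fix would live in the startup process and in pg_basebackup and may look quite different from this sketch.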

Regards,

--
Fujii Masao
