On Mon, Jun 1, 2015 at 5:19 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, May 28, 2015 at 7:07 PM, <feikesteenbergen@gmail.com> wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference: 13368
>> Logged by: Feike Steenbergen
>> Email address: feikesteenbergen@gmail.com
>> PostgreSQL version: 9.4.2
>> Operating system: Debian 8.0 x86_64
>> Description:
>>
>> We sometimes see a standby server promoting itself to master immediately.
>>
>> Analysis shows us that the master still has a promote file in the PGDATA
>> directory. We assume the presence of the promote file (which is copied
>> by pg_basebackup) is triggering the promotion.
>
> If there is a promote file in PGDATA when a standby starts up,
> promotion will be triggered.
>
>> The master itself previously was a standby server. The promotion was done
>> using pg_ctl promote. Analysis of our logs shows that we sent pg_ctl promote
>> twice to this cluster; this is also reflected in the server log, where
>> "server promoting" shows up twice.
>
> In this case promotion is triggered by CheckForStandbyTrigger(), where
> the promote file is unlinked.
>
>> Some testing shows us that in some cases, when pg_ctl promote is called
>> multiple times, a promote file is left in the PGDATA directory, even
>> though the cluster has been successfully promoted and is accepting
>> read/write queries.
>
> This is not surprising. pg_ctl decides whether a node can be promoted
> based on whether recovery.conf exists, and there is a window between
> the moment the promotion is triggered and the moment recovery.conf is
> removed, so a promote file can still be created even after SIGUSR1 has
> been sent to the standby's postmaster.
>
>> We will try to work around this issue by ensuring we do not send multiple
>> promote requests using pg_ctl to the same cluster.
>
> Well, we could for example have the server switch promote to
> promote_done in CheckForStandbyTrigger() and then unlink it when
> recovery.conf is switched to .done. Opinions are welcome on the
> matter.

Or we can just always remove the signal file at the end of recovery.
That filename switch seems unnecessary.
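
To illustrate the idea, here is a minimal sketch in Python (not the server's
actual C code). The function name remove_promote_file is hypothetical, and I
am assuming the signal file is the file named "promote" directly under PGDATA.
The point is only that the removal is unconditional and tolerates the file
already being gone:

```python
import os
import tempfile

# Assumption: the signal file written by "pg_ctl promote" is PGDATA/promote.
PROMOTE_SIGNAL_FILE = "promote"


def remove_promote_file(pgdata):
    """Remove a leftover promote file at the end of recovery.

    Returns True if a file was removed, False if none was present.
    A missing file is not an error: the normal trigger check may
    already have unlinked it.
    """
    path = os.path.join(pgdata, PROMOTE_SIGNAL_FILE)
    try:
        os.unlink(path)
        return True
    except FileNotFoundError:
        return False
```

Calling this once after recovery ends would also clean up a file created by a
second pg_ctl promote that arrived too late to be consumed.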

In addition to that change, should we make pg_basebackup skip
the signal file?
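
As a rough sketch of what skipping would mean (again in Python, not the real
backup code), the file list could be filtered while walking the data
directory. The names files_to_copy and SKIPPED_SIGNAL_FILES are hypothetical,
and I am assuming only a top-level file named "promote" needs to be excluded:

```python
import os
import tempfile

# Assumption: only PGDATA/promote is treated as a signal file here.
SKIPPED_SIGNAL_FILES = {"promote"}


def files_to_copy(pgdata):
    """Yield file paths relative to pgdata, skipping top-level signal files."""
    for root, _dirs, files in os.walk(pgdata):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), pgdata)
            # Only an exact top-level match is skipped; a file that happens
            # to be called "promote" in a subdirectory is still copied.
            if rel in SKIPPED_SIGNAL_FILES:
                continue
            yield rel
```

With this, a base backup taken from a master that still has a stale promote
file would not carry that file over to the new standby.
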
Regards,
--
Fujii Masao