Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master - Mailing list pgsql-bugs

From Fujii Masao
Subject Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
Date
Msg-id CAHGQGwH0qnvHqxSF6LUXmMCLQxT8NtZHsJN7ij0sK=AW1DC3Aw@mail.gmail.com
Whole thread Raw
In response to Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-bugs
On Fri, Jul 3, 2015 at 4:03 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Jul 2, 2015 at 10:00 PM, Fujii Masao wrote:
>>> At the beginning of
>>> StartupXLOG() before parsing recovery.conf we could check for the
>>> existence of promotion trigger files and unlink them unconditionally.
>>
>> There seems to be still race condition: postmaster can receive SIGUSR1
>> before the startup process removes the promotion trigger file. Then
>> the variable promote_triggered can be set to true unexpectedly.
>
> Ah, yes. That's indeed possible.
>
>> So, what about making postmaster remove the trigger file unconditionally
>> before the startup process is forked, instead? For example, just after
>> PostmasterMain() calls RemovePgTempFiles()?
>
> Sounds fine, no other processes are running at this point.

Thanks for updating the patch!

+                * Skip promote signal files. Those files may have
been created by
+                * pg_ctl to trigger a promotion but they do not
belong to a base
+                * backup.
+                */
+               if (strcmp(de->d_name, PROMOTE_SIGNAL_FILE) == 0 ||
+                       strcmp(de->d_name, FALLBACK_PROMOTE_SIGNAL_FILE) == 0)
+                       continue;

Do we really want this? This is not required because the files are
always removed at the server startup. Also the backup performance
gain by skipping them would be almost zero since their size is zero.
That is, no need to add this for performance. OTOH, it's harmless
to add this. But I don't want to enlarge the "skip file list" for
pg_basebackup beyond necessary. So I removed this from the patch.

BTW, if we really want this, not only pg_basebackup but also pg_rewind
would need to skip the files.

+        * Remove any trigger file able to trigger promotion, this is safe from
+        * race conditions involving signals as no other processes are running
+        * yet.
+        */
+       RemoveStandbyTrigger();

I changed the function name to RemovePromoteSignalFiles().

I added more detail comment about the race condition that we're
talking about, into here.

Updated version of the patch attached.

Regards,

--
Fujii Masao

Attachment

pgsql-bugs by date:

Previous
From: Michael Paquier
Date:
Subject: Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
Next
From: Michael Paquier
Date:
Subject: Re: BUG #13126: table constraint loses its comment