Thread: For standby pg_ctl doesn't wait for PM_STATUS_READY in presence of promote_trigger_file

If shutdown (non hot enabled) standby and promote the standby using
promote_trigger_file via pg_ctl start with -w (wait), currently pg_ctl
returns as soon as recovery is started. Instead would be helpful if
pg_ctl can wait till PM_STATUS_READY for this case, given promotion is
requested.

pg_ctl -w returns as soon as recovery is started for non hot enabled
standby because PM_STATUS_STANDBY is written
on PMSIGNAL_RECOVERY_STARTED. Given the intent to promote the standby
using promote_trigger_file, it would be better to not write 
PM_STATUS_STANDBY, instead let promotion complete and return only
after connections can be actually accepted.

Seems helpful behavior for users, though I am not sure about how much
promote_trigger_file is used with non hot enabled standbys. This is
something which will help to solidify some of the tests in Greenplum
hence checking interest for the same here.

It's doable via below patch:

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5b5fc97c72..c49010aa5a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5197,6 +5197,7 @@ sigusr1_handler(SIGNAL_ARGS)
        if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
                pmState == PM_STARTUP && Shutdown == NoShutdown)
        {
+               bool promote_trigger_file_exist = false;
                /* WAL redo has started. We're out of reinitialization. */
                FatalError = false;
                AbortStartTime = 0;
@@ -5218,12 +5219,25 @@ sigusr1_handler(SIGNAL_ARGS)
                if (XLogArchivingAlways())
                        PgArchPID = pgarch_start();
 
+               {
+                       /*
+                        * if promote trigger file exist we don't wish to convey
+                        * PM_STATUS_STANDBY, instead wish pg_ctl -w to wait till
+                        * connections can be actually accepted by the database.
+                        */
+                       struct stat stat_buf;
+                       if (PromoteTriggerFile != NULL &&
+                               strcmp(PromoteTriggerFile, "") != 0 &&
+                               stat(PromoteTriggerFile, &stat_buf) == 0)
+                               promote_trigger_file_exist = true;
+               }
+
                /*
                 * If we aren't planning to enter hot standby mode later, treat
                 * RECOVERY_STARTED as meaning we're out of startup, and report status
                 * accordingly.
                 */
-               if (!EnableHotStandby)
+               if (!EnableHotStandby && !promote_trigger_file_exist)
                {
                        AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STANDBY);
 #ifdef USE_SYSTEMD


--
Ashwin Agrawal (VMware)
Hello.

At Tue, 4 Aug 2020 12:01:45 -0700, Ashwin Agrawal <ashwinstar@gmail.com> wrote in 
> If shutdown (non hot enabled) standby and promote the standby using
> promote_trigger_file via pg_ctl start with -w (wait), currently pg_ctl
> returns as soon as recovery is started. Instead would be helpful if
> pg_ctl can wait till PM_STATUS_READY for this case, given promotion is
> requested.
> 
> pg_ctl -w returns as soon as recovery is started for non hot enabled
> standby because PM_STATUS_STANDBY is written
> on PMSIGNAL_RECOVERY_STARTED. Given the intent to promote the standby
> using promote_trigger_file, it would be better to not write
> PM_STATUS_STANDBY, instead let promotion complete and return only
> after connections can be actually accepted.
> 
> Seems helpful behavior for users, though I am not sure about how much
> promote_trigger_file is used with non hot enabled standbys. This is
> something which will help to solidify some of the tests in Greenplum
> hence checking interest for the same here.
> 
> It's doable via below patch:

It is apparently strange that "pg_ctl start" waits for a server to
promote.  Is there any reason you use that way instead of pg_ctl start
then pg_ctl promote?

> diff --git a/src/backend/postmaster/postmaster.c
> b/src/backend/postmaster/postmaster.c
> index 5b5fc97c72..c49010aa5a 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -5197,6 +5197,7 @@ sigusr1_handler(SIGNAL_ARGS)
>         if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
>                 pmState == PM_STARTUP && Shutdown == NoShutdown)
>         {
> +               bool promote_trigger_file_exist = false;
>                 /* WAL redo has started. We're out of reinitialization. */
>                 FatalError = false;
>                 AbortStartTime = 0;
> @@ -5218,12 +5219,25 @@ sigusr1_handler(SIGNAL_ARGS)
>                 if (XLogArchivingAlways())
>                         PgArchPID = pgarch_start();
> 
> +               {
> +                       /*
> +                        * if promote trigger file exist we don't wish to
> convey
> +                        * PM_STATUS_STANDBY, instead wish pg_ctl -w to
> wait till
> +                        * connections can be actually accepted by the
> database.
> +                        */
> +                       struct stat stat_buf;
> +                       if (PromoteTriggerFile != NULL &&
> +                               strcmp(PromoteTriggerFile, "") != 0 &&
> +                               stat(PromoteTriggerFile, &stat_buf) == 0)
> +                               promote_trigger_file_exist = true;
> +               }
> +
>                 /*
>                  * If we aren't planning to enter hot standby mode later,
> treat
>                  * RECOVERY_STARTED as meaning we're out of startup, and
> report status
>                  * accordingly.
>                  */
> -               if (!EnableHotStandby)
> +               if (!EnableHotStandby && !promote_trigger_file_exist)
>                 {
>                         AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS,
> PM_STATUS_STANDBY);
>  #ifdef USE_SYSTEMD

Addition the above, in regards to the patch, I'm not sure it's good
thing that postmaster process gets conscious of
PromoteTriggerFile.

Maybe we could change the behavior of "pg_ctl start" to wait for
consistecy point when archive recovery runs (slightly similarly to the
case of standbys) by adding a PM-signal, say,
PMSIGNAL_CONSISTENCY_REACHED?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center