Re: For standby pg_ctl doesn't wait for PM_STATUS_READY in presence of promote_trigger_file - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: For standby pg_ctl doesn't wait for PM_STATUS_READY in presence of promote_trigger_file
Date
Msg-id 20200805.144623.2008802441758871789.horikyota.ntt@gmail.com
In response to For standby pg_ctl doesn't wait for PM_STATUS_READY in presence of promote_trigger_file  (Ashwin Agrawal <ashwinstar@gmail.com>)
List pgsql-hackers
Hello.

At Tue, 4 Aug 2020 12:01:45 -0700, Ashwin Agrawal <ashwinstar@gmail.com> wrote in 
> If shutdown (non hot enabled) standby and promote the standby using
> promote_trigger_file via pg_ctl start with -w (wait), currently pg_ctl
> returns as soon as recovery is started. Instead would be helpful if
> pg_ctl can wait till PM_STATUS_READY for this case, given promotion is
> requested.
> 
> pg_ctl -w returns as soon as recovery is started for non hot enabled
> standby because PM_STATUS_STANDBY is written
> on PMSIGNAL_RECOVERY_STARTED. Given the intent to promote the standby
> using promote_trigger_file, it would be better to not write
> PM_STATUS_STANDBY, instead let promotion complete and return only
> after connections can be actually accepted.
> 
> Seems helpful behavior for users, though I am not sure about how much
> promote_trigger_file is used with non hot enabled standbys. This is
> something which will help to solidify some of the tests in Greenplum
> hence checking interest for the same here.
> 
> It's doable via below patch:

It seems strange for "pg_ctl start" to wait for a server to promote.
Is there any reason to do it that way instead of running pg_ctl start
and then pg_ctl promote?
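
For context, here is a minimal sketch of why "pg_ctl start -w" returns
early in the reported case. As far as I understand, the wait loop stops
as soon as the status line in postmaster.pid says either "ready" or
"standby". The status strings are simplified and the function name is
hypothetical; this is not the actual pg_ctl code.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins for the status strings in postmaster.pid. */
#define SKETCH_STATUS_STARTING "starting"
#define SKETCH_STATUS_STANDBY  "standby"
#define SKETCH_STATUS_READY    "ready"

/*
 * Hypothetical helper: returns true when "pg_ctl start -w" would stop
 * waiting for the given status.  Both "ready" and "standby" end the wait,
 * which is why a non-hot standby returns right after recovery starts.
 */
static bool
sketch_wait_is_over(const char *pmstatus)
{
    return strcmp(pmstatus, SKETCH_STATUS_READY) == 0 ||
           strcmp(pmstatus, SKETCH_STATUS_STANDBY) == 0;
}
```

With this model, a server still reporting "starting" keeps pg_ctl
waiting, while "standby" ends the wait even though connections are not
yet accepted.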

> diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
> index 5b5fc97c72..c49010aa5a 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -5197,6 +5197,7 @@ sigusr1_handler(SIGNAL_ARGS)
>         if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
>                 pmState == PM_STARTUP && Shutdown == NoShutdown)
>         {
> +               bool promote_trigger_file_exist = false;
>                 /* WAL redo has started. We're out of reinitialization. */
>                 FatalError = false;
>                 AbortStartTime = 0;
> @@ -5218,12 +5219,25 @@ sigusr1_handler(SIGNAL_ARGS)
>                 if (XLogArchivingAlways())
>                         PgArchPID = pgarch_start();
> 
> +               {
> +                       /*
> +                        * if promote trigger file exist we don't wish to convey
> +                        * PM_STATUS_STANDBY, instead wish pg_ctl -w to wait till
> +                        * connections can be actually accepted by the database.
> +                        */
> +                       struct stat stat_buf;
> +                       if (PromoteTriggerFile != NULL &&
> +                               strcmp(PromoteTriggerFile, "") != 0 &&
> +                               stat(PromoteTriggerFile, &stat_buf) == 0)
> +                               promote_trigger_file_exist = true;
> +               }
> +
>                 /*
>                  * If we aren't planning to enter hot standby mode later, treat
>                  * RECOVERY_STARTED as meaning we're out of startup, and report status
>                  * accordingly.
>                  */
> -               if (!EnableHotStandby)
> +               if (!EnableHotStandby && !promote_trigger_file_exist)
>                 {
>                         AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STANDBY);
>  #ifdef USE_SYSTEMD

In addition to the above, regarding the patch itself, I'm not sure it
is a good thing for the postmaster process to become aware of
PromoteTriggerFile.

Maybe we could change the behavior of "pg_ctl start" to wait for the
consistency point when archive recovery runs (somewhat similarly to the
case of standbys) by adding a PM-signal, say,
PMSIGNAL_CONSISTENCY_REACHED?
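
To make the idea concrete, here is a toy model of when the status would
change for a server without hot standby. It is only one possible
reading of the proposal, not postmaster code: all type and function
names are hypothetical, and PMSIGNAL_CONSISTENCY_REACHED does not exist
today.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PM-signals discussed above. */
typedef enum
{
    SK_RECOVERY_STARTED,
    SK_CONSISTENCY_REACHED      /* the proposed new signal */
} SketchSignal;

/* Hypothetical stand-ins for the status written to postmaster.pid. */
typedef enum
{
    SK_STATUS_STARTING,
    SK_STATUS_STANDBY
} SketchStatus;

/* Current behavior: without hot standby, report on RECOVERY_STARTED. */
static SketchStatus
sketch_status_current(SketchStatus prev, SketchSignal sig, bool hot_standby)
{
    if (!hot_standby && sig == SK_RECOVERY_STARTED)
        return SK_STATUS_STANDBY;
    return prev;
}

/* Assumed proposal: report only once the consistency point is reached. */
static SketchStatus
sketch_status_proposed(SketchStatus prev, SketchSignal sig, bool hot_standby)
{
    if (!hot_standby && sig == SK_CONSISTENCY_REACHED)
        return SK_STATUS_STANDBY;
    return prev;
}
```

In this model "pg_ctl start -w" would keep waiting after recovery
starts and return only after the (assumed) consistency signal arrives,
without the postmaster having to look at the trigger file.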

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
