Thread: Parallel query hangs after a smart shutdown is issued

Parallel query hangs after a smart shutdown is issued

From

Bharath Rupireddy

Date:

12 August 2020, 15:32:35

Hi,

After a smart shutdown is issued(with pg_ctl), run a parallel query,
then the query hangs. The postmaster doesn't inform backends about the
smart shutdown(see pmdie()  ->  SIGTERM -> BACKEND_TYPE_NORMAL are not
informed), so if they request parallel workers, the postmaster is
unable to fork any workers as it's status(pmState) gets changed to
PM_WAIT_BACKENDS(see maybe_start_bgworkers() -->
bgworker_should_start_now() returns false).

Few ways we could solve this:
1. Do we want to disallow parallelism when there is a pending smart
shutdown? - If yes, then, we can let the postmaster know the regular
backends whenever a smart shutdown is received and the backends use
this info to not consider parallelism. If we use SIGTERM to notify,
since the backends have die() as handlers, they just cancel the
queries which is again an inconsistent behaviour[1]. Would any other
signal like SIGUSR2(I think it's currently ignored by backends) be
used here? If the signals are overloaded, can we multiplex SIGTERM
similar to SIGUSR1? If we don't want to use signals at all, the
postmaster can make an entry of it's status in bg worker shared memory
i.e. BackgroundWorkerData, RegisterDynamicBackgroundWorker() can
simply return, without requesting the postmaster for parallel workers.

2. If we want to allow parallelism, then, we can tweak
bgworker_should_start_now(), detect that the pending bg worker fork
requests are for parallelism, and let the postmaster start the
workers.

Thoughts?

Note: this issue is identified while working on [1]

[1] - https://www.postgresql.org/message-id/CALj2ACWTAQ2uWgj4yRFLQ6t15MMYV_uc3GCT5F5p8R9pzrd7yQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel query hangs after a smart shutdown is issued

From

Thomas Munro

Date:

12 August 2020, 16:56:03

On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> After a smart shutdown is issued(with pg_ctl), run a parallel query,
> then the query hangs. The postmaster doesn't inform backends about the
> smart shutdown(see pmdie()  ->  SIGTERM -> BACKEND_TYPE_NORMAL are not
> informed), so if they request parallel workers, the postmaster is
> unable to fork any workers as it's status(pmState) gets changed to
> PM_WAIT_BACKENDS(see maybe_start_bgworkers() -->
> bgworker_should_start_now() returns false).
>
> Few ways we could solve this:
> 1. Do we want to disallow parallelism when there is a pending smart
> shutdown? - If yes, then, we can let the postmaster know the regular
> backends whenever a smart shutdown is received and the backends use
> this info to not consider parallelism. If we use SIGTERM to notify,
> since the backends have die() as handlers, they just cancel the
> queries which is again an inconsistent behaviour[1]. Would any other
> signal like SIGUSR2(I think it's currently ignored by backends) be
> used here? If the signals are overloaded, can we multiplex SIGTERM
> similar to SIGUSR1? If we don't want to use signals at all, the
> postmaster can make an entry of it's status in bg worker shared memory
> i.e. BackgroundWorkerData, RegisterDynamicBackgroundWorker() can
> simply return, without requesting the postmaster for parallel workers.
>
> 2. If we want to allow parallelism, then, we can tweak
> bgworker_should_start_now(), detect that the pending bg worker fork
> requests are for parallelism, and let the postmaster start the
> workers.
>
> Thoughts?

Hello Bharath,

Yeah, the current situation is not good.  I think your option 2 sounds
better, because the documented behaviour of smart shutdown is that it
"lets existing sessions end their work normally".  I think that means
that a query that is already running or allowed to start should be
able to start new workers and not have its existing workers
terminated.  Arseny Sher wrote a couple of different patches to try
that last year, but they fell through the cracks:

https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com

Re: Parallel query hangs after a smart shutdown is issued

From

Tom Lane

Date:

12 August 2020, 18:00:55

Thomas Munro <thomas.munro@gmail.com> writes:
> On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> After a smart shutdown is issued(with pg_ctl), run a parallel query,
>> then the query hangs.

> Yeah, the current situation is not good.  I think your option 2 sounds
> better, because the documented behaviour of smart shutdown is that it
> "lets existing sessions end their work normally".  I think that means
> that a query that is already running or allowed to start should be
> able to start new workers and not have its existing workers
> terminated.  Arseny Sher wrote a couple of different patches to try
> that last year, but they fell through the cracks:
> https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com

I already commented on this in the other thread that Bharath started [1].
I think the real issue here is why is the postmaster's SIGTERM handler
doing *anything* other than disallowing new connections?  It seems quite
premature to kill support processes of any sort, not only parallel
workers.  The documentation says existing clients are allowed to end
their work, not that their performance is going to be crippled until
they end.

So I looked at moving the kills of all the support processes to happen
after we detect that there are no remaining regular backends, and it
seems to not be too hard.  I realized that the existing PM_WAIT_READONLY
state is doing that already, but just for a subset of support processes
that it thinks might be active in hot standby mode.  So what I did in the
attached was to repurpose that state as "PM_WAIT_CLIENTS", which does the
right thing in either regular or hot standby mode.

One other thing I changed here was to remove PM_WAIT_READONLY from the
set of states in which we'll allow promotion to occur or a new walreceiver
to start.  I'm not convinced that either of those behaviors aren't
bugs; although if someone thinks they're right, we can certainly put
back PM_WAIT_CLIENTS in those checks.  (But, for example, it does not
appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with
Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks
like confusingly dead code to me.  If we do want to allow restarting
the walreceiver in this state, the Shutdown condition needs fixed.)

            regards, tom lane

[1] https://www.postgresql.org/message-id/65189.1597181322%40sss.pgh.pa.us

diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index e31275a04e..3946fa52ea 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -185,8 +185,8 @@ PostgreSQL documentation
    <option>stop</option> mode shuts down the server that is running in
    the specified data directory.  Three different
    shutdown methods can be selected with the <option>-m</option>
-   option.  <quote>Smart</quote> mode waits for all active
-   clients to disconnect and any online backup to finish.
+   option.  <quote>Smart</quote> mode disallows new connections, then waits
+   for all existing clients to disconnect and any online backup to finish.
    If the server is in hot standby, recovery and streaming replication
    will be terminated once all clients have disconnected.
    <quote>Fast</quote> mode (the default) does not wait for clients to disconnect and
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 38e2c16ac2..790948a4b2 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -148,8 +148,6 @@
 #define BACKEND_TYPE_BGWORKER    0x0008    /* bgworker process */
 #define BACKEND_TYPE_ALL        0x000F    /* OR of all the above */

-#define BACKEND_TYPE_WORKER        (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER)
-
 /*
  * List of active backends (or child processes anyway; we don't actually
  * know whether a given child has become a backend or is still in the
@@ -332,8 +330,8 @@ typedef enum
     PM_HOT_STANDBY,                /* in hot standby mode */
     PM_RUN,                        /* normal "database is alive" state */
     PM_WAIT_BACKUP,                /* waiting for online backup mode to end */
-    PM_WAIT_READONLY,            /* waiting for read only backends to exit */
-    PM_WAIT_BACKENDS,            /* waiting for live backends to exit */
+    PM_WAIT_CLIENTS,            /* waiting for normal backends to exit */
+    PM_WAIT_BACKENDS,            /* waiting for all backends to exit */
     PM_SHUTDOWN,                /* waiting for checkpointer to do shutdown
                                  * ckpt */
     PM_SHUTDOWN_2,                /* waiting for archiver and walsenders to
@@ -2793,35 +2791,19 @@ pmdie(SIGNAL_ARGS)
             sd_notify(0, "STOPPING=1");
 #endif

-            if (pmState == PM_RUN || pmState == PM_RECOVERY ||
-                pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
-            {
-                /* autovac workers are told to shut down immediately */
-                /* and bgworkers too; does this need tweaking? */
-                SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER);
-                /* and the autovac launcher too */
-                if (AutoVacPID != 0)
-                    signal_child(AutoVacPID, SIGTERM);
-                /* and the bgwriter too */
-                if (BgWriterPID != 0)
-                    signal_child(BgWriterPID, SIGTERM);
-                /* and the walwriter too */
-                if (WalWriterPID != 0)
-                    signal_child(WalWriterPID, SIGTERM);
-
-                /*
-                 * If we're in recovery, we can't kill the startup process
-                 * right away, because at present doing so does not release
-                 * its locks.  We might want to change this in a future
-                 * release.  For the time being, the PM_WAIT_READONLY state
-                 * indicates that we're waiting for the regular (read only)
-                 * backends to die off; once they do, we'll kill the startup
-                 * and walreceiver processes.
-                 */
-                pmState = (pmState == PM_RUN) ?
-                    PM_WAIT_BACKUP : PM_WAIT_READONLY;
-            }
+            /*
+             * If we reached normal running, we have to wait for any online
+             * backup mode to end; otherwise go straight to waiting for client
+             * backends to exit.  (The difference is that in the former state,
+             * we'll still let in new superuser clients, so that somebody can
+             * end the online backup mode.)  If already in PM_WAIT_BACKUP or a
+             * later state, do not change it.
+             */
+            if (pmState == PM_RUN)
+                pmState = PM_WAIT_BACKUP;
+            else if (pmState == PM_RECOVERY ||
+                     pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
+                pmState = PM_WAIT_CLIENTS;

             /*
              * Now wait for online backup mode to end and backends to exit. If
@@ -2871,16 +2853,15 @@ pmdie(SIGNAL_ARGS)
             }
             else if (pmState == PM_RUN ||
                      pmState == PM_WAIT_BACKUP ||
-                     pmState == PM_WAIT_READONLY ||
+                     pmState == PM_WAIT_CLIENTS ||
                      pmState == PM_WAIT_BACKENDS ||
                      pmState == PM_HOT_STANDBY)
             {
                 ereport(LOG,
                         (errmsg("aborting any active transactions")));
-                /* shut down all backends and workers */
+                /* shut down all backends and workers, but not walsenders */
                 SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_NORMAL | BACKEND_TYPE_AUTOVAC |
-                                   BACKEND_TYPE_BGWORKER);
+                                   BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
                 /* and the autovac launcher too */
                 if (AutoVacPID != 0)
                     signal_child(AutoVacPID, SIGTERM);
@@ -3713,7 +3694,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
         pmState == PM_HOT_STANDBY ||
         pmState == PM_RUN ||
         pmState == PM_WAIT_BACKUP ||
-        pmState == PM_WAIT_READONLY ||
+        pmState == PM_WAIT_CLIENTS ||
         pmState == PM_SHUTDOWN)
         pmState = PM_WAIT_BACKENDS;

@@ -3802,21 +3783,35 @@ PostmasterStateMachine(void)
          * PM_WAIT_BACKUP state ends when online backup mode is not active.
          */
         if (!BackupInProgress())
-            pmState = PM_WAIT_BACKENDS;
+            pmState = PM_WAIT_CLIENTS;
     }

-    if (pmState == PM_WAIT_READONLY)
+    if (pmState == PM_WAIT_CLIENTS)
     {
         /*
-         * PM_WAIT_READONLY state ends when we have no regular backends that
-         * have been started during recovery.  We kill the startup and
-         * walreceiver processes and transition to PM_WAIT_BACKENDS.  Ideally,
-         * we might like to kill these processes first and then wait for
-         * backends to die off, but that doesn't work at present because
-         * killing the startup process doesn't release its locks.
+         * PM_WAIT_CLIENTS state ends when we have no normal client backends
+         * running.  Then signal appropriate support processes, and transition
+         * to PM_WAIT_BACKENDS to wait for them to die.
          */
         if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
         {
+            /*
+             * Signal all backend children except walsenders.  (While there
+             * can't be any normal children left, we might as well include
+             * BACKEND_TYPE_NORMAL in this mask, just to be sure.)
+             */
+            SignalSomeChildren(SIGTERM,
+                               BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
+            /* and the autovac launcher too */
+            if (AutoVacPID != 0)
+                signal_child(AutoVacPID, SIGTERM);
+            /* and the bgwriter too */
+            if (BgWriterPID != 0)
+                signal_child(BgWriterPID, SIGTERM);
+            /* and the walwriter too */
+            if (WalWriterPID != 0)
+                signal_child(WalWriterPID, SIGTERM);
+            /* If we're in recovery, also stop startup and walreceiver procs */
             if (StartupPID != 0)
                 signal_child(StartupPID, SIGTERM);
             if (WalReceiverPID != 0)
@@ -3843,7 +3838,7 @@ PostmasterStateMachine(void)
          * later after writing the checkpoint record, like the archiver
          * process.
          */
-        if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER) == 0 &&
+        if (CountChildren(BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND) == 0 &&
             StartupPID == 0 &&
             WalReceiverPID == 0 &&
             BgWriterPID == 0 &&
@@ -5333,7 +5328,7 @@ sigusr1_handler(SIGNAL_ARGS)

     if (StartupPID != 0 &&
         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
+         pmState == PM_HOT_STANDBY) &&
         CheckPromoteSignal())
     {
         /*
@@ -5651,7 +5646,7 @@ MaybeStartWalReceiver(void)
 {
     if (WalReceiverPID == 0 &&
         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
+         pmState == PM_HOT_STANDBY) &&
         Shutdown == NoShutdown)
     {
         WalReceiverPID = StartWalReceiver();
@@ -5905,7 +5900,7 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
         case PM_SHUTDOWN_2:
         case PM_SHUTDOWN:
         case PM_WAIT_BACKENDS:
-        case PM_WAIT_READONLY:
+        case PM_WAIT_CLIENTS:
         case PM_WAIT_BACKUP:
             break;

Re: Parallel query hangs after a smart shutdown is issued

From

Thomas Munro

Date:

12 August 2020, 18:54:38

On Thu, Aug 13, 2020 at 6:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >> After a smart shutdown is issued(with pg_ctl), run a parallel query,
> >> then the query hangs.
>
> > Yeah, the current situation is not good.  I think your option 2 sounds
> > better, because the documented behaviour of smart shutdown is that it
> > "lets existing sessions end their work normally".  I think that means
> > that a query that is already running or allowed to start should be
> > able to start new workers and not have its existing workers
> > terminated.  Arseny Sher wrote a couple of different patches to try
> > that last year, but they fell through the cracks:
> >
https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com
>
> I already commented on this in the other thread that Bharath started [1].
> I think the real issue here is why is the postmaster's SIGTERM handler
> doing *anything* other than disallowing new connections?  It seems quite
> premature to kill support processes of any sort, not only parallel
> workers.  The documentation says existing clients are allowed to end
> their work, not that their performance is going to be crippled until
> they end.

Right.  It's pretty strange that during smart shutdown, you could run
for hours with no autovacuum, walwriter, bgwriter.  I guess Arseny and
I were looking for a minimal change to fix a bug, but clearly there's a
more general problem and this change works out cleaner anyway.

> So I looked at moving the kills of all the support processes to happen
> after we detect that there are no remaining regular backends, and it
> seems to not be too hard.  I realized that the existing PM_WAIT_READONLY
> state is doing that already, but just for a subset of support processes
> that it thinks might be active in hot standby mode.  So what I did in the
> attached was to repurpose that state as "PM_WAIT_CLIENTS", which does the
> right thing in either regular or hot standby mode.

Make sense, works as expected and passes check-world.

> One other thing I changed here was to remove PM_WAIT_READONLY from the
> set of states in which we'll allow promotion to occur or a new walreceiver
> to start.  I'm not convinced that either of those behaviors aren't
> bugs; although if someone thinks they're right, we can certainly put
> back PM_WAIT_CLIENTS in those checks.  (But, for example, it does not
> appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with
> Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks
> like confusingly dead code to me.  If we do want to allow restarting
> the walreceiver in this state, the Shutdown condition needs fixed.)

If a walreceiver is allowed to run, why should it not be allowed to
restart?  Yeah, I suppose that other test'd need to be Shutdown <=
SmartShutdown, just like we do in SIGHUP_handler().  Looking at other
places where we test Shutdown == NoShutdown, one that jumps out is the
autovacuum wraparound defence stuff and the nearby
PMSIGNAL_START_AUTOVAC_WORKER code.

Re: Parallel query hangs after a smart shutdown is issued

From

Tom Lane

Date:

12 August 2020, 19:28:26

Thomas Munro <thomas.munro@gmail.com> writes:
> On Thu, Aug 13, 2020 at 6:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> One other thing I changed here was to remove PM_WAIT_READONLY from the
>> set of states in which we'll allow promotion to occur or a new walreceiver
>> to start.  I'm not convinced that either of those behaviors aren't
>> bugs; although if someone thinks they're right, we can certainly put
>> back PM_WAIT_CLIENTS in those checks.  (But, for example, it does not
>> appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with
>> Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks
>> like confusingly dead code to me.  If we do want to allow restarting
>> the walreceiver in this state, the Shutdown condition needs fixed.)

> If a walreceiver is allowed to run, why should it not be allowed to
> restart?

I'd come to about the same conclusion after thinking more, so v2
attached undoes that change.  I think putting off promotion is fine
though; it'll get handled at the next postmaster start.  (It looks
like the state machine would just proceed to exit anyway if we allowed
the promotion, but that's a hard-to-test state transition that we could
do without.)

> Yeah, I suppose that other test'd need to be Shutdown <=
> SmartShutdown, just like we do in SIGHUP_handler().  Looking at other
> places where we test Shutdown == NoShutdown, one that jumps out is the
> autovacuum wraparound defence stuff and the nearby
> PMSIGNAL_START_AUTOVAC_WORKER code.

Oh, excellent point!  I'd not thought to look at tests of the Shutdown
variable, but yeah, those should be <= SmartShutdown if we want autovac
to continue to operate in this state.

I also noticed that where reaper() is dealing with startup process
exit(3), it unconditionally sets Shutdown = SmartShutdown which seems
pretty bogus; that variable's value should never be allowed to decrease,
but this could cause it.  In the attached I did

                 StartupStatus = STARTUP_NOT_RUNNING;
-                Shutdown = SmartShutdown;
+                Shutdown = Max(Shutdown, SmartShutdown);
                 TerminateChildren(SIGTERM);

But given that it's forcing immediate termination of all backends,
I wonder if that's not more like a FastShutdown?  (Scary here is
that the coverage report shows we're not testing this path, so who
knows if it works at all.)

            regards, tom lane

diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index e31275a04e..3946fa52ea 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -185,8 +185,8 @@ PostgreSQL documentation
    <option>stop</option> mode shuts down the server that is running in
    the specified data directory.  Three different
    shutdown methods can be selected with the <option>-m</option>
-   option.  <quote>Smart</quote> mode waits for all active
-   clients to disconnect and any online backup to finish.
+   option.  <quote>Smart</quote> mode disallows new connections, then waits
+   for all existing clients to disconnect and any online backup to finish.
    If the server is in hot standby, recovery and streaming replication
    will be terminated once all clients have disconnected.
    <quote>Fast</quote> mode (the default) does not wait for clients to disconnect and
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 38e2c16ac2..e8ad4b67a3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -148,8 +148,6 @@
 #define BACKEND_TYPE_BGWORKER    0x0008    /* bgworker process */
 #define BACKEND_TYPE_ALL        0x000F    /* OR of all the above */

-#define BACKEND_TYPE_WORKER        (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER)
-
 /*
  * List of active backends (or child processes anyway; we don't actually
  * know whether a given child has become a backend or is still in the
@@ -319,7 +317,7 @@ static bool FatalError = false; /* T if recovering from backend crash */
  *
  * Notice that this state variable does not distinguish *why* we entered
  * states later than PM_RUN --- Shutdown and FatalError must be consulted
- * to find that out.  FatalError is never true in PM_RECOVERY_* or PM_RUN
+ * to find that out.  FatalError is never true in PM_RECOVERY or PM_RUN
  * states, nor in PM_SHUTDOWN states (because we don't enter those states
  * when trying to recover from a crash).  It can be true in PM_STARTUP state,
  * because we don't clear it until we've successfully started WAL redo.
@@ -332,8 +330,8 @@ typedef enum
     PM_HOT_STANDBY,                /* in hot standby mode */
     PM_RUN,                        /* normal "database is alive" state */
     PM_WAIT_BACKUP,                /* waiting for online backup mode to end */
-    PM_WAIT_READONLY,            /* waiting for read only backends to exit */
-    PM_WAIT_BACKENDS,            /* waiting for live backends to exit */
+    PM_WAIT_CLIENTS,            /* waiting for normal backends to exit */
+    PM_WAIT_BACKENDS,            /* waiting for all backends to exit */
     PM_SHUTDOWN,                /* waiting for checkpointer to do shutdown
                                  * ckpt */
     PM_SHUTDOWN_2,                /* waiting for archiver and walsenders to
@@ -2793,35 +2791,19 @@ pmdie(SIGNAL_ARGS)
             sd_notify(0, "STOPPING=1");
 #endif

-            if (pmState == PM_RUN || pmState == PM_RECOVERY ||
-                pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
-            {
-                /* autovac workers are told to shut down immediately */
-                /* and bgworkers too; does this need tweaking? */
-                SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER);
-                /* and the autovac launcher too */
-                if (AutoVacPID != 0)
-                    signal_child(AutoVacPID, SIGTERM);
-                /* and the bgwriter too */
-                if (BgWriterPID != 0)
-                    signal_child(BgWriterPID, SIGTERM);
-                /* and the walwriter too */
-                if (WalWriterPID != 0)
-                    signal_child(WalWriterPID, SIGTERM);
-
-                /*
-                 * If we're in recovery, we can't kill the startup process
-                 * right away, because at present doing so does not release
-                 * its locks.  We might want to change this in a future
-                 * release.  For the time being, the PM_WAIT_READONLY state
-                 * indicates that we're waiting for the regular (read only)
-                 * backends to die off; once they do, we'll kill the startup
-                 * and walreceiver processes.
-                 */
-                pmState = (pmState == PM_RUN) ?
-                    PM_WAIT_BACKUP : PM_WAIT_READONLY;
-            }
+            /*
+             * If we reached normal running, we have to wait for any online
+             * backup mode to end; otherwise go straight to waiting for client
+             * backends to exit.  (The difference is that in the former state,
+             * we'll still let in new superuser clients, so that somebody can
+             * end the online backup mode.)  If already in PM_WAIT_BACKUP or a
+             * later state, do not change it.
+             */
+            if (pmState == PM_RUN)
+                pmState = PM_WAIT_BACKUP;
+            else if (pmState == PM_RECOVERY ||
+                     pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
+                pmState = PM_WAIT_CLIENTS;

             /*
              * Now wait for online backup mode to end and backends to exit. If
@@ -2871,16 +2853,15 @@ pmdie(SIGNAL_ARGS)
             }
             else if (pmState == PM_RUN ||
                      pmState == PM_WAIT_BACKUP ||
-                     pmState == PM_WAIT_READONLY ||
+                     pmState == PM_WAIT_CLIENTS ||
                      pmState == PM_WAIT_BACKENDS ||
                      pmState == PM_HOT_STANDBY)
             {
                 ereport(LOG,
                         (errmsg("aborting any active transactions")));
-                /* shut down all backends and workers */
+                /* shut down all backends and workers, but not walsenders */
                 SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_NORMAL | BACKEND_TYPE_AUTOVAC |
-                                   BACKEND_TYPE_BGWORKER);
+                                   BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
                 /* and the autovac launcher too */
                 if (AutoVacPID != 0)
                     signal_child(AutoVacPID, SIGTERM);
@@ -2987,7 +2968,7 @@ reaper(SIGNAL_ARGS)
                 ereport(LOG,
                         (errmsg("shutdown at recovery target")));
                 StartupStatus = STARTUP_NOT_RUNNING;
-                Shutdown = SmartShutdown;
+                Shutdown = Max(Shutdown, SmartShutdown);
                 TerminateChildren(SIGTERM);
                 pmState = PM_WAIT_BACKENDS;
                 /* PostmasterStateMachine logic does the rest */
@@ -3713,7 +3694,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
         pmState == PM_HOT_STANDBY ||
         pmState == PM_RUN ||
         pmState == PM_WAIT_BACKUP ||
-        pmState == PM_WAIT_READONLY ||
+        pmState == PM_WAIT_CLIENTS ||
         pmState == PM_SHUTDOWN)
         pmState = PM_WAIT_BACKENDS;

@@ -3802,21 +3783,35 @@ PostmasterStateMachine(void)
          * PM_WAIT_BACKUP state ends when online backup mode is not active.
          */
         if (!BackupInProgress())
-            pmState = PM_WAIT_BACKENDS;
+            pmState = PM_WAIT_CLIENTS;
     }

-    if (pmState == PM_WAIT_READONLY)
+    if (pmState == PM_WAIT_CLIENTS)
     {
         /*
-         * PM_WAIT_READONLY state ends when we have no regular backends that
-         * have been started during recovery.  We kill the startup and
-         * walreceiver processes and transition to PM_WAIT_BACKENDS.  Ideally,
-         * we might like to kill these processes first and then wait for
-         * backends to die off, but that doesn't work at present because
-         * killing the startup process doesn't release its locks.
+         * PM_WAIT_CLIENTS state ends when we have no normal client backends
+         * running.  Then signal appropriate support processes, and transition
+         * to PM_WAIT_BACKENDS to wait for them to die.
          */
         if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
         {
+            /*
+             * Signal all backend children except walsenders.  (While there
+             * can't be any normal children left, we might as well include
+             * BACKEND_TYPE_NORMAL in this mask, just to be sure.)
+             */
+            SignalSomeChildren(SIGTERM,
+                               BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
+            /* and the autovac launcher too */
+            if (AutoVacPID != 0)
+                signal_child(AutoVacPID, SIGTERM);
+            /* and the bgwriter too */
+            if (BgWriterPID != 0)
+                signal_child(BgWriterPID, SIGTERM);
+            /* and the walwriter too */
+            if (WalWriterPID != 0)
+                signal_child(WalWriterPID, SIGTERM);
+            /* If we're in recovery, also stop startup and walreceiver procs */
             if (StartupPID != 0)
                 signal_child(StartupPID, SIGTERM);
             if (WalReceiverPID != 0)
@@ -3843,7 +3838,7 @@ PostmasterStateMachine(void)
          * later after writing the checkpoint record, like the archiver
          * process.
          */
-        if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER) == 0 &&
+        if (CountChildren(BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND) == 0 &&
             StartupPID == 0 &&
             WalReceiverPID == 0 &&
             BgWriterPID == 0 &&
@@ -5287,7 +5282,7 @@ sigusr1_handler(SIGNAL_ARGS)
     }

     if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) &&
-        Shutdown == NoShutdown)
+        Shutdown <= SmartShutdown)
     {
         /*
          * Start one iteration of the autovacuum daemon, even if autovacuuming
@@ -5302,7 +5297,7 @@ sigusr1_handler(SIGNAL_ARGS)
     }

     if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER) &&
-        Shutdown == NoShutdown)
+        Shutdown <= SmartShutdown)
     {
         /* The autovacuum launcher wants us to start a worker process. */
         StartAutovacuumWorker();
@@ -5333,7 +5328,7 @@ sigusr1_handler(SIGNAL_ARGS)

     if (StartupPID != 0 &&
         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
+         pmState == PM_HOT_STANDBY) &&
         CheckPromoteSignal())
     {
         /*
@@ -5651,8 +5646,8 @@ MaybeStartWalReceiver(void)
 {
     if (WalReceiverPID == 0 &&
         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
-        Shutdown == NoShutdown)
+         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_CLIENTS) &&
+        Shutdown <= SmartShutdown)
     {
         WalReceiverPID = StartWalReceiver();
         if (WalReceiverPID != 0)
@@ -5905,7 +5900,7 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
         case PM_SHUTDOWN_2:
         case PM_SHUTDOWN:
         case PM_WAIT_BACKENDS:
-        case PM_WAIT_READONLY:
+        case PM_WAIT_CLIENTS:
         case PM_WAIT_BACKUP:
             break;

Re: Parallel query hangs after a smart shutdown is issued

From

Tom Lane

Date:

12 August 2020, 20:59:28

I wrote:
> Oh, excellent point!  I'd not thought to look at tests of the Shutdown
> variable, but yeah, those should be <= SmartShutdown if we want autovac
> to continue to operate in this state.

On looking closer, there's another problem: setting start_autovac_launcher
isn't enough to get the AV launcher to run, because ServerLoop() won't
launch it except in PM_RUN state.  Likewise, the other "relaunch a dead
process" checks in ServerLoop() need to be generalized to support
relaunching background processes while we're waiting out the foreground
clients.  So that leads me to the attached v3.  I had to re-instantiate
PM_WAIT_READONLY as an alternate state to PM_WAIT_CLIENTS; these states
are about the same so far as PostmasterStateMachine is concerned, but
some of the should-we-launch-FOO checks care about the difference.

The various pmState tests are getting messy enough to cry out for
refactorization, but I've not attempted that here.  There's enough
variance in the conditions for launching different subprocesses that
I'm not very sure what would be a nicer-looking way to write them.

            regards, tom lane

diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index e31275a04e..3946fa52ea 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -185,8 +185,8 @@ PostgreSQL documentation
    <option>stop</option> mode shuts down the server that is running in
    the specified data directory.  Three different
    shutdown methods can be selected with the <option>-m</option>
-   option.  <quote>Smart</quote> mode waits for all active
-   clients to disconnect and any online backup to finish.
+   option.  <quote>Smart</quote> mode disallows new connections, then waits
+   for all existing clients to disconnect and any online backup to finish.
    If the server is in hot standby, recovery and streaming replication
    will be terminated once all clients have disconnected.
    <quote>Fast</quote> mode (the default) does not wait for clients to disconnect and
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 38e2c16ac2..d134dade53 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -148,8 +148,6 @@
 #define BACKEND_TYPE_BGWORKER    0x0008    /* bgworker process */
 #define BACKEND_TYPE_ALL        0x000F    /* OR of all the above */

-#define BACKEND_TYPE_WORKER        (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER)
-
 /*
  * List of active backends (or child processes anyway; we don't actually
  * know whether a given child has become a backend or is still in the
@@ -319,10 +317,10 @@ static bool FatalError = false; /* T if recovering from backend crash */
  *
  * Notice that this state variable does not distinguish *why* we entered
  * states later than PM_RUN --- Shutdown and FatalError must be consulted
- * to find that out.  FatalError is never true in PM_RECOVERY_* or PM_RUN
- * states, nor in PM_SHUTDOWN states (because we don't enter those states
- * when trying to recover from a crash).  It can be true in PM_STARTUP state,
- * because we don't clear it until we've successfully started WAL redo.
+ * to find that out.  FatalError is never true in PM_RECOVERY, PM_HOT_STANDBY,
+ * or PM_RUN states, nor in PM_SHUTDOWN states (because we don't enter those
+ * states when trying to recover from a crash).  It can be true in PM_STARTUP
+ * state, because we don't clear it until we've successfully started WAL redo.
  */
 typedef enum
 {
@@ -332,8 +330,9 @@ typedef enum
     PM_HOT_STANDBY,                /* in hot standby mode */
     PM_RUN,                        /* normal "database is alive" state */
     PM_WAIT_BACKUP,                /* waiting for online backup mode to end */
-    PM_WAIT_READONLY,            /* waiting for read only backends to exit */
-    PM_WAIT_BACKENDS,            /* waiting for live backends to exit */
+    PM_WAIT_CLIENTS,            /* waiting for normal backends to exit */
+    PM_WAIT_READONLY,            /* likewise, when we had been in a RO state */
+    PM_WAIT_BACKENDS,            /* waiting for all backends to exit */
     PM_SHUTDOWN,                /* waiting for checkpointer to do shutdown
                                  * ckpt */
     PM_SHUTDOWN_2,                /* waiting for archiver and walsenders to
@@ -437,9 +436,10 @@ static void InitPostmasterDeathWatchHandle(void);
  * even during recovery.
  */
 #define PgArchStartupAllowed()    \
-    ((XLogArchivingActive() && pmState == PM_RUN) ||    \
+    ((XLogArchivingActive() &&    \
+      (pmState == PM_RUN || pmState == PM_WAIT_BACKUP || pmState == PM_WAIT_CLIENTS)) ||    \
      (XLogArchivingAlways() &&    \
-      (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY)))
+      (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY)))

 #ifdef EXEC_BACKEND

@@ -1750,7 +1750,8 @@ ServerLoop(void)
          * fails, we'll just try again later.  Likewise for the checkpointer.
          */
         if (pmState == PM_RUN || pmState == PM_RECOVERY ||
-            pmState == PM_HOT_STANDBY)
+            pmState == PM_HOT_STANDBY || pmState == PM_WAIT_BACKUP ||
+            pmState == PM_WAIT_CLIENTS || pmState == PM_WAIT_READONLY)
         {
             if (CheckpointerPID == 0)
                 CheckpointerPID = StartCheckpointer();
@@ -1763,7 +1764,8 @@ ServerLoop(void)
          * one.  But this is needed only in normal operation (else we cannot
          * be writing any new WAL).
          */
-        if (WalWriterPID == 0 && pmState == PM_RUN)
+        if (WalWriterPID == 0 &&
+            (pmState == PM_RUN || pmState == PM_WAIT_BACKUP || pmState == PM_WAIT_CLIENTS))
             WalWriterPID = StartWalWriter();

         /*
@@ -1774,7 +1776,7 @@ ServerLoop(void)
          */
         if (!IsBinaryUpgrade && AutoVacPID == 0 &&
             (AutoVacuumingActive() || start_autovac_launcher) &&
-            pmState == PM_RUN)
+            (pmState == PM_RUN || pmState == PM_WAIT_BACKUP || pmState == PM_WAIT_CLIENTS))
         {
             AutoVacPID = StartAutoVacLauncher();
             if (AutoVacPID != 0)
@@ -1783,7 +1785,9 @@ ServerLoop(void)

         /* If we have lost the stats collector, try to start a new one */
         if (PgStatPID == 0 &&
-            (pmState == PM_RUN || pmState == PM_HOT_STANDBY))
+            (pmState == PM_RUN || pmState == PM_HOT_STANDBY ||
+             pmState == PM_WAIT_BACKUP || pmState == PM_WAIT_CLIENTS ||
+             pmState == PM_WAIT_READONLY))
             PgStatPID = pgstat_start();

         /* If we have lost the archiver, try to start a new one. */
@@ -2793,35 +2797,19 @@ pmdie(SIGNAL_ARGS)
             sd_notify(0, "STOPPING=1");
 #endif

-            if (pmState == PM_RUN || pmState == PM_RECOVERY ||
-                pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
-            {
-                /* autovac workers are told to shut down immediately */
-                /* and bgworkers too; does this need tweaking? */
-                SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER);
-                /* and the autovac launcher too */
-                if (AutoVacPID != 0)
-                    signal_child(AutoVacPID, SIGTERM);
-                /* and the bgwriter too */
-                if (BgWriterPID != 0)
-                    signal_child(BgWriterPID, SIGTERM);
-                /* and the walwriter too */
-                if (WalWriterPID != 0)
-                    signal_child(WalWriterPID, SIGTERM);
-
-                /*
-                 * If we're in recovery, we can't kill the startup process
-                 * right away, because at present doing so does not release
-                 * its locks.  We might want to change this in a future
-                 * release.  For the time being, the PM_WAIT_READONLY state
-                 * indicates that we're waiting for the regular (read only)
-                 * backends to die off; once they do, we'll kill the startup
-                 * and walreceiver processes.
-                 */
-                pmState = (pmState == PM_RUN) ?
-                    PM_WAIT_BACKUP : PM_WAIT_READONLY;
-            }
+            /*
+             * If we reached normal running, we have to wait for any online
+             * backup mode to end; otherwise go straight to waiting for client
+             * backends to exit.  (The difference is that in the former state,
+             * we'll still let in new superuser clients, so that somebody can
+             * end the online backup mode.)  If already in PM_WAIT_BACKUP or a
+             * later state, do not change it.
+             */
+            if (pmState == PM_RUN)
+                pmState = PM_WAIT_BACKUP;
+            else if (pmState == PM_RECOVERY ||
+                     pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
+                pmState = PM_WAIT_READONLY;

             /*
              * Now wait for online backup mode to end and backends to exit. If
@@ -2871,16 +2859,16 @@ pmdie(SIGNAL_ARGS)
             }
             else if (pmState == PM_RUN ||
                      pmState == PM_WAIT_BACKUP ||
+                     pmState == PM_WAIT_CLIENTS ||
                      pmState == PM_WAIT_READONLY ||
                      pmState == PM_WAIT_BACKENDS ||
                      pmState == PM_HOT_STANDBY)
             {
                 ereport(LOG,
                         (errmsg("aborting any active transactions")));
-                /* shut down all backends and workers */
+                /* shut down all backends and workers, but not walsenders */
                 SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_NORMAL | BACKEND_TYPE_AUTOVAC |
-                                   BACKEND_TYPE_BGWORKER);
+                                   BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
                 /* and the autovac launcher too */
                 if (AutoVacPID != 0)
                     signal_child(AutoVacPID, SIGTERM);
@@ -2987,7 +2975,7 @@ reaper(SIGNAL_ARGS)
                 ereport(LOG,
                         (errmsg("shutdown at recovery target")));
                 StartupStatus = STARTUP_NOT_RUNNING;
-                Shutdown = SmartShutdown;
+                Shutdown = Max(Shutdown, SmartShutdown);
                 TerminateChildren(SIGTERM);
                 pmState = PM_WAIT_BACKENDS;
                 /* PostmasterStateMachine logic does the rest */
@@ -3234,7 +3222,9 @@ reaper(SIGNAL_ARGS)
             if (!EXIT_STATUS_0(exitstatus))
                 LogChildExit(LOG, _("statistics collector process"),
                              pid, exitstatus);
-            if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
+            if (pmState == PM_RUN || pmState == PM_HOT_STANDBY ||
+                pmState == PM_WAIT_BACKUP || pmState == PM_WAIT_CLIENTS ||
+                pmState == PM_WAIT_READONLY)
                 PgStatPID = pgstat_start();
             continue;
         }
@@ -3713,6 +3703,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
         pmState == PM_HOT_STANDBY ||
         pmState == PM_RUN ||
         pmState == PM_WAIT_BACKUP ||
+        pmState == PM_WAIT_CLIENTS ||
         pmState == PM_WAIT_READONLY ||
         pmState == PM_SHUTDOWN)
         pmState = PM_WAIT_BACKENDS;
@@ -3802,21 +3793,36 @@ PostmasterStateMachine(void)
          * PM_WAIT_BACKUP state ends when online backup mode is not active.
          */
         if (!BackupInProgress())
-            pmState = PM_WAIT_BACKENDS;
+            pmState = PM_WAIT_CLIENTS;
     }

-    if (pmState == PM_WAIT_READONLY)
+    if (pmState == PM_WAIT_CLIENTS || pmState == PM_WAIT_READONLY)
     {
         /*
-         * PM_WAIT_READONLY state ends when we have no regular backends that
-         * have been started during recovery.  We kill the startup and
-         * walreceiver processes and transition to PM_WAIT_BACKENDS.  Ideally,
-         * we might like to kill these processes first and then wait for
-         * backends to die off, but that doesn't work at present because
-         * killing the startup process doesn't release its locks.
+         * PM_WAIT_CLIENTS or PM_WAIT_READONLY state ends when we have no
+         * normal client backends running.  Then signal appropriate support
+         * processes, and transition to PM_WAIT_BACKENDS to wait for them to
+         * die.
          */
         if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
         {
+            /*
+             * Signal all backend children except walsenders.  (While there
+             * can't be any normal children left, we might as well include
+             * BACKEND_TYPE_NORMAL in this mask, just to be sure.)
+             */
+            SignalSomeChildren(SIGTERM,
+                               BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
+            /* and the autovac launcher too */
+            if (AutoVacPID != 0)
+                signal_child(AutoVacPID, SIGTERM);
+            /* and the bgwriter too */
+            if (BgWriterPID != 0)
+                signal_child(BgWriterPID, SIGTERM);
+            /* and the walwriter too */
+            if (WalWriterPID != 0)
+                signal_child(WalWriterPID, SIGTERM);
+            /* If we're in recovery, also stop startup and walreceiver procs */
             if (StartupPID != 0)
                 signal_child(StartupPID, SIGTERM);
             if (WalReceiverPID != 0)
@@ -3843,7 +3849,7 @@ PostmasterStateMachine(void)
          * later after writing the checkpoint record, like the archiver
          * process.
          */
-        if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER) == 0 &&
+        if (CountChildren(BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND) == 0 &&
             StartupPID == 0 &&
             WalReceiverPID == 0 &&
             BgWriterPID == 0 &&
@@ -5287,7 +5293,7 @@ sigusr1_handler(SIGNAL_ARGS)
     }

     if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) &&
-        Shutdown == NoShutdown)
+        Shutdown <= SmartShutdown && pmState < PM_WAIT_BACKENDS)
     {
         /*
          * Start one iteration of the autovacuum daemon, even if autovacuuming
@@ -5302,7 +5308,7 @@ sigusr1_handler(SIGNAL_ARGS)
     }

     if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER) &&
-        Shutdown == NoShutdown)
+        Shutdown <= SmartShutdown && pmState < PM_WAIT_BACKENDS)
     {
         /* The autovacuum launcher wants us to start a worker process. */
         StartAutovacuumWorker();
@@ -5333,7 +5339,7 @@ sigusr1_handler(SIGNAL_ARGS)

     if (StartupPID != 0 &&
         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
+         pmState == PM_HOT_STANDBY) &&
         CheckPromoteSignal())
     {
         /*
@@ -5652,7 +5658,7 @@ MaybeStartWalReceiver(void)
     if (WalReceiverPID == 0 &&
         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
          pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
-        Shutdown == NoShutdown)
+        Shutdown <= SmartShutdown)
     {
         WalReceiverPID = StartWalReceiver();
         if (WalReceiverPID != 0)
@@ -5906,6 +5912,7 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
         case PM_SHUTDOWN:
         case PM_WAIT_BACKENDS:
         case PM_WAIT_READONLY:
+        case PM_WAIT_CLIENTS:
         case PM_WAIT_BACKUP:
             break;

Re: Parallel query hangs after a smart shutdown is issued

From

Thomas Munro

Date:

12 August 2020, 21:41:42

On Thu, Aug 13, 2020 at 8:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Oh, excellent point!  I'd not thought to look at tests of the Shutdown
> > variable, but yeah, those should be <= SmartShutdown if we want autovac
> > to continue to operate in this state.
>
> On looking closer, there's another problem: setting start_autovac_launcher
> isn't enough to get the AV launcher to run, because ServerLoop() won't
> launch it except in PM_RUN state.  Likewise, the other "relaunch a dead
> process" checks in ServerLoop() need to be generalized to support
> relaunching background processes while we're waiting out the foreground
> clients.  So that leads me to the attached v3.  I had to re-instantiate
> PM_WAIT_READONLY as an alternate state to PM_WAIT_CLIENTS; these states
> are about the same so far as PostmasterStateMachine is concerned, but
> some of the should-we-launch-FOO checks care about the difference.

I think we also need:

@@ -2459,6 +2459,9 @@ canAcceptConnections(int backend_type)
        {
                if (pmState == PM_WAIT_BACKUP)
                        result = CAC_WAITBACKUP;        /* allow
superusers only */
+               else if (Shutdown <= SmartShutdown &&
+                                backend_type == BACKEND_TYPE_AUTOVAC)
+                       result = CAC_OK;
                else if (Shutdown > NoShutdown)
                        return CAC_SHUTDOWN;    /* shutdown is pending */
                else if (!FatalError &&


Retesting the original complaint, I think we need:

@@ -5911,11 +5912,11 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
                case PM_SHUTDOWN_2:
                case PM_SHUTDOWN:
                case PM_WAIT_BACKENDS:
-               case PM_WAIT_READONLY:
-               case PM_WAIT_CLIENTS:
                case PM_WAIT_BACKUP:
                        break;

+               case PM_WAIT_READONLY:
+               case PM_WAIT_CLIENTS:
                case PM_RUN:
                        if (start_time == BgWorkerStart_RecoveryFinished)
                                return true;

Re: Parallel query hangs after a smart shutdown is issued

From

Tom Lane

Date:

12 August 2020, 22:21:32

Thomas Munro <thomas.munro@gmail.com> writes:
> I think we also need:

> +               else if (Shutdown <= SmartShutdown &&
> +                                backend_type == BACKEND_TYPE_AUTOVAC)
> +                       result = CAC_OK;

Hm, ok.

> Retesting the original complaint, I think we need:

> @@ -5911,11 +5912,11 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
> +               case PM_WAIT_READONLY:
> +               case PM_WAIT_CLIENTS:
>                 case PM_RUN:

So the question here is whether time-based bgworkers should be allowed to
restart in this scenario.  I'm not quite sure --- depending on what the
bgworker's purpose is, you could make an argument either way, I think.
Do we need some way to control that?

In any case, we'd want to treat PM_WAIT_READONLY like PM_HOT_STANDBY not
PM_RUN, no?  Also, the state before PM_WAIT_READONLY could have been
PM_RECOVERY or PM_STARTUP, in which case we don't really want to think
it's like PM_HOT_STANDBY either; only the BgWorkerStart_PostmasterStart
case should be accepted.  That suggests that we need yet another pmState,
or else a more thoroughgoing refactoring of how the postmaster's state
is represented.

            regards, tom lane

Re: Parallel query hangs after a smart shutdown is issued

From

Thomas Munro

Date:

12 August 2020, 23:27:27

On Thu, Aug 13, 2020 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > @@ -5911,11 +5912,11 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
> > +               case PM_WAIT_READONLY:
> > +               case PM_WAIT_CLIENTS:
> >                 case PM_RUN:
>
> So the question here is whether time-based bgworkers should be allowed to
> restart in this scenario.  I'm not quite sure --- depending on what the
> bgworker's purpose is, you could make an argument either way, I think.
> Do we need some way to control that?

I'm not sure why any bgworker would actually want to be shut down or
not restarted during the twilight zone of a smart shutdown though --
if users can do arbitrary stuff, why can't supporting workers carry
on?  For example, a hypothetical extension that triggers vacuum freeze
at smarter times, or a wait event sampling extension, an FDW that uses
an extra worker to maintain a connection to something, etc etc could
all be things that a user is indirectly relying on to do their normal
work, and I struggle to think of an example of something that you
explicitly don't want running just because (in some sense) the server
*plans* to shut down, when the users get around to logging off.  But
maybe I lack imagination.

> In any case, we'd want to treat PM_WAIT_READONLY like PM_HOT_STANDBY not
> PM_RUN, no?

Yeah, you're right.

> Also, the state before PM_WAIT_READONLY could have been
> PM_RECOVERY or PM_STARTUP, in which case we don't really want to think
> it's like PM_HOT_STANDBY either; only the BgWorkerStart_PostmasterStart
> case should be accepted.  That suggests that we need yet another pmState,
> or else a more thoroughgoing refactoring of how the postmaster's state
> is represented.

Hmm.

Re: Parallel query hangs after a smart shutdown is issued

From

Tom Lane

Date:

13 August 2020, 02:37:21

Thomas Munro <thomas.munro@gmail.com> writes:
> On Thu, Aug 13, 2020 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, the state before PM_WAIT_READONLY could have been
>> PM_RECOVERY or PM_STARTUP, in which case we don't really want to think
>> it's like PM_HOT_STANDBY either; only the BgWorkerStart_PostmasterStart
>> case should be accepted.  That suggests that we need yet another pmState,
>> or else a more thoroughgoing refactoring of how the postmaster's state
>> is represented.

> Hmm.

I experimented with separating the shutdown-in-progress state into a
separate variable, letting us actually reduce not increase the number of
pmStates.  This way, PM_RUN and other states still apply until we're
ready to pull the shutdown trigger, so that we don't need to complicate
state-based decisions about launching auxiliary processes.  This patch
also unifies the signal-sending for the smart and fast shutdown paths,
which seems like a nice improvement.  I kind of like this, though I'm not
in love with the particular variable name I used here (smartShutState).

If we go this way, CAC_WAITBACKUP ought to be renamed since the PMState
it's named after no longer exists.  I left that alone pending making
final naming choices, though.

            regards, tom lane

diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index e31275a04e..3946fa52ea 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -185,8 +185,8 @@ PostgreSQL documentation
    <option>stop</option> mode shuts down the server that is running in
    the specified data directory.  Three different
    shutdown methods can be selected with the <option>-m</option>
-   option.  <quote>Smart</quote> mode waits for all active
-   clients to disconnect and any online backup to finish.
+   option.  <quote>Smart</quote> mode disallows new connections, then waits
+   for all existing clients to disconnect and any online backup to finish.
    If the server is in hot standby, recovery and streaming replication
    will be terminated once all clients have disconnected.
    <quote>Fast</quote> mode (the default) does not wait for clients to disconnect and
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 38e2c16ac2..66b402e7f7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -148,8 +148,6 @@
 #define BACKEND_TYPE_BGWORKER    0x0008    /* bgworker process */
 #define BACKEND_TYPE_ALL        0x000F    /* OR of all the above */

-#define BACKEND_TYPE_WORKER        (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER)
-
 /*
  * List of active backends (or child processes anyway; we don't actually
  * know whether a given child has become a backend or is still in the
@@ -304,8 +302,7 @@ static bool FatalError = false; /* T if recovering from backend crash */
  * and we switch to PM_RUN state.
  *
  * Normal child backends can only be launched when we are in PM_RUN or
- * PM_HOT_STANDBY state.  (We also allow launch of normal
- * child backends in PM_WAIT_BACKUP state, but only for superusers.)
+ * PM_HOT_STANDBY state.  (smartShutState can also restrict launching.)
  * In other states we handle connection requests by launching "dead_end"
  * child processes, which will simply send the client an error message and
  * quit.  (We track these in the BackendList so that we can know when they
@@ -319,10 +316,10 @@ static bool FatalError = false; /* T if recovering from backend crash */
  *
  * Notice that this state variable does not distinguish *why* we entered
  * states later than PM_RUN --- Shutdown and FatalError must be consulted
- * to find that out.  FatalError is never true in PM_RECOVERY_* or PM_RUN
- * states, nor in PM_SHUTDOWN states (because we don't enter those states
- * when trying to recover from a crash).  It can be true in PM_STARTUP state,
- * because we don't clear it until we've successfully started WAL redo.
+ * to find that out.  FatalError is never true in PM_RECOVERY, PM_HOT_STANDBY,
+ * or PM_RUN states, nor in PM_SHUTDOWN states (because we don't enter those
+ * states when trying to recover from a crash).  It can be true in PM_STARTUP
+ * state, because we don't clear it until we've successfully started WAL redo.
  */
 typedef enum
 {
@@ -331,8 +328,7 @@ typedef enum
     PM_RECOVERY,                /* in archive recovery mode */
     PM_HOT_STANDBY,                /* in hot standby mode */
     PM_RUN,                        /* normal "database is alive" state */
-    PM_WAIT_BACKUP,                /* waiting for online backup mode to end */
-    PM_WAIT_READONLY,            /* waiting for read only backends to exit */
+    PM_STOP_BACKENDS,            /* need to stop remaining backends */
     PM_WAIT_BACKENDS,            /* waiting for live backends to exit */
     PM_SHUTDOWN,                /* waiting for checkpointer to do shutdown
                                  * ckpt */
@@ -344,6 +340,21 @@ typedef enum

 static PMState pmState = PM_INIT;

+/*
+ * While performing a "smart shutdown", we restrict new connections but stay
+ * in PM_RUN or PM_HOT_STANDBY state until all the client backends are gone.
+ * smartShutState is a sub-state indicator showing the active restriction.
+ * It must always be SMART_NORMAL_USAGE when pmState isn't RUN or HOT_STANDBY.
+ */
+typedef enum
+{
+    SMART_NORMAL_USAGE,            /* normal not-shutting-down state */
+    SMART_SUPERUSER_ONLY,        /* only superusers can connect */
+    SMART_NO_CONNECTIONS        /* no new connections allowed, period */
+} SmartShutState;
+
+static SmartShutState smartShutState = SMART_NORMAL_USAGE;
+
 /* Start time of SIGKILL timeout during immediate shutdown or child crash */
 /* Zero means timeout is not running */
 static time_t AbortStartTime = 0;
@@ -1396,6 +1407,7 @@ PostmasterMain(int argc, char *argv[])
     Assert(StartupPID != 0);
     StartupStatus = STARTUP_RUNNING;
     pmState = PM_STARTUP;
+    smartShutState = SMART_NORMAL_USAGE;

     /* Some workers may be scheduled to start now */
     maybe_start_bgworkers();
@@ -2441,21 +2453,27 @@ canAcceptConnections(int backend_type)
     /*
      * Can't start backends when in startup/shutdown/inconsistent recovery
      * state.  We treat autovac workers the same as user backends for this
-     * purpose.  However, bgworkers are excluded from this test; we expect
+     * purpose.  However, bgworkers are excluded from these tests; we expect
      * bgworker_should_start_now() decided whether the DB state allows them.
      *
-     * In state PM_WAIT_BACKUP only superusers can connect (this must be
-     * allowed so that a superuser can end online backup mode); we return
-     * CAC_WAITBACKUP code to indicate that this must be checked later. Note
-     * that neither CAC_OK nor CAC_WAITBACKUP can safely be returned until we
-     * have checked for too many children.
+     * If smartShutState == SMART_SUPERUSER_ONLY then only superusers can
+     * connect (this must be allowed so that a superuser can end online backup
+     * mode); we return CAC_WAITBACKUP code to indicate that this must be
+     * checked later.  Note that neither CAC_OK nor CAC_WAITBACKUP can safely
+     * be returned until we have checked for too many children.
      */
-    if (pmState != PM_RUN &&
+    if (smartShutState != SMART_NORMAL_USAGE &&
         backend_type != BACKEND_TYPE_BGWORKER)
     {
-        if (pmState == PM_WAIT_BACKUP)
+        if (smartShutState == SMART_SUPERUSER_ONLY)
             result = CAC_WAITBACKUP;    /* allow superusers only */
-        else if (Shutdown > NoShutdown)
+        else
+            return CAC_SHUTDOWN;    /* shutdown is pending */
+    }
+    if (pmState != PM_RUN &&
+        backend_type != BACKEND_TYPE_BGWORKER)
+    {
+        if (Shutdown > NoShutdown)
             return CAC_SHUTDOWN;    /* shutdown is pending */
         else if (!FatalError &&
                  (pmState == PM_STARTUP ||
@@ -2793,34 +2811,23 @@ pmdie(SIGNAL_ARGS)
             sd_notify(0, "STOPPING=1");
 #endif

-            if (pmState == PM_RUN || pmState == PM_RECOVERY ||
-                pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
+            /*
+             * If we reached normal running, we have to wait for any online
+             * backup mode to end; otherwise go straight to waiting for client
+             * backends to exit.  (The difference is that in the former state,
+             * we'll still let in new superuser clients, so that somebody can
+             * end the online backup mode.)  If already in PM_STOP_BACKENDS or
+             * a later state, do not change it.
+             */
+            if (pmState == PM_RUN)
+                smartShutState = SMART_SUPERUSER_ONLY;
+            else if (pmState == PM_HOT_STANDBY)
+                smartShutState = SMART_NO_CONNECTIONS;
+            else if (pmState == PM_STARTUP || pmState == PM_RECOVERY)
             {
-                /* autovac workers are told to shut down immediately */
-                /* and bgworkers too; does this need tweaking? */
-                SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER);
-                /* and the autovac launcher too */
-                if (AutoVacPID != 0)
-                    signal_child(AutoVacPID, SIGTERM);
-                /* and the bgwriter too */
-                if (BgWriterPID != 0)
-                    signal_child(BgWriterPID, SIGTERM);
-                /* and the walwriter too */
-                if (WalWriterPID != 0)
-                    signal_child(WalWriterPID, SIGTERM);
-
-                /*
-                 * If we're in recovery, we can't kill the startup process
-                 * right away, because at present doing so does not release
-                 * its locks.  We might want to change this in a future
-                 * release.  For the time being, the PM_WAIT_READONLY state
-                 * indicates that we're waiting for the regular (read only)
-                 * backends to die off; once they do, we'll kill the startup
-                 * and walreceiver processes.
-                 */
-                pmState = (pmState == PM_RUN) ?
-                    PM_WAIT_BACKUP : PM_WAIT_READONLY;
+                /* There should be no clients, so proceed to stop children */
+                pmState = PM_STOP_BACKENDS;
+                smartShutState = SMART_NORMAL_USAGE;
             }

             /*
@@ -2851,48 +2858,25 @@ pmdie(SIGNAL_ARGS)
             sd_notify(0, "STOPPING=1");
 #endif

-            if (StartupPID != 0)
-                signal_child(StartupPID, SIGTERM);
-            if (BgWriterPID != 0)
-                signal_child(BgWriterPID, SIGTERM);
-            if (WalReceiverPID != 0)
-                signal_child(WalReceiverPID, SIGTERM);
             if (pmState == PM_STARTUP || pmState == PM_RECOVERY)
             {
-                SignalSomeChildren(SIGTERM, BACKEND_TYPE_BGWORKER);
-
-                /*
-                 * Only startup, bgwriter, walreceiver, possibly bgworkers,
-                 * and/or checkpointer should be active in this state; we just
-                 * signaled the first four, and we don't want to kill
-                 * checkpointer yet.
-                 */
-                pmState = PM_WAIT_BACKENDS;
+                /* Just shut down background processes silently */
+                pmState = PM_STOP_BACKENDS;
+                smartShutState = SMART_NORMAL_USAGE;
             }
             else if (pmState == PM_RUN ||
-                     pmState == PM_WAIT_BACKUP ||
-                     pmState == PM_WAIT_READONLY ||
-                     pmState == PM_WAIT_BACKENDS ||
                      pmState == PM_HOT_STANDBY)
             {
+                /* Report that we're about to zap live client sessions */
                 ereport(LOG,
                         (errmsg("aborting any active transactions")));
-                /* shut down all backends and workers */
-                SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_NORMAL | BACKEND_TYPE_AUTOVAC |
-                                   BACKEND_TYPE_BGWORKER);
-                /* and the autovac launcher too */
-                if (AutoVacPID != 0)
-                    signal_child(AutoVacPID, SIGTERM);
-                /* and the walwriter too */
-                if (WalWriterPID != 0)
-                    signal_child(WalWriterPID, SIGTERM);
-                pmState = PM_WAIT_BACKENDS;
+                pmState = PM_STOP_BACKENDS;
+                smartShutState = SMART_NORMAL_USAGE;
             }

             /*
-             * Now wait for backends to exit.  If there are none,
-             * PostmasterStateMachine will take the next step.
+             * PostmasterStateMachine will issue any necessary signals, or
+             * take the next step if no child processes need to be killed.
              */
             PostmasterStateMachine();
             break;
@@ -2920,6 +2904,7 @@ pmdie(SIGNAL_ARGS)

             TerminateChildren(SIGQUIT);
             pmState = PM_WAIT_BACKENDS;
+            smartShutState = SMART_NORMAL_USAGE;

             /* set stopwatch for them to die */
             AbortStartTime = time(NULL);
@@ -2978,6 +2963,7 @@ reaper(SIGNAL_ARGS)
             {
                 StartupStatus = STARTUP_NOT_RUNNING;
                 pmState = PM_WAIT_BACKENDS;
+                smartShutState = SMART_NORMAL_USAGE;
                 /* PostmasterStateMachine logic does the rest */
                 continue;
             }
@@ -2987,9 +2973,10 @@ reaper(SIGNAL_ARGS)
                 ereport(LOG,
                         (errmsg("shutdown at recovery target")));
                 StartupStatus = STARTUP_NOT_RUNNING;
-                Shutdown = SmartShutdown;
+                Shutdown = Max(Shutdown, SmartShutdown);
                 TerminateChildren(SIGTERM);
                 pmState = PM_WAIT_BACKENDS;
+                smartShutState = SMART_NORMAL_USAGE;
                 /* PostmasterStateMachine logic does the rest */
                 continue;
             }
@@ -3051,6 +3038,7 @@ reaper(SIGNAL_ARGS)
             AbortStartTime = 0;
             ReachedNormalRunning = true;
             pmState = PM_RUN;
+            smartShutState = SMART_NORMAL_USAGE;

             /*
              * Crank up the background tasks, if we didn't do that already
@@ -3712,10 +3700,12 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
     if (pmState == PM_RECOVERY ||
         pmState == PM_HOT_STANDBY ||
         pmState == PM_RUN ||
-        pmState == PM_WAIT_BACKUP ||
-        pmState == PM_WAIT_READONLY ||
+        pmState == PM_STOP_BACKENDS ||
         pmState == PM_SHUTDOWN)
+    {
         pmState = PM_WAIT_BACKENDS;
+        smartShutState = SMART_NORMAL_USAGE;
+    }

     /*
      * .. and if this doesn't happen quickly enough, now the clock is ticking
@@ -3796,35 +3786,59 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 static void
 PostmasterStateMachine(void)
 {
-    if (pmState == PM_WAIT_BACKUP)
+    if (smartShutState == SMART_SUPERUSER_ONLY)
     {
         /*
-         * PM_WAIT_BACKUP state ends when online backup mode is not active.
+         * SMART_SUPERUSER_ONLY state ends as soon as online backup mode is
+         * not active.
          */
         if (!BackupInProgress())
-            pmState = PM_WAIT_BACKENDS;
+            smartShutState = SMART_NO_CONNECTIONS;
     }

-    if (pmState == PM_WAIT_READONLY)
+    if (smartShutState == SMART_NO_CONNECTIONS)
     {
         /*
-         * PM_WAIT_READONLY state ends when we have no regular backends that
-         * have been started during recovery.  We kill the startup and
-         * walreceiver processes and transition to PM_WAIT_BACKENDS.  Ideally,
-         * we might like to kill these processes first and then wait for
-         * backends to die off, but that doesn't work at present because
-         * killing the startup process doesn't release its locks.
+         * SMART_NO_CONNECTIONS state ends when we have no normal client
+         * backends running.  Then we're ready to shut down other children.
          */
         if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
         {
-            if (StartupPID != 0)
-                signal_child(StartupPID, SIGTERM);
-            if (WalReceiverPID != 0)
-                signal_child(WalReceiverPID, SIGTERM);
-            pmState = PM_WAIT_BACKENDS;
+            pmState = PM_STOP_BACKENDS;
+            smartShutState = SMART_NORMAL_USAGE;
         }
     }

+    /*
+     * If we're ready to do so, signal child processes to shut down.  (This
+     * isn't a persistent state, but treating it as a distinct pmState allows
+     * us to share this code across multiple shutdown paths.)
+     */
+    if (pmState == PM_STOP_BACKENDS)
+    {
+        /* Signal all backend children except walsenders */
+        SignalSomeChildren(SIGTERM,
+                           BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
+        /* and the autovac launcher too */
+        if (AutoVacPID != 0)
+            signal_child(AutoVacPID, SIGTERM);
+        /* and the bgwriter too */
+        if (BgWriterPID != 0)
+            signal_child(BgWriterPID, SIGTERM);
+        /* and the walwriter too */
+        if (WalWriterPID != 0)
+            signal_child(WalWriterPID, SIGTERM);
+        /* If we're in recovery, also stop startup and walreceiver procs */
+        if (StartupPID != 0)
+            signal_child(StartupPID, SIGTERM);
+        if (WalReceiverPID != 0)
+            signal_child(WalReceiverPID, SIGTERM);
+        /* checkpointer, archiver, stats, and syslogger may continue for now */
+
+        /* Now transition to PM_WAIT_BACKENDS state to wait for them to die */
+        pmState = PM_WAIT_BACKENDS;
+    }
+
     /*
      * If we are in a state-machine state that implies waiting for backends to
      * exit, see if they're all gone, and change state if so.
@@ -3843,7 +3857,7 @@ PostmasterStateMachine(void)
          * later after writing the checkpoint record, like the archiver
          * process.
          */
-        if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER) == 0 &&
+        if (CountChildren(BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND) == 0 &&
             StartupPID == 0 &&
             WalReceiverPID == 0 &&
             BgWriterPID == 0 &&
@@ -5287,7 +5301,7 @@ sigusr1_handler(SIGNAL_ARGS)
     }

     if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) &&
-        Shutdown == NoShutdown)
+        Shutdown <= SmartShutdown && pmState < PM_STOP_BACKENDS)
     {
         /*
          * Start one iteration of the autovacuum daemon, even if autovacuuming
@@ -5302,7 +5316,7 @@ sigusr1_handler(SIGNAL_ARGS)
     }

     if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER) &&
-        Shutdown == NoShutdown)
+        Shutdown <= SmartShutdown && pmState < PM_STOP_BACKENDS)
     {
         /* The autovacuum launcher wants us to start a worker process. */
         StartAutovacuumWorker();
@@ -5333,7 +5347,7 @@ sigusr1_handler(SIGNAL_ARGS)

     if (StartupPID != 0 &&
         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
+         pmState == PM_HOT_STANDBY) &&
         CheckPromoteSignal())
     {
         /*
@@ -5651,8 +5665,8 @@ MaybeStartWalReceiver(void)
 {
     if (WalReceiverPID == 0 &&
         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
-        Shutdown == NoShutdown)
+         pmState == PM_HOT_STANDBY) &&
+        Shutdown <= SmartShutdown)
     {
         WalReceiverPID = StartWalReceiver();
         if (WalReceiverPID != 0)
@@ -5905,8 +5919,7 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
         case PM_SHUTDOWN_2:
         case PM_SHUTDOWN:
         case PM_WAIT_BACKENDS:
-        case PM_WAIT_READONLY:
-        case PM_WAIT_BACKUP:
+        case PM_STOP_BACKENDS:
             break;

         case PM_RUN:

Re: Parallel query hangs after a smart shutdown is issued

From

Thomas Munro

Date:

13 August 2020, 04:42:49

On Thu, Aug 13, 2020 at 2:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I experimented with separating the shutdown-in-progress state into a
> separate variable, letting us actually reduce not increase the number of
> pmStates.  This way, PM_RUN and other states still apply until we're
> ready to pull the shutdown trigger, so that we don't need to complicate
> state-based decisions about launching auxiliary processes.  This patch
> also unifies the signal-sending for the smart and fast shutdown paths,
> which seems like a nice improvement.  I kind of like this, though I'm not
> in love with the particular variable name I used here (smartShutState).

Makes sense.  I tested this version on a primary and a replica and
verified that parallel workers launch, but I saw that autovacuum
workers still can't start without something like this:

@@ -2463,7 +2463,8 @@ canAcceptConnections(int backend_type)
         * be returned until we have checked for too many children.
         */
        if (smartShutState != SMART_NORMAL_USAGE &&
-               backend_type != BACKEND_TYPE_BGWORKER)
+               backend_type != BACKEND_TYPE_BGWORKER &&
+               backend_type != BACKEND_TYPE_AUTOVAC)
        {
                if (smartShutState == SMART_SUPERUSER_ONLY)
                        result = CAC_WAITBACKUP;        /* allow
superusers only */
@@ -2471,7 +2472,8 @@ canAcceptConnections(int backend_type)
                        return CAC_SHUTDOWN;    /* shutdown is pending */
        }
        if (pmState != PM_RUN &&
-               backend_type != BACKEND_TYPE_BGWORKER)
+               backend_type != BACKEND_TYPE_BGWORKER &&
+               backend_type != BACKEND_TYPE_AUTOVAC)
        {
                if (Shutdown > NoShutdown)
                        return CAC_SHUTDOWN;    /* shutdown is pending */

Re: Parallel query hangs after a smart shutdown is issued

From

Tom Lane

Date:

13 August 2020, 04:58:02

Thomas Munro <thomas.munro@gmail.com> writes:
> Makes sense.  I tested this version on a primary and a replica and
> verified that parallel workers launch, but I saw that autovacuum
> workers still can't start without something like this:

> @@ -2463,7 +2463,8 @@ canAcceptConnections(int backend_type)
>          * be returned until we have checked for too many children.
>          */
>         if (smartShutState != SMART_NORMAL_USAGE &&
> -               backend_type != BACKEND_TYPE_BGWORKER)
> +               backend_type != BACKEND_TYPE_BGWORKER &&
> +               backend_type != BACKEND_TYPE_AUTOVAC)

Hmmm ... maybe that should be more like

        if (smartShutState != SMART_NORMAL_USAGE &&
            backend_type == BACKEND_TYPE_NORMAL)

(the adjacent comment needs adjustment too of course).

            regards, tom lane

Re: Parallel query hangs after a smart shutdown is issued

From

Tom Lane

Date:

13 August 2020, 16:45:08

I wrote:
> Hmmm ... maybe that should be more like
>         if (smartShutState != SMART_NORMAL_USAGE &&
>             backend_type == BACKEND_TYPE_NORMAL)

After some more rethinking and testing, here's a v5 that feels
fairly final to me.  I realized that the logic in canAcceptConnections
was kind of backwards: it's better to check the main pmState restrictions
first and then the smart-shutdown restrictions afterwards.

I'm assuming we want to back-patch this as far as 9.6, where parallel
query began to be a thing.

            regards, tom lane

diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index e31275a04e..3946fa52ea 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -185,8 +185,8 @@ PostgreSQL documentation
    <option>stop</option> mode shuts down the server that is running in
    the specified data directory.  Three different
    shutdown methods can be selected with the <option>-m</option>
-   option.  <quote>Smart</quote> mode waits for all active
-   clients to disconnect and any online backup to finish.
+   option.  <quote>Smart</quote> mode disallows new connections, then waits
+   for all existing clients to disconnect and any online backup to finish.
    If the server is in hot standby, recovery and streaming replication
    will be terminated once all clients have disconnected.
    <quote>Fast</quote> mode (the default) does not wait for clients to disconnect and
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 38e2c16ac2..8a179cd691 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -148,8 +148,6 @@
 #define BACKEND_TYPE_BGWORKER    0x0008    /* bgworker process */
 #define BACKEND_TYPE_ALL        0x000F    /* OR of all the above */

-#define BACKEND_TYPE_WORKER        (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER)
-
 /*
  * List of active backends (or child processes anyway; we don't actually
  * know whether a given child has become a backend or is still in the
@@ -304,8 +302,7 @@ static bool FatalError = false; /* T if recovering from backend crash */
  * and we switch to PM_RUN state.
  *
  * Normal child backends can only be launched when we are in PM_RUN or
- * PM_HOT_STANDBY state.  (We also allow launch of normal
- * child backends in PM_WAIT_BACKUP state, but only for superusers.)
+ * PM_HOT_STANDBY state.  (connsAllowed can also restrict launching.)
  * In other states we handle connection requests by launching "dead_end"
  * child processes, which will simply send the client an error message and
  * quit.  (We track these in the BackendList so that we can know when they
@@ -319,10 +316,10 @@ static bool FatalError = false; /* T if recovering from backend crash */
  *
  * Notice that this state variable does not distinguish *why* we entered
  * states later than PM_RUN --- Shutdown and FatalError must be consulted
- * to find that out.  FatalError is never true in PM_RECOVERY_* or PM_RUN
- * states, nor in PM_SHUTDOWN states (because we don't enter those states
- * when trying to recover from a crash).  It can be true in PM_STARTUP state,
- * because we don't clear it until we've successfully started WAL redo.
+ * to find that out.  FatalError is never true in PM_RECOVERY, PM_HOT_STANDBY,
+ * or PM_RUN states, nor in PM_SHUTDOWN states (because we don't enter those
+ * states when trying to recover from a crash).  It can be true in PM_STARTUP
+ * state, because we don't clear it until we've successfully started WAL redo.
  */
 typedef enum
 {
@@ -331,8 +328,7 @@ typedef enum
     PM_RECOVERY,                /* in archive recovery mode */
     PM_HOT_STANDBY,                /* in hot standby mode */
     PM_RUN,                        /* normal "database is alive" state */
-    PM_WAIT_BACKUP,                /* waiting for online backup mode to end */
-    PM_WAIT_READONLY,            /* waiting for read only backends to exit */
+    PM_STOP_BACKENDS,            /* need to stop remaining backends */
     PM_WAIT_BACKENDS,            /* waiting for live backends to exit */
     PM_SHUTDOWN,                /* waiting for checkpointer to do shutdown
                                  * ckpt */
@@ -344,6 +340,21 @@ typedef enum

 static PMState pmState = PM_INIT;

+/*
+ * While performing a "smart shutdown", we restrict new connections but stay
+ * in PM_RUN or PM_HOT_STANDBY state until all the client backends are gone.
+ * connsAllowed is a sub-state indicator showing the active restriction.
+ * It is of no interest unless pmState is PM_RUN or PM_HOT_STANDBY.
+ */
+typedef enum
+{
+    ALLOW_ALL_CONNS,            /* normal not-shutting-down state */
+    ALLOW_SUPERUSER_CONNS,        /* only superusers can connect */
+    ALLOW_NO_CONNS                /* no new connections allowed, period */
+} ConnsAllowedState;
+
+static ConnsAllowedState connsAllowed = ALLOW_ALL_CONNS;
+
 /* Start time of SIGKILL timeout during immediate shutdown or child crash */
 /* Zero means timeout is not running */
 static time_t AbortStartTime = 0;
@@ -2323,7 +2334,7 @@ retry1:
                     (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
                      errmsg("sorry, too many clients already")));
             break;
-        case CAC_WAITBACKUP:
+        case CAC_SUPERUSER:
             /* OK for now, will check in InitPostgres */
             break;
         case CAC_OK:
@@ -2443,31 +2454,36 @@ canAcceptConnections(int backend_type)
      * state.  We treat autovac workers the same as user backends for this
      * purpose.  However, bgworkers are excluded from this test; we expect
      * bgworker_should_start_now() decided whether the DB state allows them.
-     *
-     * In state PM_WAIT_BACKUP only superusers can connect (this must be
-     * allowed so that a superuser can end online backup mode); we return
-     * CAC_WAITBACKUP code to indicate that this must be checked later. Note
-     * that neither CAC_OK nor CAC_WAITBACKUP can safely be returned until we
-     * have checked for too many children.
      */
-    if (pmState != PM_RUN &&
+    if (pmState != PM_RUN && pmState != PM_HOT_STANDBY &&
         backend_type != BACKEND_TYPE_BGWORKER)
     {
-        if (pmState == PM_WAIT_BACKUP)
-            result = CAC_WAITBACKUP;    /* allow superusers only */
-        else if (Shutdown > NoShutdown)
+        if (Shutdown > NoShutdown)
             return CAC_SHUTDOWN;    /* shutdown is pending */
         else if (!FatalError &&
                  (pmState == PM_STARTUP ||
                   pmState == PM_RECOVERY))
             return CAC_STARTUP; /* normal startup */
-        else if (!FatalError &&
-                 pmState == PM_HOT_STANDBY)
-            result = CAC_OK;    /* connection OK during hot standby */
         else
             return CAC_RECOVERY;    /* else must be crash recovery */
     }

+    /*
+     * "Smart shutdown" restrictions are applied only to normal connections,
+     * not to autovac workers or bgworkers.  When only superusers can connect,
+     * we return CAC_SUPERUSER to indicate that superuserness must be checked
+     * later.  Note that neither CAC_OK nor CAC_SUPERUSER can safely be
+     * returned until we have checked for too many children.
+     */
+    if (connsAllowed != ALLOW_ALL_CONNS &&
+        backend_type == BACKEND_TYPE_NORMAL)
+    {
+        if (connsAllowed == ALLOW_SUPERUSER_CONNS)
+            result = CAC_SUPERUSER; /* allow superusers only */
+        else
+            return CAC_SHUTDOWN;    /* shutdown is pending */
+    }
+
     /*
      * Don't start too many children.
      *
@@ -2793,34 +2809,22 @@ pmdie(SIGNAL_ARGS)
             sd_notify(0, "STOPPING=1");
 #endif

-            if (pmState == PM_RUN || pmState == PM_RECOVERY ||
-                pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
+            /*
+             * If we reached normal running, we have to wait for any online
+             * backup mode to end; otherwise go straight to waiting for client
+             * backends to exit.  (The difference is that in the former state,
+             * we'll still let in new superuser clients, so that somebody can
+             * end the online backup mode.)  If already in PM_STOP_BACKENDS or
+             * a later state, do not change it.
+             */
+            if (pmState == PM_RUN)
+                connsAllowed = ALLOW_SUPERUSER_CONNS;
+            else if (pmState == PM_HOT_STANDBY)
+                connsAllowed = ALLOW_NO_CONNS;
+            else if (pmState == PM_STARTUP || pmState == PM_RECOVERY)
             {
-                /* autovac workers are told to shut down immediately */
-                /* and bgworkers too; does this need tweaking? */
-                SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER);
-                /* and the autovac launcher too */
-                if (AutoVacPID != 0)
-                    signal_child(AutoVacPID, SIGTERM);
-                /* and the bgwriter too */
-                if (BgWriterPID != 0)
-                    signal_child(BgWriterPID, SIGTERM);
-                /* and the walwriter too */
-                if (WalWriterPID != 0)
-                    signal_child(WalWriterPID, SIGTERM);
-
-                /*
-                 * If we're in recovery, we can't kill the startup process
-                 * right away, because at present doing so does not release
-                 * its locks.  We might want to change this in a future
-                 * release.  For the time being, the PM_WAIT_READONLY state
-                 * indicates that we're waiting for the regular (read only)
-                 * backends to die off; once they do, we'll kill the startup
-                 * and walreceiver processes.
-                 */
-                pmState = (pmState == PM_RUN) ?
-                    PM_WAIT_BACKUP : PM_WAIT_READONLY;
+                /* There should be no clients, so proceed to stop children */
+                pmState = PM_STOP_BACKENDS;
             }

             /*
@@ -2851,48 +2855,23 @@ pmdie(SIGNAL_ARGS)
             sd_notify(0, "STOPPING=1");
 #endif

-            if (StartupPID != 0)
-                signal_child(StartupPID, SIGTERM);
-            if (BgWriterPID != 0)
-                signal_child(BgWriterPID, SIGTERM);
-            if (WalReceiverPID != 0)
-                signal_child(WalReceiverPID, SIGTERM);
             if (pmState == PM_STARTUP || pmState == PM_RECOVERY)
             {
-                SignalSomeChildren(SIGTERM, BACKEND_TYPE_BGWORKER);
-
-                /*
-                 * Only startup, bgwriter, walreceiver, possibly bgworkers,
-                 * and/or checkpointer should be active in this state; we just
-                 * signaled the first four, and we don't want to kill
-                 * checkpointer yet.
-                 */
-                pmState = PM_WAIT_BACKENDS;
+                /* Just shut down background processes silently */
+                pmState = PM_STOP_BACKENDS;
             }
             else if (pmState == PM_RUN ||
-                     pmState == PM_WAIT_BACKUP ||
-                     pmState == PM_WAIT_READONLY ||
-                     pmState == PM_WAIT_BACKENDS ||
                      pmState == PM_HOT_STANDBY)
             {
+                /* Report that we're about to zap live client sessions */
                 ereport(LOG,
                         (errmsg("aborting any active transactions")));
-                /* shut down all backends and workers */
-                SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_NORMAL | BACKEND_TYPE_AUTOVAC |
-                                   BACKEND_TYPE_BGWORKER);
-                /* and the autovac launcher too */
-                if (AutoVacPID != 0)
-                    signal_child(AutoVacPID, SIGTERM);
-                /* and the walwriter too */
-                if (WalWriterPID != 0)
-                    signal_child(WalWriterPID, SIGTERM);
-                pmState = PM_WAIT_BACKENDS;
+                pmState = PM_STOP_BACKENDS;
             }

             /*
-             * Now wait for backends to exit.  If there are none,
-             * PostmasterStateMachine will take the next step.
+             * PostmasterStateMachine will issue any necessary signals, or
+             * take the next step if no child processes need to be killed.
              */
             PostmasterStateMachine();
             break;
@@ -2987,7 +2966,7 @@ reaper(SIGNAL_ARGS)
                 ereport(LOG,
                         (errmsg("shutdown at recovery target")));
                 StartupStatus = STARTUP_NOT_RUNNING;
-                Shutdown = SmartShutdown;
+                Shutdown = Max(Shutdown, SmartShutdown);
                 TerminateChildren(SIGTERM);
                 pmState = PM_WAIT_BACKENDS;
                 /* PostmasterStateMachine logic does the rest */
@@ -3051,6 +3030,7 @@ reaper(SIGNAL_ARGS)
             AbortStartTime = 0;
             ReachedNormalRunning = true;
             pmState = PM_RUN;
+            connsAllowed = ALLOW_ALL_CONNS;

             /*
              * Crank up the background tasks, if we didn't do that already
@@ -3712,8 +3692,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
     if (pmState == PM_RECOVERY ||
         pmState == PM_HOT_STANDBY ||
         pmState == PM_RUN ||
-        pmState == PM_WAIT_BACKUP ||
-        pmState == PM_WAIT_READONLY ||
+        pmState == PM_STOP_BACKENDS ||
         pmState == PM_SHUTDOWN)
         pmState = PM_WAIT_BACKENDS;

@@ -3796,35 +3775,60 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 static void
 PostmasterStateMachine(void)
 {
-    if (pmState == PM_WAIT_BACKUP)
+    /* If we're doing a smart shutdown, try to advance that state. */
+    if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
     {
-        /*
-         * PM_WAIT_BACKUP state ends when online backup mode is not active.
-         */
-        if (!BackupInProgress())
-            pmState = PM_WAIT_BACKENDS;
-    }
+        if (connsAllowed == ALLOW_SUPERUSER_CONNS)
+        {
+            /*
+             * ALLOW_SUPERUSER_CONNS state ends as soon as online backup mode
+             * is not active.
+             */
+            if (!BackupInProgress())
+                connsAllowed = ALLOW_NO_CONNS;
+        }

-    if (pmState == PM_WAIT_READONLY)
-    {
-        /*
-         * PM_WAIT_READONLY state ends when we have no regular backends that
-         * have been started during recovery.  We kill the startup and
-         * walreceiver processes and transition to PM_WAIT_BACKENDS.  Ideally,
-         * we might like to kill these processes first and then wait for
-         * backends to die off, but that doesn't work at present because
-         * killing the startup process doesn't release its locks.
-         */
-        if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
+        if (connsAllowed == ALLOW_NO_CONNS)
         {
-            if (StartupPID != 0)
-                signal_child(StartupPID, SIGTERM);
-            if (WalReceiverPID != 0)
-                signal_child(WalReceiverPID, SIGTERM);
-            pmState = PM_WAIT_BACKENDS;
+            /*
+             * ALLOW_NO_CONNS state ends when we have no normal client
+             * backends running.  Then we're ready to stop other children.
+             */
+            if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
+                pmState = PM_STOP_BACKENDS;
         }
     }

+    /*
+     * If we're ready to do so, signal child processes to shut down.  (This
+     * isn't a persistent state, but treating it as a distinct pmState allows
+     * us to share this code across multiple shutdown paths.)
+     */
+    if (pmState == PM_STOP_BACKENDS)
+    {
+        /* Signal all backend children except walsenders */
+        SignalSomeChildren(SIGTERM,
+                           BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
+        /* and the autovac launcher too */
+        if (AutoVacPID != 0)
+            signal_child(AutoVacPID, SIGTERM);
+        /* and the bgwriter too */
+        if (BgWriterPID != 0)
+            signal_child(BgWriterPID, SIGTERM);
+        /* and the walwriter too */
+        if (WalWriterPID != 0)
+            signal_child(WalWriterPID, SIGTERM);
+        /* If we're in recovery, also stop startup and walreceiver procs */
+        if (StartupPID != 0)
+            signal_child(StartupPID, SIGTERM);
+        if (WalReceiverPID != 0)
+            signal_child(WalReceiverPID, SIGTERM);
+        /* checkpointer, archiver, stats, and syslogger may continue for now */
+
+        /* Now transition to PM_WAIT_BACKENDS state to wait for them to die */
+        pmState = PM_WAIT_BACKENDS;
+    }
+
     /*
      * If we are in a state-machine state that implies waiting for backends to
      * exit, see if they're all gone, and change state if so.
@@ -3843,7 +3847,7 @@ PostmasterStateMachine(void)
          * later after writing the checkpoint record, like the archiver
          * process.
          */
-        if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER) == 0 &&
+        if (CountChildren(BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND) == 0 &&
             StartupPID == 0 &&
             WalReceiverPID == 0 &&
             BgWriterPID == 0 &&
@@ -4184,7 +4188,7 @@ BackendStartup(Port *port)
     /* Pass down canAcceptConnections state */
     port->canAcceptConnections = canAcceptConnections(BACKEND_TYPE_NORMAL);
     bn->dead_end = (port->canAcceptConnections != CAC_OK &&
-                    port->canAcceptConnections != CAC_WAITBACKUP);
+                    port->canAcceptConnections != CAC_SUPERUSER);

     /*
      * Unless it's a dead_end child, assign it a child slot number
@@ -5255,6 +5259,8 @@ sigusr1_handler(SIGNAL_ARGS)
 #endif

         pmState = PM_HOT_STANDBY;
+        connsAllowed = ALLOW_ALL_CONNS;
+
         /* Some workers may be scheduled to start now */
         StartWorkerNeeded = true;
     }
@@ -5287,7 +5293,7 @@ sigusr1_handler(SIGNAL_ARGS)
     }

     if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) &&
-        Shutdown == NoShutdown)
+        Shutdown <= SmartShutdown && pmState < PM_STOP_BACKENDS)
     {
         /*
          * Start one iteration of the autovacuum daemon, even if autovacuuming
@@ -5302,7 +5308,7 @@ sigusr1_handler(SIGNAL_ARGS)
     }

     if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER) &&
-        Shutdown == NoShutdown)
+        Shutdown <= SmartShutdown && pmState < PM_STOP_BACKENDS)
     {
         /* The autovacuum launcher wants us to start a worker process. */
         StartAutovacuumWorker();
@@ -5333,7 +5339,7 @@ sigusr1_handler(SIGNAL_ARGS)

     if (StartupPID != 0 &&
         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
+         pmState == PM_HOT_STANDBY) &&
         CheckPromoteSignal())
     {
         /*
@@ -5651,8 +5657,8 @@ MaybeStartWalReceiver(void)
 {
     if (WalReceiverPID == 0 &&
         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
-        Shutdown == NoShutdown)
+         pmState == PM_HOT_STANDBY) &&
+        Shutdown <= SmartShutdown)
     {
         WalReceiverPID = StartWalReceiver();
         if (WalReceiverPID != 0)
@@ -5905,8 +5911,7 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
         case PM_SHUTDOWN_2:
         case PM_SHUTDOWN:
         case PM_WAIT_BACKENDS:
-        case PM_WAIT_READONLY:
-        case PM_WAIT_BACKUP:
+        case PM_STOP_BACKENDS:
             break;

         case PM_RUN:
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 893be2f3dd..d4ab4c7e23 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -795,7 +795,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
      */
     if ((!am_superuser || am_walsender) &&
         MyProcPort != NULL &&
-        MyProcPort->canAcceptConnections == CAC_WAITBACKUP)
+        MyProcPort->canAcceptConnections == CAC_SUPERUSER)
     {
         if (am_walsender)
             ereport(FATAL,
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 179ebaa104..0a23281ad5 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -71,7 +71,7 @@ typedef struct
 typedef enum CAC_state
 {
     CAC_OK, CAC_STARTUP, CAC_SHUTDOWN, CAC_RECOVERY, CAC_TOOMANY,
-    CAC_WAITBACKUP
+    CAC_SUPERUSER
 } CAC_state;

Re: Parallel query hangs after a smart shutdown is issued

From

Thomas Munro

Date:

14 August 2020, 06:03:47

On Fri, Aug 14, 2020 at 4:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Hmmm ... maybe that should be more like
> >         if (smartShutState != SMART_NORMAL_USAGE &&
> >             backend_type == BACKEND_TYPE_NORMAL)
>
> After some more rethinking and testing, here's a v5 that feels
> fairly final to me.  I realized that the logic in canAcceptConnections
> was kind of backwards: it's better to check the main pmState restrictions
> first and then the smart-shutdown restrictions afterwards.

LGTM.  I tested this a bit today and it did what I expected for
parallel queries and vacuum, on primary and standby.

> I'm assuming we want to back-patch this as far as 9.6, where parallel
> query began to be a thing.

Yeah.  I mean, it's more radical than what I thought we'd be doing for
this, but you could get into real trouble by running in smart shutdown
mode without the autovac infrastructure alive.

Re: Parallel query hangs after a smart shutdown is issued

From

Tom Lane

Date:

14 August 2020, 14:32:15

Thomas Munro <thomas.munro@gmail.com> writes:
> On Fri, Aug 14, 2020 at 4:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> After some more rethinking and testing, here's a v5 that feels
>> fairly final to me.  I realized that the logic in canAcceptConnections
>> was kind of backwards: it's better to check the main pmState restrictions
>> first and then the smart-shutdown restrictions afterwards.

> LGTM.  I tested this a bit today and it did what I expected for
> parallel queries and vacuum, on primary and standby.

Thanks for reviewing!  I'll do the back-patching and push this today.

>> I'm assuming we want to back-patch this as far as 9.6, where parallel
>> query began to be a thing.

> Yeah.  I mean, it's more radical than what I thought we'd be doing for
> this, but you could get into real trouble by running in smart shutdown
> mode without the autovac infrastructure alive.

Right.  99.99% of the time, that early shutdown doesn't really cause
any problems, which is how we've gotten away with it this long.  But if
someone did leave session(s) running for a long time after issuing the
SIGTERM, the results could be bad --- and there's no obvious benefit
to the early shutdowns anyway.

            regards, tom lane

Re: Parallel query hangs after a smart shutdown is issued

From

Arseny Sher

Date:

14 August 2020, 15:31:24

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Thomas Munro <thomas.munro@gmail.com> writes:
>> On Fri, Aug 14, 2020 at 4:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> After some more rethinking and testing, here's a v5 that feels
>>> fairly final to me.  I realized that the logic in canAcceptConnections
>>> was kind of backwards: it's better to check the main pmState restrictions
>>> first and then the smart-shutdown restrictions afterwards.
>
>> LGTM.  I tested this a bit today and it did what I expected for
>> parallel queries and vacuum, on primary and standby.
>
> Thanks for reviewing!  I'll do the back-patching and push this today.

FWIW, I've also looked through the patch and it's fine. Moderate testing
also found no issues, check-world works, bgws are started during smart
shutdown as expected. And surely this is better than the inital
shorthack of allowing only parallel workers.

Re: Parallel query hangs after a smart shutdown is issued

From

Tom Lane

Date:

14 August 2020, 17:29:38

Arseny Sher <a.sher@postgrespro.ru> writes:
> FWIW, I've also looked through the patch and it's fine. Moderate testing
> also found no issues, check-world works, bgws are started during smart
> shutdown as expected. And surely this is better than the inital
> shorthack of allowing only parallel workers.

Thanks, appreciate the extra look.  It's pushed now.

            regards, tom lane