Re: Parallel query hangs after a smart shutdown is issued - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Parallel query hangs after a smart shutdown is issued
Date
Msg-id 272331.1597255255@sss.pgh.pa.us
Whole thread Raw
In response to Re: Parallel query hangs after a smart shutdown is issued  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Parallel query hangs after a smart shutdown is issued
List pgsql-hackers
Thomas Munro <thomas.munro@gmail.com> writes:
> On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> After a smart shutdown is issued(with pg_ctl), run a parallel query,
>> then the query hangs.

> Yeah, the current situation is not good.  I think your option 2 sounds
> better, because the documented behaviour of smart shutdown is that it
> "lets existing sessions end their work normally".  I think that means
> that a query that is already running or allowed to start should be
> able to start new workers and not have its existing workers
> terminated.  Arseny Sher wrote a couple of different patches to try
> that last year, but they fell through the cracks:
> https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com

I already commented on this in the other thread that Bharath started [1].
I think the real issue here is why is the postmaster's SIGTERM handler
doing *anything* other than disallowing new connections?  It seems quite
premature to kill support processes of any sort, not only parallel
workers.  The documentation says existing clients are allowed to end
their work, not that their performance is going to be crippled until
they end.

So I looked at moving the kills of all the support processes to happen
after we detect that there are no remaining regular backends, and it
seems to not be too hard.  I realized that the existing PM_WAIT_READONLY
state is doing that already, but just for a subset of support processes
that it thinks might be active in hot standby mode.  So what I did in the
attached was to repurpose that state as "PM_WAIT_CLIENTS", which does the
right thing in either regular or hot standby mode.

One other thing I changed here was to remove PM_WAIT_READONLY from the
set of states in which we'll allow promotion to occur or a new walreceiver
to start.  I'm not convinced that either of those behaviors aren't
bugs; although if someone thinks they're right, we can certainly put
back PM_WAIT_CLIENTS in those checks.  (But, for example, it does not
appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with
Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks
like confusingly dead code to me.  If we do want to allow restarting
the walreceiver in this state, the Shutdown condition needs fixed.)

            regards, tom lane

[1] https://www.postgresql.org/message-id/65189.1597181322%40sss.pgh.pa.us

diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index e31275a04e..3946fa52ea 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -185,8 +185,8 @@ PostgreSQL documentation
    <option>stop</option> mode shuts down the server that is running in
    the specified data directory.  Three different
    shutdown methods can be selected with the <option>-m</option>
-   option.  <quote>Smart</quote> mode waits for all active
-   clients to disconnect and any online backup to finish.
+   option.  <quote>Smart</quote> mode disallows new connections, then waits
+   for all existing clients to disconnect and any online backup to finish.
    If the server is in hot standby, recovery and streaming replication
    will be terminated once all clients have disconnected.
    <quote>Fast</quote> mode (the default) does not wait for clients to disconnect and
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 38e2c16ac2..790948a4b2 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -148,8 +148,6 @@
 #define BACKEND_TYPE_BGWORKER    0x0008    /* bgworker process */
 #define BACKEND_TYPE_ALL        0x000F    /* OR of all the above */

-#define BACKEND_TYPE_WORKER        (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER)
-
 /*
  * List of active backends (or child processes anyway; we don't actually
  * know whether a given child has become a backend or is still in the
@@ -332,8 +330,8 @@ typedef enum
     PM_HOT_STANDBY,                /* in hot standby mode */
     PM_RUN,                        /* normal "database is alive" state */
     PM_WAIT_BACKUP,                /* waiting for online backup mode to end */
-    PM_WAIT_READONLY,            /* waiting for read only backends to exit */
-    PM_WAIT_BACKENDS,            /* waiting for live backends to exit */
+    PM_WAIT_CLIENTS,            /* waiting for normal backends to exit */
+    PM_WAIT_BACKENDS,            /* waiting for all backends to exit */
     PM_SHUTDOWN,                /* waiting for checkpointer to do shutdown
                                  * ckpt */
     PM_SHUTDOWN_2,                /* waiting for archiver and walsenders to
@@ -2793,35 +2791,19 @@ pmdie(SIGNAL_ARGS)
             sd_notify(0, "STOPPING=1");
 #endif

-            if (pmState == PM_RUN || pmState == PM_RECOVERY ||
-                pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
-            {
-                /* autovac workers are told to shut down immediately */
-                /* and bgworkers too; does this need tweaking? */
-                SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER);
-                /* and the autovac launcher too */
-                if (AutoVacPID != 0)
-                    signal_child(AutoVacPID, SIGTERM);
-                /* and the bgwriter too */
-                if (BgWriterPID != 0)
-                    signal_child(BgWriterPID, SIGTERM);
-                /* and the walwriter too */
-                if (WalWriterPID != 0)
-                    signal_child(WalWriterPID, SIGTERM);
-
-                /*
-                 * If we're in recovery, we can't kill the startup process
-                 * right away, because at present doing so does not release
-                 * its locks.  We might want to change this in a future
-                 * release.  For the time being, the PM_WAIT_READONLY state
-                 * indicates that we're waiting for the regular (read only)
-                 * backends to die off; once they do, we'll kill the startup
-                 * and walreceiver processes.
-                 */
-                pmState = (pmState == PM_RUN) ?
-                    PM_WAIT_BACKUP : PM_WAIT_READONLY;
-            }
+            /*
+             * If we reached normal running, we have to wait for any online
+             * backup mode to end; otherwise go straight to waiting for client
+             * backends to exit.  (The difference is that in the former state,
+             * we'll still let in new superuser clients, so that somebody can
+             * end the online backup mode.)  If already in PM_WAIT_BACKUP or a
+             * later state, do not change it.
+             */
+            if (pmState == PM_RUN)
+                pmState = PM_WAIT_BACKUP;
+            else if (pmState == PM_RECOVERY ||
+                     pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
+                pmState = PM_WAIT_CLIENTS;

             /*
              * Now wait for online backup mode to end and backends to exit. If
@@ -2871,16 +2853,15 @@ pmdie(SIGNAL_ARGS)
             }
             else if (pmState == PM_RUN ||
                      pmState == PM_WAIT_BACKUP ||
-                     pmState == PM_WAIT_READONLY ||
+                     pmState == PM_WAIT_CLIENTS ||
                      pmState == PM_WAIT_BACKENDS ||
                      pmState == PM_HOT_STANDBY)
             {
                 ereport(LOG,
                         (errmsg("aborting any active transactions")));
-                /* shut down all backends and workers */
+                /* shut down all backends and workers, but not walsenders */
                 SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_NORMAL | BACKEND_TYPE_AUTOVAC |
-                                   BACKEND_TYPE_BGWORKER);
+                                   BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
                 /* and the autovac launcher too */
                 if (AutoVacPID != 0)
                     signal_child(AutoVacPID, SIGTERM);
@@ -3713,7 +3694,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
         pmState == PM_HOT_STANDBY ||
         pmState == PM_RUN ||
         pmState == PM_WAIT_BACKUP ||
-        pmState == PM_WAIT_READONLY ||
+        pmState == PM_WAIT_CLIENTS ||
         pmState == PM_SHUTDOWN)
         pmState = PM_WAIT_BACKENDS;

@@ -3802,21 +3783,35 @@ PostmasterStateMachine(void)
          * PM_WAIT_BACKUP state ends when online backup mode is not active.
          */
         if (!BackupInProgress())
-            pmState = PM_WAIT_BACKENDS;
+            pmState = PM_WAIT_CLIENTS;
     }

-    if (pmState == PM_WAIT_READONLY)
+    if (pmState == PM_WAIT_CLIENTS)
     {
         /*
-         * PM_WAIT_READONLY state ends when we have no regular backends that
-         * have been started during recovery.  We kill the startup and
-         * walreceiver processes and transition to PM_WAIT_BACKENDS.  Ideally,
-         * we might like to kill these processes first and then wait for
-         * backends to die off, but that doesn't work at present because
-         * killing the startup process doesn't release its locks.
+         * PM_WAIT_CLIENTS state ends when we have no normal client backends
+         * running.  Then signal appropriate support processes, and transition
+         * to PM_WAIT_BACKENDS to wait for them to die.
          */
         if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
         {
+            /*
+             * Signal all backend children except walsenders.  (While there
+             * can't be any normal children left, we might as well include
+             * BACKEND_TYPE_NORMAL in this mask, just to be sure.)
+             */
+            SignalSomeChildren(SIGTERM,
+                               BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
+            /* and the autovac launcher too */
+            if (AutoVacPID != 0)
+                signal_child(AutoVacPID, SIGTERM);
+            /* and the bgwriter too */
+            if (BgWriterPID != 0)
+                signal_child(BgWriterPID, SIGTERM);
+            /* and the walwriter too */
+            if (WalWriterPID != 0)
+                signal_child(WalWriterPID, SIGTERM);
+            /* If we're in recovery, also stop startup and walreceiver procs */
             if (StartupPID != 0)
                 signal_child(StartupPID, SIGTERM);
             if (WalReceiverPID != 0)
@@ -3843,7 +3838,7 @@ PostmasterStateMachine(void)
          * later after writing the checkpoint record, like the archiver
          * process.
          */
-        if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER) == 0 &&
+        if (CountChildren(BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND) == 0 &&
             StartupPID == 0 &&
             WalReceiverPID == 0 &&
             BgWriterPID == 0 &&
@@ -5333,7 +5328,7 @@ sigusr1_handler(SIGNAL_ARGS)

     if (StartupPID != 0 &&
         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
+         pmState == PM_HOT_STANDBY) &&
         CheckPromoteSignal())
     {
         /*
@@ -5651,7 +5646,7 @@ MaybeStartWalReceiver(void)
 {
     if (WalReceiverPID == 0 &&
         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
+         pmState == PM_HOT_STANDBY) &&
         Shutdown == NoShutdown)
     {
         WalReceiverPID = StartWalReceiver();
@@ -5905,7 +5900,7 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
         case PM_SHUTDOWN_2:
         case PM_SHUTDOWN:
         case PM_WAIT_BACKENDS:
-        case PM_WAIT_READONLY:
+        case PM_WAIT_CLIENTS:
         case PM_WAIT_BACKUP:
             break;


pgsql-hackers by date:

Previous
From: Jaime Casanova
Date:
Subject: Re: EDB builds Postgres 13 with an obsolete ICU version
Next
From: Alvaro Herrera
Date:
Subject: Re: [BUG] Error in BRIN summarization