pgsql: Fix postmaster's behavior during smart shutdown. - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Fix postmaster's behavior during smart shutdown.
Date
Msg-id E1k6dVo-0007s8-Jc@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Fix postmaster's behavior during smart shutdown.

Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the
postmaster has immediately killed all "optional" background processes,
and subsequently refused to launch new ones while it's waiting for
foreground client processes to exit.  No doubt this seemed like an OK
policy at some point; but it's a pretty bad one now, because it makes
for a seriously degraded environment for the remaining clients:

* Parallel queries are killed, and new ones fail to launch. (And our
parallel-query infrastructure utterly fails to deal with the case
in a reasonable way --- it just hangs waiting for workers that are
not going to arrive.  There is more work needed in that area IMO.)

* Autovacuum ceases to function.  We can tolerate that for awhile,
but if bulk-update queries continue to run in the surviving client
sessions, there's eventually going to be a mess.  In the worst case
the system could reach a forced shutdown to prevent XID wraparound.

* The bgwriter and walwriter are also stopped immediately, likely
resulting in performance degradation.

Hence, let's rearrange things so that the only immediate change in
behavior is refusing to let in new normal connections.  Once the last
normal connection is gone, shut everything down as though we'd received
a "fast" shutdown.  To implement this, remove the PM_WAIT_BACKUP and
PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY
while normal connections remain.  A subsidiary state variable tracks
whether or not we're letting in new connections in those states.

This also allows having just one copy of the logic for killing child
processes in smart and fast shutdown modes.  I moved that logic into
PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.

Back-patch to 9.6 where parallel query was added.  In principle
this'd be a good idea in 9.5 as well, but the risk/reward ratio
is not as good there, since lack of autovacuum is not a problem
during typical uses of smart shutdown.

Per report from Bharath Rupireddy.

Patch by me, reviewed by Thomas Munro

Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/0038f943878286ce84b2dfac10d64e00eab02edd

Modified Files
--------------
doc/src/sgml/ref/pg_ctl-ref.sgml    |   4 +-
src/backend/postmaster/postmaster.c | 239 ++++++++++++++++++------------------
src/backend/utils/init/postinit.c   |   2 +-
src/include/libpq/libpq-be.h        |   2 +-
4 files changed, 126 insertions(+), 121 deletions(-)


pgsql-committers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: pgsql: Fix typo in test comment.
Next
From: Peter Geoghegan
Date:
Subject: pgsql: Fix obsolete comment in xlogutils.c.