Add GoAway protocol message for graceful but fast server shutdown/switchover - Mailing list pgsql-hackers

From Jelte Fennema-Nio
Subject Add GoAway protocol message for graceful but fast server shutdown/switchover
Date
Msg-id DDPQ1RV5FE9U.I2WW34NGRD8Z@jeltef.nl
Whole thread Raw
List pgsql-hackers
This change introduces a new GoAway backend-to-frontend protocol
message (byte 'g') that the server can send to the client to politely
request that client to disconnect/reconnect when convenient. This message is
advisory only - the connection remains fully functional and clients may
continue executing queries and starting new transactions. "When
convenient" is obviously not very well defined, but the primary target
clients are clients that maintain a connection pool. Such clients should
disconnect/reconnect a connection in the pool when there's no user of
that connection. This is similar to how such clients often currently
remove a connection from the pool after the connection hits a maximum
lifetime of e.g. 1 hour.

This new message is used by Postgres during the already existing "smart"
shutdown procedure (i.e. when postmaster receives SIGTERM). When
Postgres is in "smart" shutdown mode existing clients can continue to
run queries as usual but new connection attempts are rejected. This mode
is primarily useful when triggering a switchover of a read replica. A
load balancer can route new connections only to the new read replica,
while the old load balancer keeps serving the existing connections until
they disconnect. The problem is that this draining of connections could
often take a long time. Even when clients only run very short
queries/transactions because the session can be kept open much longer
(many connection pools use 1 hour max lifetime of a connection by default).
With the introduction of the GoAway message Postgres now sends this
message to all connected clients when it enters smart shutdown mode.
If these clients respond to the message by reconnecting/disconnecting
earlier than their maximum connection lifetime the draining can complete
much quicker. Similar benefits to switchover duration can be achieved
for other applications or proxies implementing the Postgres protocol,
like when switching over a cluster of PgBouncer machines to a newer
version.

Applications/clients that use libpq can periodically check the result of
the new PQgoAwayReceived() function to find out whether they have been
asked to reconnect.

Attachment

pgsql-hackers by date:

Previous
From: Xuneng Zhou
Date:
Subject: Re: Implement waiting for wal lsn replay: reloaded
Next
From: Andrey Borodin
Date:
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions