Re: Add GoAway protocol message for graceful but fast server shutdown/switchover - Mailing list pgsql-hackers

From Jelte Fennema-Nio
Subject Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
Msg-id DHB2ZT3ZN8L5.21CRG9GA9317G@jeltef.nl
In response to Re: Add GoAway protocol message for graceful but fast server shutdown/switchover  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
On Fri, 20 Mar 2026 at 20:20, Tomas Vondra <tomas@vondra.me> wrote:
> It'd be very helpful if there was some sort of PoC
> support on the pooler/client side, so that I can experiment with it and
> see how helpful the new protocol message is. But I realize that's a bit
> too much to ask for.

I'll see if I can whip something up, it shouldn't be too hard.

> Why not to have a pg_goaway_backend() function, that'd send the
> message to a single backend?

I like this idea a lot, so I added it in the attached v8 patch. This
also allowed me to add low-level tests using the libpq_pipeline
test suite.

> * In fact, does it improve the smart shutdown case in practice? Let's
> say we have a single instance, and we're restarting it. It'll send
> GoAway to all the clients, the good clients will try to reconnect. But
> if there's even a single "bad" client ignoring the GoAway, all the
> well-behaved clients will get stuck. Ofc, that can happen without the
> GoAway message too - a client may disconnect because of timeout etc. But
> it makes it more likely, and it'll affect the well-behaved clients.

For primary server restarts, I don't think anyone should be using smart
shutdown right now either. Any new connections to the database will be
failing for an indeterminate amount of time. I agree that sending GoAway
might worsen the problem in some cases, but it's already terrible to
start with. Fast shutdown is the only sensible restart mode for a
primary server. This seems to be generally accepted knowledge, given
that we use SIGINT (fast shutdown) in our systemd example[1].

Sending a GoAway on smart shutdown makes that shutdown mode very useful
for read replicas during a planned switchover to another replica. Now
clients can finish their work and quickly reconnect to the new read
replica, minimizing switchover time while preventing errors.

Even when restarting primary servers, triggering a smart shutdown has
benefits, as long as it's followed by a fast shutdown after a short
delay (e.g., 1 second). This causes slightly longer downtime (the
additional delay), but it allows most clients to disconnect on their own
terms instead of in the middle of a query. Connection errors are often
easier to retry transparently than errors in the middle of a query. In
effect, for many applications, this could mean fewer errors and only an
increase in latency during a restart.
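
To make the "smart first, fast shortly after" sequence concrete, here is a
rough sketch of how an operator script could drive it through pg_ctl. The
function name and the injectable `run` parameter are made up for
illustration; the pg_ctl options used (stop, -D, -m, -w, -t) are the
existing ones. It assumes that pg_ctl exits non-zero when the wait times
out, and that the server sends GoAway on smart shutdown, which is only the
behaviour proposed in this thread.

```python
import subprocess


def smart_then_fast_stop(datadir, grace_seconds=1, run=subprocess.run):
    """Request a smart shutdown, then escalate to fast if it does not finish.

    Hypothetical helper: `run` is injectable only so the sequence can be
    exercised without a real server.
    """
    base = ["pg_ctl", "stop", "-D", datadir]

    # Smart shutdown: no new connections are accepted and the server waits
    # for existing sessions to end. With the proposed patch this is also
    # when clients would receive GoAway.
    smart = run(base + ["-m", "smart", "-w", "-t", str(grace_seconds)])
    if smart.returncode == 0:
        return "smart"

    # Some sessions are still connected after the grace period, so fall
    # back to a fast shutdown, which terminates them.
    fast = run(base + ["-m", "fast", "-w"])
    if fast.returncode != 0:
        raise RuntimeError("fast shutdown failed")
    return "fast"
```

The grace period is the extra downtime mentioned above: well-behaved
clients get that long to leave on their own before the rest are
disconnected.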

> * Would it make sense to have some payload in the GoAway message? I'm
> thinking about (a) some deadline by which the client should disconnect,
> e.g. time of planned restart / shutdown, (b) priority, expressing how
> much the client should try to disconnect (and maybe take more drastic
> actions).

I thought some more about this, but ultimately, the payloads you suggest
only seem useful if a client has some option in between "disconnect hard
now" and "disconnect when the connection is unused". I cannot think of
any such option, i.e. what other "drastic actions" could a client take
besides simply closing the connection? If that's the only possibility,
why not simply have the server close the connection in that case?

Overall, I agree that having no payload in this new message feels a bit
weird. But ultimately, clients don't need any payload to do something
useful.
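
As an illustration of the "disconnect when the connection is unused"
behaviour, here is a minimal sketch of a client-side pool reacting to a
payload-less GoAway. All names (PooledConnection, DrainingPool, on_goaway)
are hypothetical and not taken from the patch or from any existing driver;
a real pool would also need locking and actual sockets.

```python
class PooledConnection:
    """Stand-in for a real connection; only tracks the state we care about."""

    def __init__(self, name):
        self.name = name
        self.in_use = False
        self.goaway = False
        self.closed = False

    def close(self):
        self.closed = True


class DrainingPool:
    """Toy pool that retires a connection once GoAway was seen and it is idle."""

    def __init__(self, connect):
        self._connect = connect  # factory that opens a fresh connection
        self._idle = []

    def acquire(self):
        # Reuse an idle connection if there is one, otherwise open a new one.
        conn = self._idle.pop() if self._idle else self._connect()
        conn.in_use = True
        return conn

    def release(self, conn):
        conn.in_use = False
        if conn.goaway:
            # The server asked us to leave: close now that the work is done.
            conn.close()
        else:
            self._idle.append(conn)

    def on_goaway(self, conn):
        # No payload is needed: the message itself is the whole signal.
        conn.goaway = True
        if not conn.in_use:
            if conn in self._idle:
                self._idle.remove(conn)
            conn.close()
```

A busy connection finishes its current work and is closed on release; an
idle one is closed right away. Either way, the next acquire() opens a fresh
connection, which is where a pooler would pick the new server.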

> Also, two minor comments:

Fixed.
