Thread: PostgreSQL shutdown modes

PostgreSQL shutdown modes

From
Robert Haas
Date:
Hi,

I think it's pretty evident that the names we've chosen for the
various PostgreSQL shutdown modes are pretty terrible, and maybe we
should try to do something about that. There is nothing "smart" about
a smart shutdown. The usual result of attempting a smart shutdown is
that the server never shuts down at all, because typically there are
going to be some applications using connections that are kept open
more or less permanently. What ends up happening when you attempt a
"smart" shutdown is that you've basically put the server into a mode
where you're irreversibly committed to accepting no new connections,
but because you have a connection pooler or something that keeps
connections open forever, you never shut down either. It is in effect
a denial-of-service attack on the database you're supposed to be
administering.

Similarly, "fast" shutdowns are not in any way fast. It is pretty
common for a fast shutdown to take many minutes or even tens of
minutes to complete. This doesn't require some kind of extreme
workload to hit; I've run into it during casual benchmarking runs.
It's very easy to have enough dirty data in shared buffers, or enough
dirty in the operating system cache that will have to be fsync'd in
order to complete the shutdown checkpoint, to make things take an
extremely long time. In some ways, this is an even more effective
denial-of-service attack than a smart shutdown. True, the database
will at some point actually finish shutting down, but in the meantime
not only will we not accept new connections but we'll evict all of the
existing ones. Good luck maintaining five nines of availability if
waiting for a clean shutdown to complete is any part of the process.
It might be smarter to initiate a regular (non-shutdown) checkpoint
first, without cutting off connections, and then when that finishes,
proceed as we do now. The second checkpoint will complete a lot
faster, so while the overall operation still won't be fast, at least
we'd be refusing connections for a shorter period of time before the
system is actually shut down and you can do whatever maintenance you
need to do.

"immediate" shutdowns aren't as bad as the other two, but they're
still bad. One of the big problems is that I encounter in this area is
that Oracle uses the name "immediate" shutdown to mean a normal
shutdown with a checkpoint allowing for a clean restart. Users coming
from Oracle are sometimes extremely surprised to discover that an
immediate shutdown is actually a server crash that will require
recovery. Even if you don't come from Oracle, there's really nothing
about the name of this shutdown mode that intrinsically makes you
understand that it's something you should do only as a last resort.
Who doesn't like things that are immediate? The problem with this
theory is that you make the shutdown quicker at the price of startup
becoming much, much slower, because the crash recovery is very likely
going to take a whole lot longer than the shutdown checkpoint would
have done.

I attach herewith a modest patch to rename these shutdown modes to
more accurately correspond to their actual characteristics.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

Re: PostgreSQL shutdown modes

From
Justin Pryzby
Date:
Isn't this missing support in pg_dumb ?



Re: PostgreSQL shutdown modes

From
Michael Paquier
Date:
On Fri, Apr 01, 2022 at 01:22:05PM -0400, Robert Haas wrote:
> I attach herewith a modest patch to rename these shutdown modes to
> more accurately correspond to their actual characteristics.
>
> Date: Fri, 1 Apr 2022 12:50:05 -0400

I love the idea.  Just in time, before the feature freeze deadline.
--
Michael

Attachment

Re: PostgreSQL shutdown modes

From
Rushabh Lathia
Date:

+1 for the idea of changing the name, as it's really confusing.

I had quick check in the patch and noticed below replacements:

-#define SmartShutdown 1
-#define FastShutdown 2
-#define ImmediateShutdown 3
+#define DumbShutdown 1
+#define SlowShutdown 2
+#define CrappyShutdown 3

About the new naming,  if "Crappy" can be replaced with something else. But
was not able to come up with any proper suggestions here.  Or may be
"Immediate" is appropriate, as here it's talking about a "Shutdown" operation.



On Sat, Apr 2, 2022 at 8:29 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Apr 01, 2022 at 01:22:05PM -0400, Robert Haas wrote:
> I attach herewith a modest patch to rename these shutdown modes to
> more accurately correspond to their actual characteristics.
>
> Date: Fri, 1 Apr 2022 12:50:05 -0400

I love the idea.  Just in time, before the feature freeze deadline.
--
Michael


--
Rushabh Lathia

Re: PostgreSQL shutdown modes

From
chap@anastigmatix.net
Date:
On 2022-04-01 13:22, Robert Haas wrote:
> I attach herewith a modest patch to rename these shutdown modes to
> more accurately correspond to their actual characteristics.

I've waited for April 2nd to submit this comment, but it seemed to me 
that the
suggestion about the first-pass checkpoint in 'slow' mode is a 
no-foolin' good one.
Then I wondered whether there could be an option to accompany the 'dumb' 
mode that
would take a WHERE clause, to be implicitly applied to pg_stat_activity, 
whose
purpose would be to select those sessions that are ok to evict without 
waiting for
them to exit. It could recognize, say, backend connections in no current 
transaction
that are from your pesky app or connection pooler that holds things 
open. It could
also, for example, select things in transaction state but where
  current_timestamp - state_change > '5 minutes' (so it would be 
re-evaluated every
so often until ready to shut down).

For conciseness (and sanity), maybe the WHERE clause could be implicitly 
applied,
not to pg_stat_activity directly, but to a (virtual or actual) view that 
has
already been restricted to client backend sessions, and already has a 
column
for current_timestamp - state_change.

Regards,
-Chap



Re: PostgreSQL shutdown modes

From
Kyotaro Horiguchi
Date:
At Sat, 2 Apr 2022 11:58:55 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Fri, Apr 01, 2022 at 01:22:05PM -0400, Robert Haas wrote:
> > I attach herewith a modest patch to rename these shutdown modes to
> > more accurately correspond to their actual characteristics.
> >
> > Date: Fri, 1 Apr 2022 12:50:05 -0400
> 
> I love the idea.  Just in time, before the feature freeze deadline.

FWIW, this came in to my mailbox with at "4/2 2:22 JST":p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: PostgreSQL shutdown modes

From
Robert Haas
Date:
On Sat, Apr 2, 2022 at 9:39 AM <chap@anastigmatix.net> wrote:
> I've waited for April 2nd to submit this comment, but it seemed to me
> that the
> suggestion about the first-pass checkpoint in 'slow' mode is a
> no-foolin' good one.

Yeah. While the patch itself is mostly in jest, everything I wrote in
the email is unfortunately pretty much 100% accurate, no fooling. I
think it would be worth doing a number of things:

- Provide some way of backing out of smart shutdown mode.
- Provide some way of making a smart shutdown turn into a fast
shutdown after a configurable period of time.
- Do a preparatory checkpoint before the real shutdown checkpoint
especially in fast mode, but maybe also in smart mode. Maybe there's
some even smarter thing we could be doing here, not sure what exactly.
- Consider renaming "immediate" mode, maybe to "crash" or something.
Oracle uses "abort".

> Then I wondered whether there could be an option to accompany the 'dumb'
> mode that
> would take a WHERE clause, to be implicitly applied to pg_stat_activity,
> whose
> purpose would be to select those sessions that are ok to evict without
> waiting for
> them to exit. It could recognize, say, backend connections in no current
> transaction
> that are from your pesky app or connection pooler that holds things
> open. It could
> also, for example, select things in transaction state but where
>   current_timestamp - state_change > '5 minutes' (so it would be
> re-evaluated every
> so often until ready to shut down).

Seems like this might be better done in user-space than hard-coded
into the server behavior.

-- 
Robert Haas
EDB: http://www.enterprisedb.com