Thread: PostgreSQL shutdown modes
Hi, I think it's pretty evident that the names we've chosen for the various PostgreSQL shutdown modes are pretty terrible, and maybe we should try to do something about that. There is nothing "smart" about a smart shutdown. The usual result of attempting a smart shutdown is that the server never shuts down at all, because typically there are going to be some applications using connections that are kept open more or less permanently. What ends up happening when you attempt a "smart" shutdown is that you've basically put the server into a mode where you're irreversibly committed to accepting no new connections, but because you have a connection pooler or something that keeps connections open forever, you never shut down either. It is in effect a denial-of-service attack on the database you're supposed to be administering. Similarly, "fast" shutdowns are not in any way fast. It is pretty common for a fast shutdown to take many minutes or even tens of minutes to complete. This doesn't require some kind of extreme workload to hit; I've run into it during casual benchmarking runs. It's very easy to have enough dirty data in shared buffers, or enough dirty in the operating system cache that will have to be fsync'd in order to complete the shutdown checkpoint, to make things take an extremely long time. In some ways, this is an even more effective denial-of-service attack than a smart shutdown. True, the database will at some point actually finish shutting down, but in the meantime not only will we not accept new connections but we'll evict all of the existing ones. Good luck maintaining five nines of availability if waiting for a clean shutdown to complete is any part of the process. It might be smarter to initiate a regular (non-shutdown) checkpoint first, without cutting off connections, and then when that finishes, proceed as we do now. The second checkpoint will complete a lot faster, so while the overall operation still won't be fast, at least we'd be refusing connections for a shorter period of time before the system is actually shut down and you can do whatever maintenance you need to do. "immediate" shutdowns aren't as bad as the other two, but they're still bad. One of the big problems is that I encounter in this area is that Oracle uses the name "immediate" shutdown to mean a normal shutdown with a checkpoint allowing for a clean restart. Users coming from Oracle are sometimes extremely surprised to discover that an immediate shutdown is actually a server crash that will require recovery. Even if you don't come from Oracle, there's really nothing about the name of this shutdown mode that intrinsically makes you understand that it's something you should do only as a last resort. Who doesn't like things that are immediate? The problem with this theory is that you make the shutdown quicker at the price of startup becoming much, much slower, because the crash recovery is very likely going to take a whole lot longer than the shutdown checkpoint would have done. I attach herewith a modest patch to rename these shutdown modes to more accurately correspond to their actual characteristics. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
Isn't this missing support in pg_dumb ?
On Fri, Apr 01, 2022 at 01:22:05PM -0400, Robert Haas wrote: > I attach herewith a modest patch to rename these shutdown modes to > more accurately correspond to their actual characteristics. > > Date: Fri, 1 Apr 2022 12:50:05 -0400 I love the idea. Just in time, before the feature freeze deadline. -- Michael
Attachment
I had quick check in the patch and noticed below replacements:
-#define SmartShutdown 1
-#define FastShutdown 2
-#define ImmediateShutdown 3
+#define DumbShutdown 1
+#define SlowShutdown 2
+#define CrappyShutdown 3
-#define ImmediateShutdown 3
+#define DumbShutdown 1
+#define SlowShutdown 2
+#define CrappyShutdown 3
About the new naming, if "Crappy" can be replaced with something else. But
was not able to come up with any proper suggestions here. Or may be
"Immediate" is appropriate, as here it's talking about a "Shutdown" operation.
On Sat, Apr 2, 2022 at 8:29 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Apr 01, 2022 at 01:22:05PM -0400, Robert Haas wrote:
> I attach herewith a modest patch to rename these shutdown modes to
> more accurately correspond to their actual characteristics.
>
> Date: Fri, 1 Apr 2022 12:50:05 -0400
I love the idea. Just in time, before the feature freeze deadline.
--
Michael
Rushabh Lathia
On 2022-04-01 13:22, Robert Haas wrote: > I attach herewith a modest patch to rename these shutdown modes to > more accurately correspond to their actual characteristics. I've waited for April 2nd to submit this comment, but it seemed to me that the suggestion about the first-pass checkpoint in 'slow' mode is a no-foolin' good one. Then I wondered whether there could be an option to accompany the 'dumb' mode that would take a WHERE clause, to be implicitly applied to pg_stat_activity, whose purpose would be to select those sessions that are ok to evict without waiting for them to exit. It could recognize, say, backend connections in no current transaction that are from your pesky app or connection pooler that holds things open. It could also, for example, select things in transaction state but where current_timestamp - state_change > '5 minutes' (so it would be re-evaluated every so often until ready to shut down). For conciseness (and sanity), maybe the WHERE clause could be implicitly applied, not to pg_stat_activity directly, but to a (virtual or actual) view that has already been restricted to client backend sessions, and already has a column for current_timestamp - state_change. Regards, -Chap
At Sat, 2 Apr 2022 11:58:55 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Fri, Apr 01, 2022 at 01:22:05PM -0400, Robert Haas wrote: > > I attach herewith a modest patch to rename these shutdown modes to > > more accurately correspond to their actual characteristics. > > > > Date: Fri, 1 Apr 2022 12:50:05 -0400 > > I love the idea. Just in time, before the feature freeze deadline. FWIW, this came in to my mailbox with at "4/2 2:22 JST":p regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Sat, Apr 2, 2022 at 9:39 AM <chap@anastigmatix.net> wrote: > I've waited for April 2nd to submit this comment, but it seemed to me > that the > suggestion about the first-pass checkpoint in 'slow' mode is a > no-foolin' good one. Yeah. While the patch itself is mostly in jest, everything I wrote in the email is unfortunately pretty much 100% accurate, no fooling. I think it would be worth doing a number of things: - Provide some way of backing out of smart shutdown mode. - Provide some way of making a smart shutdown turn into a fast shutdown after a configurable period of time. - Do a preparatory checkpoint before the real shutdown checkpoint especially in fast mode, but maybe also in smart mode. Maybe there's some even smarter thing we could be doing here, not sure what exactly. - Consider renaming "immediate" mode, maybe to "crash" or something. Oracle uses "abort". > Then I wondered whether there could be an option to accompany the 'dumb' > mode that > would take a WHERE clause, to be implicitly applied to pg_stat_activity, > whose > purpose would be to select those sessions that are ok to evict without > waiting for > them to exit. It could recognize, say, backend connections in no current > transaction > that are from your pesky app or connection pooler that holds things > open. It could > also, for example, select things in transaction state but where > current_timestamp - state_change > '5 minutes' (so it would be > re-evaluated every > so often until ready to shut down). Seems like this might be better done in user-space than hard-coded into the server behavior. -- Robert Haas EDB: http://www.enterprisedb.com