Re: SIGQUIT handling, redux - Mailing list pgsql-hackers

From Tom Lane
Subject Re: SIGQUIT handling, redux
Date
Msg-id 141736.1599693895@sss.pgh.pa.us
Whole thread Raw
In response to Re: SIGQUIT handling, redux  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: SIGQUIT handling, redux  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Here's a draft patch that I think would be reasonable to back-patch.
(Before v13, we'd need a bespoke SIGQUIT handler to substitute for
SignalHandlerForCrashExit, but that's easy enough.)

Aside from comment updates, this

* uses SignalHandlerForCrashExit for SIGQUIT

* renames startup_die per your request

* moves BackendInitialize's interrupt-re-disabling code up a bit to
reduce the scope where these interrupts are active.  This doesn't
make things a whole lot safer, but it can't hurt.

I'll take a closer look at the idea of using _exit(1) tomorrow,
but I'd be pretty hesitant to back-patch that.

            regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 42223c0f61..e65086f7b4 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -112,6 +112,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/fork_process.h"
+#include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
@@ -405,7 +406,7 @@ static void SIGHUP_handler(SIGNAL_ARGS);
 static void pmdie(SIGNAL_ARGS);
 static void reaper(SIGNAL_ARGS);
 static void sigusr1_handler(SIGNAL_ARGS);
-static void startup_die(SIGNAL_ARGS);
+static void startup_packet_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
 static void CleanupBackend(int pid, int exitstatus);
@@ -4340,22 +4341,30 @@ BackendInitialize(Port *port)
     whereToSendOutput = DestRemote; /* now safe to ereport to client */

     /*
-     * We arrange for a simple exit(1) if we receive SIGTERM or SIGQUIT or
-     * timeout while trying to collect the startup packet.  Otherwise the
-     * postmaster cannot shutdown the database FAST or IMMED cleanly if a
-     * buggy client fails to send the packet promptly.  XXX it follows that
-     * the remainder of this function must tolerate losing control at any
-     * instant.  Likewise, any pg_on_exit_callback registered before or during
-     * this function must be prepared to execute at any instant between here
-     * and the end of this function.  Furthermore, affected callbacks execute
-     * partially or not at all when a second exit-inducing signal arrives
-     * after proc_exit_prepare() decrements on_proc_exit_index.  (Thanks to
-     * that mechanic, callbacks need not anticipate more than one call.)  This
-     * is fragile; it ought to instead follow the norm of handling interrupts
-     * at selected, safe opportunities.
-     */
-    pqsignal(SIGTERM, startup_die);
-    pqsignal(SIGQUIT, startup_die);
+     * We arrange to do proc_exit(1) if we receive SIGTERM or timeout while
+     * trying to collect the startup packet; while SIGQUIT results in
+     * _exit(2).  Otherwise the postmaster cannot shutdown the database FAST
+     * or IMMED cleanly if a buggy client fails to send the packet promptly.
+     *
+     * XXX this is pretty dangerous; signal handlers should not call anything
+     * as complex as proc_exit() directly.  We minimize the hazard by not
+     * keeping these handlers active for longer than we must.  However, it
+     * seems necessary to be able to escape out of DNS lookups as well as the
+     * startup packet reception proper, so we can't narrow the scope further
+     * than is done here.
+     *
+     * XXX it follows that the remainder of this function must tolerate losing
+     * control at any instant.  Likewise, any pg_on_exit_callback registered
+     * before or during this function must be prepared to execute at any
+     * instant between here and the end of this function.  Furthermore,
+     * affected callbacks execute partially or not at all when a second
+     * exit-inducing signal arrives after proc_exit_prepare() decrements
+     * on_proc_exit_index.  (Thanks to that mechanic, callbacks need not
+     * anticipate more than one call.)  This is fragile; it ought to instead
+     * follow the norm of handling interrupts at selected, safe opportunities.
+     */
+    pqsignal(SIGTERM, startup_packet_die);
+    pqsignal(SIGQUIT, SignalHandlerForCrashExit);
     InitializeTimeouts();        /* establishes SIGALRM handler */
     PG_SETMASK(&StartupBlockSig);

@@ -4411,8 +4420,8 @@ BackendInitialize(Port *port)
         port->remote_hostname = strdup(remote_host);

     /*
-     * Ready to begin client interaction.  We will give up and exit(1) after a
-     * time delay, so that a broken client can't hog a connection
+     * Ready to begin client interaction.  We will give up and proc_exit(1)
+     * after a time delay, so that a broken client can't hog a connection
      * indefinitely.  PreAuthDelay and any DNS interactions above don't count
      * against the time limit.
      *
@@ -4434,6 +4443,12 @@ BackendInitialize(Port *port)
      */
     status = ProcessStartupPacket(port, false, false);

+    /*
+     * Disable the timeout, and prevent SIGTERM again.
+     */
+    disable_timeout(STARTUP_PACKET_TIMEOUT, false);
+    PG_SETMASK(&BlockSig);
+
     /*
      * Stop here if it was bad or a cancel packet.  ProcessStartupPacket
      * already did any appropriate error reporting.
@@ -4459,12 +4474,6 @@ BackendInitialize(Port *port)
     pfree(ps_data.data);

     set_ps_display("initializing");
-
-    /*
-     * Disable the timeout, and prevent SIGTERM/SIGQUIT again.
-     */
-    disable_timeout(STARTUP_PACKET_TIMEOUT, false);
-    PG_SETMASK(&BlockSig);
 }


@@ -5359,16 +5368,22 @@ sigusr1_handler(SIGNAL_ARGS)
 }

 /*
- * SIGTERM or SIGQUIT while processing startup packet.
+ * SIGTERM while processing startup packet.
  * Clean up and exit(1).
  *
- * XXX: possible future improvement: try to send a message indicating
- * why we are disconnecting.  Problem is to be sure we don't block while
- * doing so, nor mess up SSL initialization.  In practice, if the client
- * has wedged here, it probably couldn't do anything with the message anyway.
+ * Running proc_exit() from a signal handler is pretty unsafe, since we
+ * can't know what code we've interrupted.  But the alternative of using
+ * _exit(2) is also unpalatable, since it'd mean that a "fast shutdown"
+ * would cause a database crash cycle (forcing WAL replay at restart)
+ * if any sessions are in authentication.  So we live with it for now.
+ *
+ * One might be tempted to try to send a message indicating why we are
+ * disconnecting.  However, that would make this even more unsafe.  Also,
+ * it seems undesirable to provide clues about the database's state to
+ * a client that has not yet completed authentication.
  */
 static void
-startup_die(SIGNAL_ARGS)
+startup_packet_die(SIGNAL_ARGS)
 {
     proc_exit(1);
 }
@@ -5389,7 +5404,11 @@ dummy_handler(SIGNAL_ARGS)

 /*
  * Timeout while processing startup packet.
- * As for startup_die(), we clean up and exit(1).
+ * As for startup_packet_die(), we clean up and exit(1).
+ *
+ * This is theoretically just as hazardous as it is in startup_packet_die(),
+ * although in practice we're almost certainly waiting for client input,
+ * which greatly reduces the risk.
  */
 static void
 StartupPacketTimeoutHandler(void)

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Optimising compactify_tuples()
Next
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: Transactions involving multiple postgres foreign servers, take 2