SIGQUIT handling, redux - Mailing list pgsql-hackers

From Tom Lane
Subject SIGQUIT handling, redux
Date
Msg-id 1850884.1599601164@sss.pgh.pa.us
Whole thread Raw
Responses Re: SIGQUIT handling, redux  (Stephen Frost <sfrost@snowman.net>)
Re: SIGQUIT handling, redux  (Andres Freund <andres@anarazel.de>)
Re: SIGQUIT handling, redux  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
This is to pull together a couple of recent threads that are seemingly
unrelated to $SUBJECT.

Over in the long thread at [1] we've agreed to try to make the backend
code actually do what assorted comments claim it does, namely allow
SIGQUIT to be accepted at any point after a given process has established
its signal handlers.  (Right now, it mostly is not accepted during error
recovery, but there's no clear reason to preserve that exception.)

Of course, this is only safe if the SIGQUIT handler is safe to be invoked
anywhere, so I did a quick survey of backend signal handlers to see if
that is true.  I immediately found pgarch_exit() which surely is not.  It
turns out that Horiguchi-san already noticed that and proposed to fix it,
within the seemingly entirely unrelated patch series for a shared-memory
based stats collector (see patch 0001 in [2]).  I think we should go ahead
and commit that, and maybe even back-patch it.  There seems no good reason
for the archiver to treat SIGQUIT as nonfatal when other postmaster
children do not.

Another thing I found is startup_die() in postmaster.c, which catches
SIGQUIT during the authentication phase of a new backend.  This calls
proc_exit(1), which (a) is unsafe on its face, and (b) potentially
demotes a SIGQUIT into something that will not cause the parent postmaster
to perform a DB-wide restart.  There seems no good reason for that
behavior, either.  I propose that we use SignalHandlerForCrashExit
for SIGQUIT here too.

In passing, it's worth noting that startup_die() isn't really much safer
for SIGTERM than it is for SIGQUIT; the only argument for distinguishing
those is that code that applies BlockSig will at least manage to block the
former.  So it's slightly tempting to drop startup_die() altogether in
favor of using SignalHandlerForCrashExit for the SIGTERM-during-auth case
too.  However, that'd have the effect of causing a "fast shutdown" to get
converted to a panic if it happens while any sessions are in
authentication phase, so I think this cure is likely worse than the
disease.

Worth reading in connection with this is the thread that led up to
commits 8e19a8264 et al [3].  We had a long discussion about how
quickdie() and startup_die() might be made safe(r), without any
real consensus emerging about any alternatives being better than the
status quo.  I still don't have an improvement idea for quickdie();
I don't want to give up trying to send a message to the client.
Note, however, that quickdie() does end with _exit(2) so that at
least it's not trying to execute arbitrary process-cleanup code.

In short then, I want to ensure that postmaster children's SIGQUIT
handlers always end with _exit(2) rather than some other exit method.
We have two exceptions now and they need to get fixed.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CALDaNm1d1hHPZUg3xU4XjtWBOLCrA%2B-2cJcLpw-cePZ%3DGgDVfA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20200908.175557.617150409868541587.horikyota.ntt%40gmail.com
[3] https://www.postgresql.org/message-id/flat/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: VACUUM (INTERRUPTIBLE)?
Next
From: Andres Freund
Date:
Subject: Re: More aggressive vacuuming of temporary tables