SIGQUIT handling, redux - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | SIGQUIT handling, redux |
Date | |
Msg-id | 1850884.1599601164@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: SIGQUIT handling, redux
Re: SIGQUIT handling, redux Re: SIGQUIT handling, redux |
List | pgsql-hackers |
This is to pull together a couple of recent threads that are seemingly unrelated to $SUBJECT. Over in the long thread at [1] we've agreed to try to make the backend code actually do what assorted comments claim it does, namely allow SIGQUIT to be accepted at any point after a given process has established its signal handlers. (Right now, it mostly is not accepted during error recovery, but there's no clear reason to preserve that exception.) Of course, this is only safe if the SIGQUIT handler is safe to be invoked anywhere, so I did a quick survey of backend signal handlers to see if that is true. I immediately found pgarch_exit() which surely is not. It turns out that Horiguchi-san already noticed that and proposed to fix it, within the seemingly entirely unrelated patch series for a shared-memory based stats collector (see patch 0001 in [2]). I think we should go ahead and commit that, and maybe even back-patch it. There seems no good reason for the archiver to treat SIGQUIT as nonfatal when other postmaster children do not. Another thing I found is startup_die() in postmaster.c, which catches SIGQUIT during the authentication phase of a new backend. This calls proc_exit(1), which (a) is unsafe on its face, and (b) potentially demotes a SIGQUIT into something that will not cause the parent postmaster to perform a DB-wide restart. There seems no good reason for that behavior, either. I propose that we use SignalHandlerForCrashExit for SIGQUIT here too. In passing, it's worth noting that startup_die() isn't really much safer for SIGTERM than it is for SIGQUIT; the only argument for distinguishing those is that code that applies BlockSig will at least manage to block the former. So it's slightly tempting to drop startup_die() altogether in favor of using SignalHandlerForCrashExit for the SIGTERM-during-auth case too. However, that'd have the effect of causing a "fast shutdown" to get converted to a panic if it happens while any sessions are in authentication phase, so I think this cure is likely worse than the disease. Worth reading in connection with this is the thread that led up to commits 8e19a8264 et al [3]. We had a long discussion about how quickdie() and startup_die() might be made safe(r), without any real consensus emerging about any alternatives being better than the status quo. I still don't have an improvement idea for quickdie(); I don't want to give up trying to send a message to the client. Note, however, that quickdie() does end with _exit(2) so that at least it's not trying to execute arbitrary process-cleanup code. In short then, I want to ensure that postmaster children's SIGQUIT handlers always end with _exit(2) rather than some other exit method. We have two exceptions now and they need to get fixed. regards, tom lane [1] https://www.postgresql.org/message-id/flat/CALDaNm1d1hHPZUg3xU4XjtWBOLCrA%2B-2cJcLpw-cePZ%3DGgDVfA%40mail.gmail.com [2] https://www.postgresql.org/message-id/20200908.175557.617150409868541587.horikyota.ntt%40gmail.com [3] https://www.postgresql.org/message-id/flat/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com
pgsql-hackers by date: