Thread: SIGQUIT handling, redux
This is to pull together a couple of recent threads that are seemingly unrelated to $SUBJECT.

Over in the long thread at [1] we've agreed to try to make the backend code actually do what assorted comments claim it does, namely allow SIGQUIT to be accepted at any point after a given process has established its signal handlers.  (Right now, it mostly is not accepted during error recovery, but there's no clear reason to preserve that exception.)

Of course, this is only safe if the SIGQUIT handler is safe to be invoked anywhere, so I did a quick survey of backend signal handlers to see if that is true.  I immediately found pgarch_exit() which surely is not.  It turns out that Horiguchi-san already noticed that and proposed to fix it, within the seemingly entirely unrelated patch series for a shared-memory based stats collector (see patch 0001 in [2]).  I think we should go ahead and commit that, and maybe even back-patch it.  There seems no good reason for the archiver to treat SIGQUIT as nonfatal when other postmaster children do not.

Another thing I found is startup_die() in postmaster.c, which catches SIGQUIT during the authentication phase of a new backend.  This calls proc_exit(1), which (a) is unsafe on its face, and (b) potentially demotes a SIGQUIT into something that will not cause the parent postmaster to perform a DB-wide restart.  There seems no good reason for that behavior, either.  I propose that we use SignalHandlerForCrashExit for SIGQUIT here too.

In passing, it's worth noting that startup_die() isn't really much safer for SIGTERM than it is for SIGQUIT; the only argument for distinguishing those is that code that applies BlockSig will at least manage to block the former.  So it's slightly tempting to drop startup_die() altogether in favor of using SignalHandlerForCrashExit for the SIGTERM-during-auth case too.  However, that'd have the effect of causing a "fast shutdown" to get converted to a panic if it happens while any sessions are in authentication phase, so I think this cure is likely worse than the disease.

Worth reading in connection with this is the thread that led up to commits 8e19a8264 et al [3].  We had a long discussion about how quickdie() and startup_die() might be made safe(r), without any real consensus emerging about any alternatives being better than the status quo.  I still don't have an improvement idea for quickdie(); I don't want to give up trying to send a message to the client.  Note, however, that quickdie() does end with _exit(2) so that at least it's not trying to execute arbitrary process-cleanup code.

In short then, I want to ensure that postmaster children's SIGQUIT handlers always end with _exit(2) rather than some other exit method.  We have two exceptions now and they need to get fixed.

			regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CALDaNm1d1hHPZUg3xU4XjtWBOLCrA%2B-2cJcLpw-cePZ%3DGgDVfA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20200908.175557.617150409868541587.horikyota.ntt%40gmail.com
[3] https://www.postgresql.org/message-id/flat/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> This is to pull together a couple of recent threads that are seemingly
> unrelated to $SUBJECT.
>
> Over in the long thread at [1] we've agreed to try to make the backend
> code actually do what assorted comments claim it does, namely allow
> SIGQUIT to be accepted at any point after a given process has established
> its signal handlers.  (Right now, it mostly is not accepted during error
> recovery, but there's no clear reason to preserve that exception.)
>
> Of course, this is only safe if the SIGQUIT handler is safe to be invoked
> anywhere, so I did a quick survey of backend signal handlers to see if
> that is true.  I immediately found pgarch_exit() which surely is not.  It
> turns out that Horiguchi-san already noticed that and proposed to fix it,
> within the seemingly entirely unrelated patch series for a shared-memory
> based stats collector (see patch 0001 in [2]).  I think we should go ahead
> and commit that, and maybe even back-patch it.  There seems no good reason
> for the archiver to treat SIGQUIT as nonfatal when other postmaster
> children do not.

As I mentioned over there, I agree that we should do this and we should further have the statistics collector also do so, which currently sets up SIGQUIT with ShutdownRequestPending() and in its loop decides it's fine to write out the stats file (which we're going to remove during recovery anyway...) and then call exit(0).

I also think we should back-patch these changes, as the commit mentioned in Horiguchi-san's patch for pgarch_exit() was.

> Another thing I found is startup_die() in postmaster.c, which catches
> SIGQUIT during the authentication phase of a new backend.  This calls
> proc_exit(1), which (a) is unsafe on its face, and (b) potentially
> demotes a SIGQUIT into something that will not cause the parent postmaster
> to perform a DB-wide restart.  There seems no good reason for that
> behavior, either.  I propose that we use SignalHandlerForCrashExit
> for SIGQUIT here too.

Yikes, agreed.

> In passing, it's worth noting that startup_die() isn't really much safer
> for SIGTERM than it is for SIGQUIT; the only argument for distinguishing
> those is that code that applies BlockSig will at least manage to block the
> former.  So it's slightly tempting to drop startup_die() altogether in
> favor of using SignalHandlerForCrashExit for the SIGTERM-during-auth case
> too.  However, that'd have the effect of causing a "fast shutdown" to get
> converted to a panic if it happens while any sessions are in
> authentication phase, so I think this cure is likely worse than the
> disease.

Agreed, that's definitely no good.

> Worth reading in connection with this is the thread that led up to
> commits 8e19a8264 et al [3].  We had a long discussion about how
> quickdie() and startup_die() might be made safe(r), without any
> real consensus emerging about any alternatives being better than the
> status quo.  I still don't have an improvement idea for quickdie();
> I don't want to give up trying to send a message to the client.
> Note, however, that quickdie() does end with _exit(2) so that at
> least it's not trying to execute arbitrary process-cleanup code.
>
> In short then, I want to ensure that postmaster children's SIGQUIT
> handlers always end with _exit(2) rather than some other exit method.
> We have two exceptions now and they need to get fixed.

I agree we should fix these.

Thanks,

Stephen
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Of course, this is only safe if the SIGQUIT handler is safe to be invoked
>> anywhere, so I did a quick survey of backend signal handlers to see if
>> that is true.  I immediately found pgarch_exit() which surely is not.  It
>> turns out that Horiguchi-san already noticed that and proposed to fix it,
>> within the seemingly entirely unrelated patch series for a shared-memory
>> based stats collector (see patch 0001 in [2]).  I think we should go ahead
>> and commit that, and maybe even back-patch it.  There seems no good reason
>> for the archiver to treat SIGQUIT as nonfatal when other postmaster
>> children do not.

> As I mentioned over there, I agree that we should do this and we should
> further have the statistics collector also do so, which currently sets
> up SIGQUIT with ShutdownRequestPending() and in its loop decides it's
> fine to write out the stats file (which we're going to remove during
> recovery anyway...) and then call exit(0).

I noticed that that was different from everything else, but it's not actually signal-unsafe, so it seems like a different topic from what I'm on about at the moment.  I don't mind if you or somebody else wants to change it, but I don't see it as a back-patchable bug fix.

> I also think we should
> back-patch these changes, as the commit mentioned in Horiguchi-san's
> patch for pgarch_exit() was.

Agreed; I'll go make it happen.

			regards, tom lane
I wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> As I mentioned over there, I agree that we should do this and we should
>> further have the statistics collector also do so, which currently sets
>> up SIGQUIT with ShutdownRequestPending() and in its loop decides it's
>> fine to write out the stats file (which we're going to remove during
>> recovery anyway...) and then call exit(0).

> I noticed that that was different from everything else, but it's not
> actually signal-unsafe, so it seems like a different topic from what
> I'm on about at the moment.  I don't mind if you or somebody else
> wants to change it, but I don't see it as a back-patchable bug fix.

Note also that the postmaster actually uses SIGQUIT to command normal shutdown of the stats collector (cf reaper(), around line 3125 in HEAD).  So this needs a change in signaling conventions, not just internal tweaks in the collector.  Not a big deal, but it reinforces my feeling that it should be a HEAD-only change.

			regards, tom lane
Hi,

On 2020-09-08 17:39:24 -0400, Tom Lane wrote:
> Of course, this is only safe if the SIGQUIT handler is safe to be invoked
> anywhere, so I did a quick survey of backend signal handlers to see if
> that is true.  I immediately found pgarch_exit() which surely is not.  It
> turns out that Horiguchi-san already noticed that and proposed to fix it,
> within the seemingly entirely unrelated patch series for a shared-memory
> based stats collector (see patch 0001 in [2]).  I think we should go ahead
> and commit that, and maybe even back-patch it.  There seems no good reason
> for the archiver to treat SIGQUIT as nonfatal when other postmaster
> children do not.

+1

> Another thing I found is startup_die() in postmaster.c, which catches
> SIGQUIT during the authentication phase of a new backend.  This calls
> proc_exit(1), which (a) is unsafe on its face, and (b) potentially
> demotes a SIGQUIT into something that will not cause the parent postmaster
> to perform a DB-wide restart.  There seems no good reason for that
> behavior, either.  I propose that we use SignalHandlerForCrashExit
> for SIGQUIT here too.

Yikes, yea, that seems like an important change.

I wish startup_die() weren't named startup_ - every single time I see the name I think it's about the startup process...

I think StartupPacketTimeoutHandler is another case?

> In passing, it's worth noting that startup_die() isn't really much safer
> for SIGTERM than it is for SIGQUIT; the only argument for distinguishing
> those is that code that applies BlockSig will at least manage to block the
> former.

Which is pretty unconvincing...

I wonder if we could at least *try* to rely on CFR() for a while.  It'd not be pretty, but probably doable, to set up a timer inside the signal handler and try to exit using normal mechanisms for a while (some overlap with authentication timeout).

That'd leave the question of what we do once that timer expires.  There's quite a few possible ways that could reproducibly happen, e.g. if DNS lookups are required during auth.

The long term correct way to handle this would obviously be to restructure everything that happens covered by startup_die() in a non-blocking manner and just rely on CFR().  But that's a tall order to get done anytime soon, particularly since things like DNS are IIRC pretty hard without relying on custom libraries.  Shorter term I don't really see an alternative to escalating to SIGQUIT, which is obviously terrible.

> So it's slightly tempting to drop startup_die() altogether in favor of
> using SignalHandlerForCrashExit for the SIGTERM-during-auth case too.
> However, that'd have the effect of causing a "fast shutdown" to get
> converted to a panic if it happens while any sessions are in
> authentication phase, so I think this cure is likely worse than the
> disease.

Indeed.

> I don't want to give up trying to send a message to the client.

That still doesn't make much sense to me.  The potential for hanging (e.g. inside malloc) is so much worse than not sending a message...  And we already infer things about the server dying in libpq when the connection just closes (which I am admittedly also not happy about).  But I also think we can reduce the risk a bit, see below.

I only had one coffee so far (and it looks like the sun has died outside), so maybe I'm just slow: But, uh, we don't currently send a message in startup_die(), right?  So that part is about quickdie()?

> Worth reading in connection with this is the thread that led up to
> commits 8e19a8264 et al [3].  We had a long discussion about how
> quickdie() and startup_die() might be made safe(r), without any
> real consensus emerging about any alternatives being better than the
> status quo.  I still don't have an improvement idea for quickdie();

I still think that we should at least mitigate the risk to hang inside quickdie() by at least disabling translations (since the rest happens inside the pre-allocated error memory context, which should lower the risk).  And perhaps do similarly for the memory required for encryption, if enabled, and where possible.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> I wish startup_die() weren't named startup_ - every single time I see
> the name I think it's about the startup process...

We could call it startup_packet_die or something?

> I think StartupPacketTimeoutHandler is another case?

Yeah.  Although it's a lot less risky, since if the timeout is reached we're almost certainly waiting for client input.

>> In passing, it's worth noting that startup_die() isn't really much safer
>> for SIGTERM than it is for SIGQUIT; the only argument for distinguishing
>> those is that code that applies BlockSig will at least manage to block the
>> former.

> Which is pretty unconvincing...

Agreed, it'd be nice if this were less shaky.  On the other hand, we've seen darn few complaints traceable to this AFAIR.  I'm not really sure it's worth putting a lot of effort into.

> The long term correct way to handle this would obviously be to
> restructure everything that happens covered by startup_die() in a
> non-blocking manner and just rely on CFR().  But that's a tall order to
> get done anytime soon, particularly things like DNS are IIRC pretty hard
> without relying on custom libraries.

Not only DNS, but all the various auth libraries would have to be contended with.  Lots of work there compared to the likely rewards.

>> I don't want to give up trying to send a message to the client.

> That still doesn't make much sense to me.  The potential for hanging
> (e.g. inside malloc) is so much worse than not sending a message...

We see backends going through this code on a very regular basis in the buildfarm, but complete hangs are rare as can be.  I think you overestimate the severity of the problem.

> I only had one coffee so far (and it looks like the sun has died
> outside), so maybe I'm just slow: But, uh, we don't currently send a
> message in startup_die(), right?
> So that part is about quickdie()?

Right.  Note that startup_die() is pre-authentication, so I'm doubtful that we should tell the would-be client anything about the state of the server at that point, even ignoring these risk factors.  (I'm a bit inclined to remove the comment suggesting that'd be desirable.)

			regards, tom lane
Hi,

On 2020-09-09 16:09:00 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I wish startup_die() weren't named startup_ - every single time I see
> > the name I think it's about the startup process...
>
> We could call it startup_packet_die or something?

Yea, I think that'd be good.

> > I think StartupPacketTimeoutHandler is another case?
>
> Yeah.  Although it's a lot less risky, since if the timeout is reached
> we're almost certainly waiting for client input.

An adversary could control that to a significant degree - but ordinarily I agree...

> >> In passing, it's worth noting that startup_die() isn't really much safer
> >> for SIGTERM than it is for SIGQUIT; the only argument for distinguishing
> >> those is that code that applies BlockSig will at least manage to block the
> >> former.
>
> > Which is pretty unconvincing...
>
> Agreed, it'd be nice if this were less shaky.  On the other hand,
> we've seen darn few complaints traceable to this AFAIR.  I'm not
> really sure it's worth putting a lot of effort into.

Not sure how many would plausibly reach us though.  A common reaction will likely just be to force-restart the server, not to fully investigate the issue.  Particularly because it'll often be once something already has gone wrong...

> >> I don't want to give up trying to send a message to the client.
>
> > That still doesn't make much sense to me.  The potential for hanging
> > (e.g. inside malloc) is so much worse than not sending a message...
>
> We see backends going through this code on a very regular basis in the
> buildfarm, but complete hangs are rare as can be.  I think you
> overestimate the severity of the problem.

I don't think the BF exercises the problematic paths to a significant degree.  It's mostly local socket connections, and where not it's localhost.  There's no slow DNS, no more complicated authentication methods, no packet loss.  How often do we ever actually end up even getting close to any of the paths but immediate shutdowns?  And in the SIGQUIT path, how often do we end up in the SIGKILL path, masking potential deadlocks?

> > I only had one coffee so far (and it looks like the sun has died
> > outside), so maybe I'm just slow: But, uh, we don't currently send a
> > message in startup_die(), right?
> > So that part is about quickdie()?
>
> Right.  Note that startup_die() is pre-authentication, so I'm doubtful
> that we should tell the would-be client anything about the state of
> the server at that point, even ignoring these risk factors.  (I'm a
> bit inclined to remove the comment suggesting that'd be desirable.)

Yea, I think just putting in an editorialized version of your paragraph would make sense.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2020-09-09 16:09:00 -0400, Tom Lane wrote:
>> We could call it startup_packet_die or something?

> Yea, I think that'd be good.

I'll make it so.

>> We see backends going through this code on a very regular basis in the
>> buildfarm, but complete hangs are rare as can be.  I think you
>> overestimate the severity of the problem.

> I don't think the BF exercises the problematic paths to a significant
> degree.  It's mostly local socket connections, and where not it's
> localhost.  There's no slow DNS, no more complicated authentication
> methods, no packet loss.  How often do we ever actually end up even
> getting close to any of the paths but immediate shutdowns?

Since we're talking about quickdie(), immediate shutdown/crash restart is exactly the case of concern, and the buildfarm exercises it all the time.

> And in the
> SIGQUIT path, how often do we end up in the SIGKILL path, masking
> potential deadlocks?

True, we can't really tell that.  I wonder if we should make the postmaster emit a log message when it times out and goes to SIGKILL.  After a few months we could scrape the buildfarm logs and get a pretty good handle on it.

			regards, tom lane
I wrote:
> Not only DNS, but all the various auth libraries would have to be
> contended with.  Lots of work there compared to the likely rewards.

Wait a minute.  The entire authentication cycle happens inside InitPostgres, using the backend's normal signal handlers.  So maybe we are overthinking the problem.  What if we simply postpone ProcessStartupPacket into that same place, and run it under the same rules as we do for authentication?

We would waste more cycles than we do now for the case where the client closes the connection without sending a startup packet, but not enormously so, I think --- and optimizing that case doesn't seem like a high-priority goal anyway.  And cases like DNS lookup taking forever don't seem like any more of an issue than auth lookup taking forever.

			regards, tom lane
Hi,

On 2020-09-09 16:30:37 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2020-09-09 16:09:00 -0400, Tom Lane wrote:
> >> We could call it startup_packet_die or something?
>
> > Yea, I think that'd be good.
>
> I'll make it so.

Thanks!

> >> We see backends going through this code on a very regular basis in the
> >> buildfarm, but complete hangs are rare as can be.  I think you
> >> overestimate the severity of the problem.
>
> > I don't think the BF exercises the problematic paths to a significant
> > degree.  It's mostly local socket connections, and where not it's
> > localhost.  There's no slow DNS, no more complicated authentication
> > methods, no packet loss.  How often do we ever actually end up even
> > getting close to any of the paths but immediate shutdowns?
>
> Since we're talking about quickdie(), immediate shutdown/crash restart
> is exactly the case of concern, and the buildfarm exercises it all the
> time.

Yea, but only in simple cases.  Largely no SSL / kerberos.  Largely untranslated.  Mostly the immediate shutdowns aren't when inside plpython or such.

> > And in the
> > SIGQUIT path, how often do we end up in the SIGKILL path, masking
> > potential deadlocks?
>
> True, we can't really tell that.  I wonder if we should make the
> postmaster emit a log message when it times out and goes to SIGKILL.
> After a few months we could scrape the buildfarm logs and get a
> pretty good handle on it.

I think that'd be a good idea.

Greetings,

Andres Freund
I wrote:
>> Not only DNS, but all the various auth libraries would have to be
>> contended with.  Lots of work there compared to the likely rewards.

> Wait a minute.  The entire authentication cycle happens inside
> InitPostgres, using the backend's normal signal handlers.  So
> maybe we are overthinking the problem.  What if we simply postpone
> ProcessStartupPacket into that same place, and run it under the same
> rules as we do for authentication?

Or, continuing to look for other answers ...

During BackendInitialize we have not yet touched shared memory.  This is not incidental but must be so, per its header comment.  Therefore it seems like we could have these signal handlers (for SIGTERM or timeout) do _exit(1), thereby resolving the signal unsafety while not provoking a database restart.  We don't care whether inside-the-process state is sane, and we shouldn't have changed anything yet in shared memory.

This is obviously not 100.00% safe, since maybe something could violate these assumptions, but it seems a lot safer than using proc_exit() from a signal handler.  One way to help catch any such assumption-violations is to add an assertion that no on_shmem_exit handlers have been set up yet.  If there are none, there should be no expectation of having to clean up shared state.

			regards, tom lane
Here's a draft patch that I think would be reasonable to back-patch.  (Before v13, we'd need a bespoke SIGQUIT handler to substitute for SignalHandlerForCrashExit, but that's easy enough.)  Aside from comment updates, this

* uses SignalHandlerForCrashExit for SIGQUIT

* renames startup_die per your request

* moves BackendInitialize's interrupt-re-disabling code up a bit to reduce the scope where these interrupts are active.  This doesn't make things a whole lot safer, but it can't hurt.

I'll take a closer look at the idea of using _exit(1) tomorrow, but I'd be pretty hesitant to back-patch that.

			regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 42223c0f61..e65086f7b4 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -112,6 +112,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/fork_process.h"
+#include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
@@ -405,7 +406,7 @@ static void SIGHUP_handler(SIGNAL_ARGS);
 static void pmdie(SIGNAL_ARGS);
 static void reaper(SIGNAL_ARGS);
 static void sigusr1_handler(SIGNAL_ARGS);
-static void startup_die(SIGNAL_ARGS);
+static void startup_packet_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
 static void CleanupBackend(int pid, int exitstatus);
@@ -4340,22 +4341,30 @@ BackendInitialize(Port *port)
 	whereToSendOutput = DestRemote; /* now safe to ereport to client */
 
 	/*
-	 * We arrange for a simple exit(1) if we receive SIGTERM or SIGQUIT or
-	 * timeout while trying to collect the startup packet.  Otherwise the
-	 * postmaster cannot shutdown the database FAST or IMMED cleanly if a
-	 * buggy client fails to send the packet promptly.  XXX it follows that
-	 * the remainder of this function must tolerate losing control at any
-	 * instant.  Likewise, any pg_on_exit_callback registered before or during
-	 * this function must be prepared to execute at any instant between here
-	 * and the end of this function.  Furthermore, affected callbacks execute
-	 * partially or not at all when a second exit-inducing signal arrives
-	 * after proc_exit_prepare() decrements on_proc_exit_index.  (Thanks to
-	 * that mechanic, callbacks need not anticipate more than one call.)  This
-	 * is fragile; it ought to instead follow the norm of handling interrupts
-	 * at selected, safe opportunities.
-	 */
-	pqsignal(SIGTERM, startup_die);
-	pqsignal(SIGQUIT, startup_die);
+	 * We arrange to do proc_exit(1) if we receive SIGTERM or timeout while
+	 * trying to collect the startup packet; while SIGQUIT results in
+	 * _exit(2).  Otherwise the postmaster cannot shutdown the database FAST
+	 * or IMMED cleanly if a buggy client fails to send the packet promptly.
+	 *
+	 * XXX this is pretty dangerous; signal handlers should not call anything
+	 * as complex as proc_exit() directly.  We minimize the hazard by not
+	 * keeping these handlers active for longer than we must.  However, it
+	 * seems necessary to be able to escape out of DNS lookups as well as the
+	 * startup packet reception proper, so we can't narrow the scope further
+	 * than is done here.
+	 *
+	 * XXX it follows that the remainder of this function must tolerate losing
+	 * control at any instant.  Likewise, any pg_on_exit_callback registered
+	 * before or during this function must be prepared to execute at any
+	 * instant between here and the end of this function.  Furthermore,
+	 * affected callbacks execute partially or not at all when a second
+	 * exit-inducing signal arrives after proc_exit_prepare() decrements
+	 * on_proc_exit_index.  (Thanks to that mechanic, callbacks need not
+	 * anticipate more than one call.)  This is fragile; it ought to instead
+	 * follow the norm of handling interrupts at selected, safe opportunities.
+	 */
+	pqsignal(SIGTERM, startup_packet_die);
+	pqsignal(SIGQUIT, SignalHandlerForCrashExit);
 	InitializeTimeouts();		/* establishes SIGALRM handler */
 
 	PG_SETMASK(&StartupBlockSig);
@@ -4411,8 +4420,8 @@ BackendInitialize(Port *port)
 		port->remote_hostname = strdup(remote_host);
 
 	/*
-	 * Ready to begin client interaction.  We will give up and exit(1) after a
-	 * time delay, so that a broken client can't hog a connection
+	 * Ready to begin client interaction.  We will give up and proc_exit(1)
+	 * after a time delay, so that a broken client can't hog a connection
 	 * indefinitely.  PreAuthDelay and any DNS interactions above don't count
 	 * against the time limit.
 	 *
@@ -4434,6 +4443,12 @@ BackendInitialize(Port *port)
 	 */
 	status = ProcessStartupPacket(port, false, false);
 
+	/*
+	 * Disable the timeout, and prevent SIGTERM again.
+	 */
+	disable_timeout(STARTUP_PACKET_TIMEOUT, false);
+	PG_SETMASK(&BlockSig);
+
 	/*
 	 * Stop here if it was bad or a cancel packet.  ProcessStartupPacket
 	 * already did any appropriate error reporting.
@@ -4459,12 +4474,6 @@ BackendInitialize(Port *port)
 		pfree(ps_data.data);
 
 	set_ps_display("initializing");
-
-	/*
-	 * Disable the timeout, and prevent SIGTERM/SIGQUIT again.
-	 */
-	disable_timeout(STARTUP_PACKET_TIMEOUT, false);
-	PG_SETMASK(&BlockSig);
 }
 
 
@@ -5359,16 +5368,22 @@ sigusr1_handler(SIGNAL_ARGS)
 }
 
 /*
- * SIGTERM or SIGQUIT while processing startup packet.
+ * SIGTERM while processing startup packet.
  * Clean up and exit(1).
  *
- * XXX: possible future improvement: try to send a message indicating
- * why we are disconnecting.  Problem is to be sure we don't block while
- * doing so, nor mess up SSL initialization.  In practice, if the client
- * has wedged here, it probably couldn't do anything with the message anyway.
+ * Running proc_exit() from a signal handler is pretty unsafe, since we
+ * can't know what code we've interrupted.  But the alternative of using
+ * _exit(2) is also unpalatable, since it'd mean that a "fast shutdown"
+ * would cause a database crash cycle (forcing WAL replay at restart)
+ * if any sessions are in authentication.  So we live with it for now.
+ *
+ * One might be tempted to try to send a message indicating why we are
+ * disconnecting.  However, that would make this even more unsafe.  Also,
+ * it seems undesirable to provide clues about the database's state to
+ * a client that has not yet completed authentication.
 */
 static void
-startup_die(SIGNAL_ARGS)
+startup_packet_die(SIGNAL_ARGS)
 {
 	proc_exit(1);
 }
@@ -5389,7 +5404,11 @@ dummy_handler(SIGNAL_ARGS)
 
 /*
  * Timeout while processing startup packet.
- * As for startup_die(), we clean up and exit(1).
+ * As for startup_packet_die(), we clean up and exit(1).
+ *
+ * This is theoretically just as hazardous as it is in startup_packet_die(),
+ * although in practice we're almost certainly waiting for client input,
+ * which greatly reduces the risk.
 */
 static void
 StartupPacketTimeoutHandler(void)
I wrote:
> Of course, this is only safe if the SIGQUIT handler is safe to be invoked
> anywhere, so I did a quick survey of backend signal handlers to see if
> that is true.

This is straying a bit from the stated topic of this thread, but ... I did some further looking around to see whether there were any unsafe signal handlers besides SIGQUIT ones.  The situation is not too awful, but I did find several issues not already mentioned in this thread:

StartupProcShutdownHandler (SIGTERM)

This conditionally calls proc_exit(1).  The conditions boil down to are-we-interrupting-a-system(3)-call, so who knows how safe that is?  I wouldn't care to bet that system() doesn't use malloc, for instance.  Still, the odds are very good that if a signal did arrive, it'd be interrupting system()'s waitpid() or equivalent kernel call, which is likely safe enough.

bgworker_die (SIGTERM)

Calls ereport(FATAL).  This is surely not any safer than, say, quickdie().  No, it's worse, because at least that won't try to go out via proc_exit().

FloatExceptionHandler (SIGFPE)

Calls ereport(ERROR).  This might be okay, though, since the trap should be synchronous with the offending calculation.  Besides, if you're risking divide by zero or the like in critical code, You're Doing It Wrong.

RecoveryConflictInterrupt (called from SIGUSR1)

Calls a whole boatload of state tests that were never designed to be interrupt-safe, such as transaction-state-related inquiries in xact.c.  The lack of any explicit awareness in this code that it's in a signal handler doesn't discourage people from inserting even more dubious stuff.  I think this needs to be burned to the ground and rewritten.

StandbyDeadLockHandler (from SIGALRM)
StandbyTimeoutHandler (ditto)

Calls CancelDBBackends, which just for starters tries to acquire an LWLock.  I think the only reason we've gotten away with this for this long is the high probability that by the time either timeout fires, we're going to be blocked on a semaphore.

I don't have any ideas about how to fix any of these things, but I thought getting the problems on the record would be good.

			regards, tom lane
From: Tom Lane <tgl@sss.pgh.pa.us>
> This is straying a bit from the stated topic of this thread, but ...
> I did some further looking around to see whether there were any
> unsafe signal handlers besides SIGQUIT ones.  The situation is not
> too awful, but I did find several issues not already mentioned
> in this thread:

Wow, your eyes catch this many issues.  (I was just wondering about one or two of them.)

I'm not sure if this is related, but I had been wondering if the following can be what it is now.

(1) When logical replication is used, pg_ctl stop with the default fast mode emits the message about termination of the logical replication launcher.  Although it's not FATAL or ERROR, I was a bit startled when I saw this message for the first time.  Why should this message be emitted while nothing about other postmaster children is reported?  The logical rep launcher ignores SIGINT (SIG_IGN).

LOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  background worker "logical replication launcher" (PID 10056) exited with exit code 1
LOG:  shutting down
LOG:  database system is shut down

(2) When the physical standby stops, a FATAL message is output.  It may be annoying to the DBA that monitors messages with high severity.

LOG:  received fast shutdown request
LOG:  aborting any active transactions
FATAL:  terminating walreceiver process due to administrator command
LOG:  shutting down
LOG:  database system is shut down

(3) When WaitLatch(EXIT_ON_POSTMASTER_DEATH) detects postmaster death, it calls proc_exit(1).  Can we call _exit(1) here likewise?


Regards
Takayuki Tsunakawa
At Wed, 09 Sep 2020 10:46:28 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
>> I also think we should
>> back-patch these changes, as the commit mentioned in Horiguchi-san's
>> patch for pgarch_exit() was.
>
> Agreed; I'll go make it happen.

Thank you for committing this!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Sep 9, 2020 at 10:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> bgworker_die (SIGTERM)
>
> Calls ereport(FATAL).  This is surely not any safer than, say,
> quickdie().  No, it's worse, because at least that won't try
> to go out via proc_exit().

I think bgworker_die() is pretty much a terrible idea.  Every
background worker I've written has actually needed to use
CHECK_FOR_INTERRUPTS().  I think that the only way this could actually
be safe is if you have a background worker that never uses ereport()
itself, so that the ereport() in the signal handler can't be
interrupting one that's already happening.  This seems unlikely to be
the normal case, or anything close to it.  Most background workers
probably are shared-memory connected and use a lot of PostgreSQL
infrastructure and thus ereport() all over the place.

Now what to do about it I don't know exactly, but it would be nice to
do something.

> StandbyDeadLockHandler (from SIGALRM)
> StandbyTimeoutHandler (ditto)
>
> Calls CancelDBBackends, which just for starters tries to acquire
> an LWLock.  I think the only reason we've gotten away with this
> for this long is the high probability that by the time either
> timeout fires, we're going to be blocked on a semaphore.

Yeah, I'm not sure these are so bad.  In fact, in the deadlock case, I
believe the old coding was designed to make sure we *had to* be
blocked on a semaphore, but I'm not sure whether that's still true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Sep 9, 2020 at 10:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> bgworker_die (SIGTERM)
>> Calls ereport(FATAL).  This is surely not any safer than, say,
>> quickdie().  No, it's worse, because at least that won't try
>> to go out via proc_exit().

> I think bgworker_die() is pretty much a terrible idea.

Yeah.  It'd likely be better to insist that bgworkers handle SIGTERM
the same way backends do, via CHECK_FOR_INTERRUPTS.  Not sure how big
a change that would be.

> I think that the only way this could actually
> be safe is if you have a background worker that never uses ereport()
> itself, so that the ereport() in the signal handler can't be
> interrupting one that's already happening.

That doesn't begin to make it safe, because quite aside from anything
that happens in elog.c, we're then going to go out via proc_exit()
which will invoke who knows what.

>> StandbyDeadLockHandler (from SIGALRM)
>> StandbyTimeoutHandler (ditto)
>> Calls CancelDBBackends, which just for starters tries to acquire
>> an LWLock.  I think the only reason we've gotten away with this
>> for this long is the high probability that by the time either
>> timeout fires, we're going to be blocked on a semaphore.

> Yeah, I'm not sure these are so bad.  In fact, in the deadlock case, I
> believe the old coding was designed to make sure we *had to* be
> blocked on a semaphore, but I'm not sure whether that's still true.

Looking at this closer, the only code that could get interrupted by
these timeouts is a call to ProcWaitForSignal, which is

void
ProcWaitForSignal(uint32 wait_event_info)
{
	(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
					 wait_event_info);
	ResetLatch(MyLatch);
	CHECK_FOR_INTERRUPTS();
}

Interrupting the latch operations might be safe enough, although now
I'm casting a side-eye at Munro's recent changes to share WaitEvent
state all over the place.  I wonder whether blocking on an LWLock
inside the signal handler will break an interrupted WaitLatch.

Also, man that CHECK_FOR_INTERRUPTS() looks like trouble.
Could we take that out?

			regards, tom lane
On Thu, Sep 10, 2020 at 12:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Also, man that CHECK_FOR_INTERRUPTS() looks like trouble.
> Could we take that out?

Maybe I'm missing something, but why wouldn't that be a horrible idea?
We do not want to have long waits where we refuse to respond to
interrupts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Sep 10, 2020 at 12:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, man that CHECK_FOR_INTERRUPTS() looks like trouble.
>> Could we take that out?

> Maybe I'm missing something, but why wouldn't that be a horrible idea?
> We do not want to have long waits where we refuse to respond to
> interrupts.

It might be appropriate for some of the callers to do it.  But I don't
see any great argument why ProcWaitForSignal itself has to do it.

			regards, tom lane
I wrote:
> I'll take a closer look at the idea of using _exit(1) tomorrow,
> but I'd be pretty hesitant to back-patch that.

Here's a patch for that; it passes light testing, including forcing
the backend through the SIGTERM and timeout code paths.  There's not
much to it except for changing the signal handlers, but I did add a
cross-check that no on-exit handlers have been registered before we
get done with using these signal handlers.

I looked briefly at the idea of postponing ProcessStartupPacket
until InitPostgres has set up a fairly normal environment.  It
seems like it'd take a fair amount of refactoring.  I really
doubt it's worth the effort, even though the result would be
arguably cleaner logic than what we have now.

			regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 15ad675bc1..081022a206 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4298,6 +4298,8 @@ report_fork_failure_to_client(Port *port, int errnum)
  * returns: nothing.  Will not return at all if there's any failure.
  *
  * Note: this code does not depend on having any access to shared memory.
+ * Indeed, our approach to SIGTERM/timeout handling *requires* that
+ * shared memory not have been touched yet; see comments within.
  * In the EXEC_BACKEND case, we are physically attached to shared memory
  * but have not yet set up most of our local pointers to shmem structures.
  */
@@ -4341,27 +4343,14 @@ BackendInitialize(Port *port)
 	whereToSendOutput = DestRemote; /* now safe to ereport to client */
 
 	/*
-	 * We arrange to do proc_exit(1) if we receive SIGTERM or timeout while
-	 * trying to collect the startup packet; while SIGQUIT results in
-	 * _exit(2).  Otherwise the postmaster cannot shutdown the database FAST
-	 * or IMMED cleanly if a buggy client fails to send the packet promptly.
+	 * We arrange to do _exit(1) if we receive SIGTERM or timeout while trying
+	 * to collect the startup packet; while SIGQUIT results in _exit(2).
+	 * Otherwise the postmaster cannot shutdown the database FAST or IMMED
+	 * cleanly if a buggy client fails to send the packet promptly.
 	 *
-	 * XXX this is pretty dangerous; signal handlers should not call anything
-	 * as complex as proc_exit() directly.  We minimize the hazard by not
-	 * keeping these handlers active for longer than we must.  However, it
-	 * seems necessary to be able to escape out of DNS lookups as well as the
-	 * startup packet reception proper, so we can't narrow the scope further
-	 * than is done here.
-	 *
-	 * XXX it follows that the remainder of this function must tolerate losing
-	 * control at any instant.  Likewise, any pg_on_exit_callback registered
-	 * before or during this function must be prepared to execute at any
-	 * instant between here and the end of this function.  Furthermore,
-	 * affected callbacks execute partially or not at all when a second
-	 * exit-inducing signal arrives after proc_exit_prepare() decrements
-	 * on_proc_exit_index.  (Thanks to that mechanic, callbacks need not
-	 * anticipate more than one call.)  This is fragile; it ought to instead
-	 * follow the norm of handling interrupts at selected, safe opportunities.
+	 * Exiting with _exit(1) is only possible because we have not yet touched
+	 * shared memory; therefore no outside-the-process state needs to get
+	 * cleaned up.
 	 */
 	pqsignal(SIGTERM, process_startup_packet_die);
 	pqsignal(SIGQUIT, SignalHandlerForCrashExit);
@@ -4420,8 +4409,8 @@ BackendInitialize(Port *port)
 		port->remote_hostname = strdup(remote_host);
 
 	/*
-	 * Ready to begin client interaction.  We will give up and proc_exit(1)
-	 * after a time delay, so that a broken client can't hog a connection
+	 * Ready to begin client interaction.  We will give up and _exit(1) after
+	 * a time delay, so that a broken client can't hog a connection
 	 * indefinitely.  PreAuthDelay and any DNS interactions above don't count
 	 * against the time limit.
 	 *
@@ -4449,6 +4438,17 @@ BackendInitialize(Port *port)
 	disable_timeout(STARTUP_PACKET_TIMEOUT, false);
 	PG_SETMASK(&BlockSig);
 
+	/*
+	 * As a safety check that nothing in startup has yet performed
+	 * shared-memory modifications that would need to be undone if we had
+	 * exited through SIGTERM or timeout above, check that no on_shmem_exit
+	 * handlers have been registered yet.  (This isn't terribly bulletproof,
+	 * since someone might misuse an on_proc_exit handler for shmem cleanup,
+	 * but it's a cheap and helpful check.  We cannot disallow on_proc_exit
+	 * handlers unfortunately, since pq_init() already registered one.)
+	 */
+	check_on_shmem_exit_lists_are_empty();
+
 	/*
 	 * Stop here if it was bad or a cancel packet.  ProcessStartupPacket
 	 * already did any appropriate error reporting.
@@ -5369,23 +5369,21 @@ sigusr1_handler(SIGNAL_ARGS)
 
 /*
  * SIGTERM while processing startup packet.
- * Clean up and exit(1).
  *
- * Running proc_exit() from a signal handler is pretty unsafe, since we
- * can't know what code we've interrupted.  But the alternative of using
- * _exit(2) is also unpalatable, since it'd mean that a "fast shutdown"
- * would cause a database crash cycle (forcing WAL replay at restart)
- * if any sessions are in authentication.  So we live with it for now.
+ * Running proc_exit() from a signal handler would be quite unsafe.
+ * However, since we have not yet touched shared memory, we can just
+ * pull the plug and exit without running any atexit handlers.
  *
- * One might be tempted to try to send a message indicating why we are
- * disconnecting.  However, that would make this even more unsafe.  Also,
- * it seems undesirable to provide clues about the database's state to
- * a client that has not yet completed authentication.
+ * One might be tempted to try to send a message, or log one, indicating
+ * why we are disconnecting.  However, that would be quite unsafe in itself.
+ * Also, it seems undesirable to provide clues about the database's state
+ * to a client that has not yet completed authentication, or even sent us
+ * a startup packet.
  */
 static void
 process_startup_packet_die(SIGNAL_ARGS)
 {
-	proc_exit(1);
+	_exit(1);
 }
 
 /*
@@ -5404,16 +5402,12 @@ dummy_handler(SIGNAL_ARGS)
 
 /*
  * Timeout while processing startup packet.
- * As for process_startup_packet_die(), we clean up and exit(1).
- *
- * This is theoretically just as hazardous as in process_startup_packet_die(),
- * although in practice we're almost certainly waiting for client input,
- * which greatly reduces the risk.
+ * As for process_startup_packet_die(), we exit via _exit(1).
  */
 static void
 StartupPacketTimeoutHandler(void)
 {
-	proc_exit(1);
+	_exit(1);
 }
 
diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index 11c3f132a1..36a067c924 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -416,3 +416,20 @@ on_exit_reset(void)
 	on_proc_exit_index = 0;
 	reset_on_dsm_detach();
 }
+
+/* ----------------------------------------------------------------
+ *		check_on_shmem_exit_lists_are_empty
+ *
+ *		Debugging check that no shmem cleanup handlers have been
+ *		registered prematurely in the current process.
+ * ----------------------------------------------------------------
+ */
+void
+check_on_shmem_exit_lists_are_empty(void)
+{
+	if (before_shmem_exit_index)
+		elog(FATAL, "before_shmem_exit has been called prematurely");
+	if (on_shmem_exit_index)
+		elog(FATAL, "on_shmem_exit has been called prematurely");
+	/* Checking DSM detach state seems unnecessary given the above */
+}
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 462fe46341..88994fdc26 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -72,6 +72,7 @@ extern void on_shmem_exit(pg_on_exit_callback function, Datum arg);
 extern void before_shmem_exit(pg_on_exit_callback function, Datum arg);
 extern void cancel_before_shmem_exit(pg_on_exit_callback function, Datum arg);
 extern void on_exit_reset(void);
+extern void check_on_shmem_exit_lists_are_empty(void);
 
 /* ipci.c */
 extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
I wrote:
> I looked briefly at the idea of postponing ProcessStartupPacket
> until InitPostgres has set up a fairly normal environment.  It
> seems like it'd take a fair amount of refactoring.  I really
> doubt it's worth the effort, even though the result would be
> arguably cleaner logic than what we have now.

I felt more ambitious this morning and decided to take a harder look
at this idea.  But I soon realized that there would be a concrete
disadvantage to delaying ProcessStartupPacket: until we have done
that, we don't have the correct value for FrontendProtocol so there
is a problem with reporting startup-time failures to the client.  At
least some such failures, such as "too many clients already", are
pretty routine so we don't want to downgrade their user-friendliness.

If memory serves, libpq has some ability to cope with a v2 error
message even when it's expecting v3.  But I wouldn't bet on that
being true of all client libraries, and anyway it's a very
under-tested code path.

So I think we'd better go with the simple fix I showed before.  It's
simple enough that maybe we could back-patch it, once it's aged
awhile in HEAD.  OTOH, given the lack of field reports of trouble
here, I'm not sure back-patching is worth the risk.

			regards, tom lane
Hi,

On 2020-09-11 11:52:55 -0400, Tom Lane wrote:
> It's simple enough that maybe we could back-patch it, once it's
> aged awhile in HEAD.  OTOH, given the lack of field reports of
> trouble here, I'm not sure back-patching is worth the risk.

FWIW, looking at collected stack traces in azure, there's a slow but
steady stream of crashes below StartupPacketTimeoutHandler.  Most seem
to be things like

libcrypto->malloc->StartupPacketTimeoutHandler->proc_exit->socket_close->free->crash

there's a few other variants, some where the stack apparently was not
decipherable for the relevant tooling.  Note that this wouldn't even
include cases where this caused hangs - which is quite common IME.

Unsurprisingly just in versions before 14, where this change went in.

I think that might be enough evidence for backpatching the commit?
I've not heard of issues due to the checks in
check_on_shmem_exit_lists_are_empty().

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2020-09-11 11:52:55 -0400, Tom Lane wrote:
>> It's simple enough that maybe we could back-patch it, once it's
>> aged awhile in HEAD.  OTOH, given the lack of field reports of
>> trouble here, I'm not sure back-patching is worth the risk.

> FWIW, looking at collected stack traces in azure, there's a slow but steady
> stream of crashes below StartupPacketTimeoutHandler. ...
> Unsurprisingly just in versions before 14, where this change went in.
> I think that might be enough evidence for backpatching the commit?  I've not
> heard of issues due to the checks in check_on_shmem_exit_lists_are_empty().

I'd be willing to take a look at this in a few weeks when $real_life
is a bit less demanding.  Right before minor releases would likely be
a bad idea anyway.

			regards, tom lane
Hi,

On 2023-08-02 12:35:19 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2020-09-11 11:52:55 -0400, Tom Lane wrote:
>>> It's simple enough that maybe we could back-patch it, once it's
>>> aged awhile in HEAD.  OTOH, given the lack of field reports of
>>> trouble here, I'm not sure back-patching is worth the risk.
>
>> FWIW, looking at collected stack traces in azure, there's a slow but steady
>> stream of crashes below StartupPacketTimeoutHandler. ...
>> Unsurprisingly just in versions before 14, where this change went in.
>> I think that might be enough evidence for backpatching the commit?  I've not
>> heard of issues due to the checks in check_on_shmem_exit_lists_are_empty().
>
> I'd be willing to take a look at this in a few weeks when $real_life
> is a bit less demanding.

Cool.

> Right before minor releases would likely be a bad idea anyway.

Agreed.  I had just waded through the stacks, so I thought it'd be
worth bringing up, didn't intend to imply we should backpatch
immediately.

Aside: Analyzing this kind of thing at scale is made considerably more
painful by "expected" ereport(PANIC)s (say out of disk space during
WAL writes) calling abort() and dumping core... While there are other
PANICs you really want cores of.

Greetings,

Andres Freund