Re: We shouldn't signal process groups with SIGQUIT - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: We shouldn't signal process groups with SIGQUIT |
Date | |
Msg-id | CA+hUKGJvK0Py8BJar+HVfPUUcERLCJpnYhztpRz6cKhq0svp+w@mail.gmail.com |
In response to | Re: We shouldn't signal process groups with SIGQUIT (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: We shouldn't signal process groups with SIGQUIT
|
List | pgsql-hackers |
On Tue, Feb 28, 2023 at 5:45 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> > Just naively hacking this behaviour change into the current code, would yield
> > sending SIGQUIT to postgres, and then SIGTERM to the whole process
> > group. Which seems like a reasonable order? quickdie() should _exit()
> > immediately in the signal handler, so we shouldn't get to processing the
> > SIGTERM. Even if both signals are "reacted to" at the same time, possibly
> > with SIGTERM being processed first, the SIGQUIT handler should be executed
> > long before the next CFI().
>
> I have been poking a bit at that, and did a change as simple as this
> one in signal_child():
>  #ifdef HAVE_SETSID
> +       if (signal == SIGQUIT)
> +               signal = SIGTERM;
>
> From what I can see, SIGTERM is actually received by the backends
> before SIGQUIT, and I can also see that the backends have enough room
> to process CFIs in some cases, especially short queries, even before
> reaching quickdie() and its exit(). So the window between SIGTERM and
> SIGQUIT is not as long as one would think.

Pop quiz: in what order do signal handlers run, if SIGQUIT and SIGTERM
are both pending when a process wakes up or unblocks?

I *think* the answer on all typical implementations that follow
conventions going back to ancient Unix (but not standardised, so you
can't count on it!*), is that pending signals are delivered in order
of the bits in the pending signals bitmap from lowest to highest, and
SIGQUIT < SIGTERM (again: tradition, not standard), and then:

1. If the handlers block each other via their sa_mask so that they
are serialised (note: ours don't) then you'll see the SIGQUIT handler
run and then the SIGTERM handler, for example if you do kill(self,
SIGTERM), kill(self, SIGQUIT), sigprocmask(SIG_SETMASK, &unblock_all,
NULL).

2. If the handlers don't block each other (our case), then their
stack frames will be set up in that order (you might say they start in
that order but are immediately interrupted by the next one before they
can do anything), so they then run in the reverse order, SIGTERM
first. I guess that is what you saw?

In theory you could straighten this out by asking what else is pending
so that we imposed our own priority, if that were a problem, but there
is something I don't understand: you said we could handle SIGTERM and
then make it all the way to CFI() (= non-signal handler code) before
handling a SIGQUIT that was sent first. Huh... what am I missing? I
thought the only risk was handlers running in the opposite of send
order because they 'overlapped', not non-handler code being allowed to
run in between.

*The standard explicitly says that delivery order is unspecified,
except for realtime signals, which we aren't using.
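
The two cases above can be tried out with a small standalone experiment. The following is only a sketch, not PostgreSQL code: run_experiment() and record_signal() are hypothetical names made up for illustration. It makes SIGTERM and SIGQUIT pending while both are blocked, unblocks them together, and records the order in which the handler bodies execute. Since delivery order is not standardised, the result may differ between systems.

```c
#define _POSIX_C_SOURCE 200809L

#include <sched.h>
#include <signal.h>
#include <string.h>

/* Signal numbers in the order their handler bodies actually executed. */
static volatile sig_atomic_t ran[2];
static volatile sig_atomic_t nran;

/* Hypothetical handler: only records which signal it was called for. */
static void
record_signal(int signo)
{
	if (nran < 2)
		ran[nran++] = signo;
}

/*
 * Hypothetical helper: make SIGTERM and SIGQUIT pending together, unblock
 * both, and report the order in which the handlers ran.
 *
 * serialize != 0: each handler's sa_mask blocks the other signal (case 1).
 * serialize == 0: the handlers may interrupt each other (case 2).
 *
 * Returns the number of handlers that ran, or -1 on error.  The recording
 * handlers stay installed afterwards.
 */
int
run_experiment(int serialize, int order[2])
{
	struct sigaction sa;
	sigset_t	both;
	sigset_t	saved;

	sigemptyset(&both);
	sigaddset(&both, SIGTERM);
	sigaddset(&both, SIGQUIT);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = record_signal;
	if (serialize)
		sa.sa_mask = both;
	else
		sigemptyset(&sa.sa_mask);
	if (sigaction(SIGTERM, &sa, NULL) != 0 ||
		sigaction(SIGQUIT, &sa, NULL) != 0)
		return -1;

	nran = 0;

	/* Block both signals so that they become pending instead of delivered. */
	if (sigprocmask(SIG_BLOCK, &both, &saved) != 0)
		return -1;
	raise(SIGTERM);				/* sent first, as in the example above */
	raise(SIGQUIT);

	/* Unblock both at once; the system now picks the delivery order. */
	if (sigprocmask(SIG_UNBLOCK, &both, NULL) != 0)
		return -1;

	/*
	 * POSIX only promises that at least one pending signal is delivered
	 * before sigprocmask() returns, so wait until both have been handled.
	 */
	while (nran < 2)
		sched_yield();

	sigprocmask(SIG_SETMASK, &saved, NULL);
	order[0] = ran[0];
	order[1] = ran[1];
	return nran;
}
```

Calling run_experiment(1, order) and run_experiment(0, order) from a main() and printing order[] shows what a given system does. If the traditional behaviour described above holds, the serialised case would record SIGQUIT then SIGTERM, and the non-serialised case SIGTERM then SIGQUIT, but nothing in the standard requires either result.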