Thread: We shouldn't signal process groups with SIGQUIT

We shouldn't signal process groups with SIGQUIT

From
Andres Freund
Date:
Hi,

The default reaction to SIGQUIT is to create core dumps. We use SIGQUIT to
implement immediate shutdowns. We send the signal to the entire process group.

The result of that is that we regularly produce core dumps for binaries like
sh/cp. I regularly see this on my local system, I've seen it on CI. Recently
Thomas added logic to show core dumps happing in cfbot ([1]). Plenty unrelated
core dumps, but also lots in sh/cp ([2]).

We found a bunch of issues as part of [3], but I think the issue I'm
discussing here is separate.


ISTM that signal_child() should downgrade SIGQUIT to SIGTERM when sending to
the process group. That way we'd maintain the current behaviour for postgres
itself, but stop core-dumping archive/restore scripts (as well as other
subprocesses that e.g. trusted PLs might create).


Makes sense?


Greetings,

Andres Freund


[1] http://cfbot.cputube.org/highlights/core.html

[2] A small sample:
https://api.cirrus-ci.com/v1/task/5939902693507072/logs/cores.log
https://api.cirrus-ci.com/v1/task/5549174150660096/logs/cores.log
https://api.cirrus-ci.com/v1/task/6153817767542784/logs/cores.log
https://api.cirrus-ci.com/v1/task/6567335205535744/logs/cores.log
https://api.cirrus-ci.com/v1/task/4804998119292928/logs/cores.log

[3] https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz



Re: We shouldn't signal process groups with SIGQUIT

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> ISTM that signal_child() should downgrade SIGQUIT to SIGTERM when sending to
> the process group. That way we'd maintain the current behaviour for postgres
> itself, but stop core-dumping archive/restore scripts (as well as other
> subprocesses that e.g. trusted PLs might create).

Yeah, I had been thinking along the same lines.  One issue
is that that means the backend itself will get SIGQUIT and SIGTERM
in close succession.  We need to make sure that that won't cause
problems.  It might be prudent to think about what order to send
the two signals in.

            regards, tom lane



Re: We shouldn't signal process groups with SIGQUIT

From
Andres Freund
Date:
Hi,

On 2023-02-14 15:38:24 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > ISTM that signal_child() should downgrade SIGQUIT to SIGTERM when sending to
> > the process group. That way we'd maintain the current behaviour for postgres
> > itself, but stop core-dumping archive/restore scripts (as well as other
> > subprocesses that e.g. trusted PLs might create).
> 
> Yeah, I had been thinking along the same lines.  One issue
> is that that means the backend itself will get SIGQUIT and SIGTERM
> in close succession.  We need to make sure that that won't cause
> problems.  It might be prudent to think about what order to send
> the two signals in.

I hope we already deal with that reasonably well - I think it's not uncommon
for that to happen, regardless of this change.

Just naively hacking this behaviour change into the current code, would yield
sending SIGQUIT to postgres, and then SIGTERM to the whole process
group. Which seems like a reasonable order?  quickdie() should _exit()
immediately in the signal handler, so we shouldn't get to processing the
SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
with SIGTERM being processed first, the SIGQUIT handler should be executed
long before the next CFI().


Not really related: I do wonder how often we end up self deadlocking in
quickdie(), due to the ereport() not beeing reentrant. We'll "fix" it soon
after, due to postmasters SIGKILL.  Perhaps we should turn on
send_abort_for_kill on CI?

Greetings,

Andres Freund



Re: We shouldn't signal process groups with SIGQUIT

From
Nathan Bossart
Date:
On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> Not really related: I do wonder how often we end up self deadlocking in
> quickdie(), due to the ereport() not beeing reentrant. We'll "fix" it soon
> after, due to postmasters SIGKILL.  Perhaps we should turn on
> send_abort_for_kill on CI?

+1, this seems like it'd be useful in general.  I'm guessing this will
require a bit of work on the CI side to generate the backtrace.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: We shouldn't signal process groups with SIGQUIT

From
Andres Freund
Date:
Hi,

On 2023-02-14 14:23:32 -0800, Nathan Bossart wrote:
> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> > Not really related: I do wonder how often we end up self deadlocking in
> > quickdie(), due to the ereport() not beeing reentrant. We'll "fix" it soon
> > after, due to postmasters SIGKILL.  Perhaps we should turn on
> > send_abort_for_kill on CI?
> 
> +1, this seems like it'd be useful in general.  I'm guessing this will
> require a bit of work on the CI side to generate the backtrace.

They're already generated for all current platforms.  It's possible that debug
info for some system libraries is missing, but the most important one (like
libc) should be available.

Since yesterday the cfbot website allows to easily find the coredumps, too:
http://cfbot.cputube.org/highlights/core-7.html

There's definitely some work to be done to make the contents of the backtraces
more useful though.  E.g. printing out the program name, the current directory
(although that doesn't seem to be doable for all programs). For everything but
windows that's in src/tools/ci/cores_backtrace.sh.

Greetings,

Andres Freund



Re: We shouldn't signal process groups with SIGQUIT

From
Nathan Bossart
Date:
On Tue, Feb 14, 2023 at 04:20:59PM -0800, Andres Freund wrote:
> On 2023-02-14 14:23:32 -0800, Nathan Bossart wrote:
>> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
>> > Not really related: I do wonder how often we end up self deadlocking in
>> > quickdie(), due to the ereport() not beeing reentrant. We'll "fix" it soon
>> > after, due to postmasters SIGKILL.  Perhaps we should turn on
>> > send_abort_for_kill on CI?
>> 
>> +1, this seems like it'd be useful in general.  I'm guessing this will
>> require a bit of work on the CI side to generate the backtrace.
> 
> They're already generated for all current platforms.  It's possible that debug
> info for some system libraries is missing, but the most important one (like
> libc) should be available.
> 
> Since yesterday the cfbot website allows to easily find the coredumps, too:
> http://cfbot.cputube.org/highlights/core-7.html

Oh, that's nifty.  Any reason not to enable send_abort_for_crash, too?

> There's definitely some work to be done to make the contents of the backtraces
> more useful though.  E.g. printing out the program name, the current directory
> (although that doesn't seem to be doable for all programs). For everything but
> windows that's in src/tools/ci/cores_backtrace.sh.

Got it, thanks for the info.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: We shouldn't signal process groups with SIGQUIT

From
Andres Freund
Date:
Hi,

On 2023-02-15 09:57:41 -0800, Nathan Bossart wrote:
> On Tue, Feb 14, 2023 at 04:20:59PM -0800, Andres Freund wrote:
> > On 2023-02-14 14:23:32 -0800, Nathan Bossart wrote:
> >> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> >> > Not really related: I do wonder how often we end up self deadlocking in
> >> > quickdie(), due to the ereport() not beeing reentrant. We'll "fix" it soon
> >> > after, due to postmasters SIGKILL.  Perhaps we should turn on
> >> > send_abort_for_kill on CI?
> >> 
> >> +1, this seems like it'd be useful in general.  I'm guessing this will
> >> require a bit of work on the CI side to generate the backtrace.
> > 
> > They're already generated for all current platforms.  It's possible that debug
> > info for some system libraries is missing, but the most important one (like
> > libc) should be available.
> > 
> > Since yesterday the cfbot website allows to easily find the coredumps, too:
> > http://cfbot.cputube.org/highlights/core-7.html
> 
> Oh, that's nifty.  Any reason not to enable send_abort_for_crash, too?

I think it'd be too noisy. Right now you get just a core dump of the crashed
process, but if we set send_abort_for_crash we'd end up with a lot of core
dumps, making it harder to know what to look at.

We should never need the send_abort_for_kill path, so I don't think the noise
issue applies to the same degree.

Greetings,

Andres



Re: We shouldn't signal process groups with SIGQUIT

From
Nathan Bossart
Date:
On Wed, Feb 15, 2023 at 10:12:58AM -0800, Andres Freund wrote:
> On 2023-02-15 09:57:41 -0800, Nathan Bossart wrote:
>> Oh, that's nifty.  Any reason not to enable send_abort_for_crash, too?
> 
> I think it'd be too noisy. Right now you get just a core dump of the crashed
> process, but if we set send_abort_for_crash we'd end up with a lot of core
> dumps, making it harder to know what to look at.
> 
> We should never need the send_abort_for_kill path, so I don't think the noise
> issue applies to the same degree.

Makes sense.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: We shouldn't signal process groups with SIGQUIT

From
Michael Paquier
Date:
On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> Just naively hacking this behaviour change into the current code, would yield
> sending SIGQUIT to postgres, and then SIGTERM to the whole process
> group. Which seems like a reasonable order?  quickdie() should _exit()
> immediately in the signal handler, so we shouldn't get to processing the
> SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
> with SIGTERM being processed first, the SIGQUIT handler should be executed
> long before the next CFI().

I can see the sense in this argument and this order should work, still
adding more complication for the backends does not sound that
appealing to me.

What would be the advantage of doing that for groups other than
-StartupPID and -PgArchPID?  These are the two groups of processes we
need to worry about, AFAIK.
--
Michael

Attachment

Re: We shouldn't signal process groups with SIGQUIT

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> What would be the advantage of doing that for groups other than
> -StartupPID and -PgArchPID?  These are the two groups of processes we
> need to worry about, AFAIK.

No, we have the issue for regular backends too, since they could be
executing COPY FROM PROGRAM or the like (not to mention that functions
in plperlu, plpythonu, etc could spawn child processes).

            regards, tom lane



Re: We shouldn't signal process groups with SIGQUIT

From
Michael Paquier
Date:
On Wed, Feb 22, 2023 at 09:39:55AM -0500, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
>> What would be the advantage of doing that for groups other than
>> -StartupPID and -PgArchPID?  These are the two groups of processes we
>> need to worry about, AFAIK.
>
> No, we have the issue for regular backends too, since they could be
> executing COPY FROM PROGRAM or the like (not to mention that functions
> in plperlu, plpythonu, etc could spawn child processes).

Indeed, right.  I completely forgot about these cases.
--
Michael

Attachment

Re: We shouldn't signal process groups with SIGQUIT

From
Michael Paquier
Date:
On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> Just naively hacking this behaviour change into the current code, would yield
> sending SIGQUIT to postgres, and then SIGTERM to the whole process
> group. Which seems like a reasonable order?  quickdie() should _exit()
> immediately in the signal handler, so we shouldn't get to processing the
> SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
> with SIGTERM being processed first, the SIGQUIT handler should be executed
> long before the next CFI().

I have been poking a bit at that, and did a change as simple as this
one in signal_child():
 #ifdef HAVE_SETSID
+   if (signal == SIGQUIT)
+       signal = SIGTERM;

From what I can see, SIGTERM is actually received by the backends
before SIGQUIT, and I can also see that the backends have enough room
to process CFIs in some cases, especially short queries, even before
reaching quickdie() and its exit().  So the window between SIGTERM and
SIGQUIT is not as long as one would think.
--
Michael

Attachment

Re: We shouldn't signal process groups with SIGQUIT

From
Thomas Munro
Date:
On Tue, Feb 28, 2023 at 5:45 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> > Just naively hacking this behaviour change into the current code, would yield
> > sending SIGQUIT to postgres, and then SIGTERM to the whole process
> > group. Which seems like a reasonable order?  quickdie() should _exit()
> > immediately in the signal handler, so we shouldn't get to processing the
> > SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
> > with SIGTERM being processed first, the SIGQUIT handler should be executed
> > long before the next CFI().
>
> I have been poking a bit at that, and did a change as simple as this
> one in signal_child():
>  #ifdef HAVE_SETSID
> +   if (signal == SIGQUIT)
> +       signal = SIGTERM;
>
> From what I can see, SIGTERM is actually received by the backends
> before SIGQUIT, and I can also see that the backends have enough room
> to process CFIs in some cases, especially short queries, even before
> reaching quickdie() and its exit().  So the window between SIGTERM and
> SIGQUIT is not as long as one would think.

Pop quiz: in what order do signal handlers run, if SIGQUIT and SIGTERM
are both pending when a process wakes up or unblocks?  I *think* the
answer on all typical implementation that follow conventions going
back to ancient Unix (but not standardised, so you can't count on
it!*), is that pending signals are delivered in order of the bits in
the pending signals bitmap from lowest to highest, and SIGQUIT <
SIGTERM (again: tradition, not standard), and then:

1.  If the handlers block each other via their sa_mask so that they
are serialised (note: ours don't) then you'll see the SIGQUIT handler
run and then the SIGTERM handler, for example if you do kill(self,
SIGTERM), kill(self, SIGQUIT), sigprocmask(SIG_SETMASK, &unblock_all,
NULL).

2.  If the handlers don't block each other (our case), then their
stack frames will be set up in that order (you might say they start in
that order but are immediately interrupted by the next one before they
can do anything), so they then run in the reverse order, SIGTERM
first.  I guess that is what you saw?

In theory you could straighten this out by asking what else is pending
so that we imposed our own priority, if that were a problem, but there
is something I don't understand: you said we could handle SIGTERM and
then make it all the way to CFI() (= non-signal handler code) before
handling a SIGQUIT that was sent first.  Huh... what am I missing?  I
thought the only risk was handlers running in the opposite of send
order because they 'overlapped', not non-handler code being allowed to
run in between.

*The standard explicitly says that delivery order is unspecified,
except for realtime signals which are aren't using.



Re: We shouldn't signal process groups with SIGQUIT

From
Andres Freund
Date:
Hi,

On 2023-02-28 13:45:41 +0900, Michael Paquier wrote:
> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> > Just naively hacking this behaviour change into the current code, would yield
> > sending SIGQUIT to postgres, and then SIGTERM to the whole process
> > group. Which seems like a reasonable order?  quickdie() should _exit()
> > immediately in the signal handler, so we shouldn't get to processing the
> > SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
> > with SIGTERM being processed first, the SIGQUIT handler should be executed
> > long before the next CFI().
> 
> I have been poking a bit at that, and did a change as simple as this
> one in signal_child():
>  #ifdef HAVE_SETSID
> +   if (signal == SIGQUIT)
> +       signal = SIGTERM;

FWIW, one thing that kept me from actually proposing a patch is that I thought
it might be useful to write a test for this, but that I didn't yet have the
cycles to look into that.


> From what I can see, SIGTERM is actually received by the backends
> before SIGQUIT, and I can also see that the backends have enough room
> to process CFIs in some cases, especially short queries, even before 
> reaching quickdie() and its exit().  So the window between SIGTERM and
> SIGQUIT is not as long as one would think.

What do you mean with the last ssentence? Why would one think that the window
between them is long? Do you mean that it's not as short?

Greetings,

Andres Freund



Re: We shouldn't signal process groups with SIGQUIT

From
Andres Freund
Date:
Hi,

On 2023-03-02 12:29:28 +1300, Thomas Munro wrote:
> In theory you could straighten this out by asking what else is pending
> so that we imposed our own priority, if that were a problem, but there
> is something I don't understand: you said we could handle SIGTERM and
> then make it all the way to CFI() (= non-signal handler code) before
> handling a SIGQUIT that was sent first.  Huh... what am I missing?  I
> thought the only risk was handlers running in the opposite of send
> order because they 'overlapped', not non-handler code being allowed to
> run in between.

I see ProcessInterrupts() being called too - but it's independent of the
changes we discuss here.  The reason for it is the CFI() at the end of
errfinish().

Note that ProcessInterrupts() immediately returns, due to the
HOLD_INTERRUPTS() at the start of quickdie().

FWIW, here's the strace output of a backend, enriched with a few debug
write()s.

epoll_wait(5, 0x55b25764fd70, 1, -1)    = -1 EINTR (Interrupted system call)
--- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=759211, si_uid=1000} ---
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=759211, si_uid=1000} ---
write(2, "start die\n", 10)             = 10
kill(759218, SIGURG)                    = 0
write(2, "end die\n", 8)                = 8
rt_sigreturn({mask=[QUIT URG]})         = 0
write(2, "start quickdie\n", 15)        = 15
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
sendto(10, "N\0\0\0tSWARNING\0VWARNING\0C57P01\0Mt"..., 117, 0, NULL, 0) = 117
write(2, "ProcessInterrupts\n", 18)     = 18
write(2, "ProcessInterrupts held off\n", 27) = 27
write(2, "end quickdie\n", 13)          = 13
exit_group(2)                           = ?
+++ exited with 2 +++


We do way too many non-signal safe things in quickdie(). But I'm not sure what
the alternative is, given we probably do want to send something to the client.

Greetings,

Andres Freund



Re: We shouldn't signal process groups with SIGQUIT

From
Michael Paquier
Date:
On Wed, Mar 01, 2023 at 03:34:30PM -0800, Andres Freund wrote:
> On 2023-02-28 13:45:41 +0900, Michael Paquier wrote:
>> From what I can see, SIGTERM is actually received by the backends
>> before SIGQUIT, and I can also see that the backends have enough room
>> to process CFIs in some cases, especially short queries, even before
>> reaching quickdie() and its exit().  So the window between SIGTERM and
>> SIGQUIT is not as long as one would think.
>
> What do you mean with the last ssentence? Why would one think that the window
> between them is long? Do you mean that it's not as short?

That should have been worded as "short".  In what I looked at, both
signal handlers are processed in the same millisecond, still the
backend can have time to process a full CFI between the SIGTERM and
SIGQUIT handlers, before the SIGQUIT handler has the time to exit().
--
Michael

Attachment

Re: We shouldn't signal process groups with SIGQUIT

From
Thomas Munro
Date:
On Thu, Mar 2, 2023 at 1:09 PM Andres Freund <andres@anarazel.de> wrote:
> On 2023-03-02 12:29:28 +1300, Thomas Munro wrote:
> > ... Huh... what am I missing?  I
> > thought the only risk was handlers running in the opposite of send
> > order because they 'overlapped', not non-handler code being allowed to
> > run in between.
>
> I see ProcessInterrupts() being called too - but it's independent of the
> changes we discuss here.  The reason for it is the CFI() at the end of
> errfinish().

Ahh, right, I see.