Re: SIGQUIT handling, redux - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: SIGQUIT handling, redux
Date
Msg-id 20200909143900.GQ29590@tamriel.snowman.net
Whole thread Raw
In response to SIGQUIT handling, redux  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: SIGQUIT handling, redux  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> This is to pull together a couple of recent threads that are seemingly
> unrelated to $SUBJECT.
>
> Over in the long thread at [1] we've agreed to try to make the backend
> code actually do what assorted comments claim it does, namely allow
> SIGQUIT to be accepted at any point after a given process has established
> its signal handlers.  (Right now, it mostly is not accepted during error
> recovery, but there's no clear reason to preserve that exception.)
>
> Of course, this is only safe if the SIGQUIT handler is safe to be invoked
> anywhere, so I did a quick survey of backend signal handlers to see if
> that is true.  I immediately found pgarch_exit() which surely is not.  It
> turns out that Horiguchi-san already noticed that and proposed to fix it,
> within the seemingly entirely unrelated patch series for a shared-memory
> based stats collector (see patch 0001 in [2]).  I think we should go ahead
> and commit that, and maybe even back-patch it.  There seems no good reason
> for the archiver to treat SIGQUIT as nonfatal when other postmaster
> children do not.

As I mentioned over there, I agree that we should do this and we should
further have the statistics collector also do so, which currently sets
up SIGQUIT with ShutdownRequestPending() and in its loop decides it's
fine to write out the stats file (which we're going to remove during
recovery anyway...) and then call exit(0).  I also think we should
back-patch these changes, as the commit mentioned in Horiguchi-san's
patch for pgarch_exit() was.

> Another thing I found is startup_die() in postmaster.c, which catches
> SIGQUIT during the authentication phase of a new backend.  This calls
> proc_exit(1), which (a) is unsafe on its face, and (b) potentially
> demotes a SIGQUIT into something that will not cause the parent postmaster
> to perform a DB-wide restart.  There seems no good reason for that
> behavior, either.  I propose that we use SignalHandlerForCrashExit
> for SIGQUIT here too.

Yikes, agreed.

> In passing, it's worth noting that startup_die() isn't really much safer
> for SIGTERM than it is for SIGQUIT; the only argument for distinguishing
> those is that code that applies BlockSig will at least manage to block the
> former.  So it's slightly tempting to drop startup_die() altogether in
> favor of using SignalHandlerForCrashExit for the SIGTERM-during-auth case
> too.  However, that'd have the effect of causing a "fast shutdown" to get
> converted to a panic if it happens while any sessions are in
> authentication phase, so I think this cure is likely worse than the
> disease.

Agreed, that's definitely no good.

> Worth reading in connection with this is the thread that led up to
> commits 8e19a8264 et al [3].  We had a long discussion about how
> quickdie() and startup_die() might be made safe(r), without any
> real consensus emerging about any alternatives being better than the
> status quo.  I still don't have an improvement idea for quickdie();
> I don't want to give up trying to send a message to the client.
> Note, however, that quickdie() does end with _exit(2) so that at
> least it's not trying to execute arbitrary process-cleanup code.
>
> In short then, I want to ensure that postmaster children's SIGQUIT
> handlers always end with _exit(2) rather than some other exit method.
> We have two exceptions now and they need to get fixed.

I agree we should fix these.

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: doc review for v13
Next
From: Stephen Frost
Date:
Subject: Re: shared-memory based stats collector