Re: SIGQUIT handling, redux - Mailing list pgsql-hackers

From Tom Lane
Subject Re: SIGQUIT handling, redux
Date
Msg-id 148145.1599703626@sss.pgh.pa.us
Whole thread Raw
In response to SIGQUIT handling, redux  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses RE: SIGQUIT handling, redux
Re: SIGQUIT handling, redux
List pgsql-hackers
I wrote:
> Of course, this is only safe if the SIGQUIT handler is safe to be invoked
> anywhere, so I did a quick survey of backend signal handlers to see if
> that is true.

This is straying a bit from the stated topic of this thread, but ...
I did some further looking around to see whether there were any
unsafe signal handlers besides SIGQUIT ones.  The situation is not
too awful, but I did find several issues not already mentioned
in this thread:

StartupProcShutdownHandler (SIGTERM)

This conditionally calls proc_exit(1).  The conditions boil down
to are-we-interrupting-a-system(3)-call, so who knows how safe
that is?  I wouldn't care to bet that system() doesn't use malloc,
for instance.  Still, the odds are very good that if a signal did
arrive, it'd be interrupting system()'s waitpid() or equivalent
kernel call, which is likely safe enough.

bgworker_die (SIGTERM)

Calls ereport(FATAL).  This is surely not any safer than, say,
quickdie().  No, it's worse, because at least that won't try
to go out via proc_exit().

FloatExceptionHandler (SIGFPE)

Calls ereport(ERROR).  This might be okay, though, since the
trap should be synchronous with the offending calculation.
Besides, if you're risking divide by zero or the like in
critical code, You're Doing It Wrong.

RecoveryConflictInterrupt (called from SIGUSR1)

Calls a whole boatload of state tests that were never designed
to be interrupt-safe, such as transaction-state-related inquiries
in xact.c.  The lack of any explicit awareness in this code that
it's in a signal handler doesn't discourage people from inserting
even more dubious stuff.  I think this needs to be burned to the
ground and rewritten.

StandbyDeadLockHandler (from SIGALRM)
StandbyTimeoutHandler (ditto)

Calls CancelDBBackends, which just for starters tries to acquire
an LWLock.  I think the only reason we've gotten away with this
for this long is the high probability that by the time either
timeout fires, we're going to be blocked on a semaphore.

I don't have any ideas about how to fix any of these things,
but I thought getting the problems on the record would be good.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: please update ps display for recovery checkpoint
Next
From: "osumi.takamichi@fujitsu.com"
Date:
Subject: RE: extension patch of CREATE OR REPLACE TRIGGER