Thread: What (not) to do in signal handlers

What (not) to do in signal handlers

From

Peter Eisentraut

Date:

14 June 2001, 13:40:37

I notice that the signal handlers in postmaster.c do quite a lot of work,
much more than what they teach you in school they should do.  While
fprintf, elog, and ctime may simply lead to annoyances, forking off the
WAL helper processes seems to be quite a lot.

ISTM that most of these, esp. pmdie(), can be written more like the SIGHUP
handler, i.e., set a global variable and evaluate right after the
select().  This would at least give me a better feeling when I send "Fast
Shutdown request at %s" etc. through elog(), which is what they should do
for consistent message formatting.

Comments?

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter

Re: What (not) to do in signal handlers

From

Bruce Momjian

Date:

14 June 2001, 13:51:30

> I notice that the signal handlers in postmaster.c do quite a lot of work,
> much more than what they teach you in school they should do.  While
> fprintf, elog, and ctime may simply lead to annoyances, forking off the
> WAL helper processes seems to be quite a lot.
> 
> ISTM that most of these, esp. pmdie(), can be written more like the SIGHUP
> handler, i.e., set a global variable and evaluate right after the
> select().  This would at least give me a better feeling when I send "Fast
> Shutdown request at %s" etc. through elog(), which is what they should do
> for consistent message formatting.

Agreed.  If we don't loop around to check the variable soon we have
bigger problems that the signal handlers.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: What (not) to do in signal handlers

From

Tom Lane

Date:

14 June 2001, 14:58:56

Peter Eisentraut <peter_e@gmx.net> writes:
> I notice that the signal handlers in postmaster.c do quite a lot of work,
> much more than what they teach you in school they should do.

Yes, they're pretty ugly.  However, we have not recently heard any
complaints suggesting problems with it.  Since we block signals
everywhere except just around the select() for new input, there's not
really any risk of recursive resource use AFAICS.

> ISTM that most of these, esp. pmdie(), can be written more like the SIGHUP
> handler, i.e., set a global variable and evaluate right after the
> select().

I would love to see it done that way, *if* you can show me a way to
guarantee that the signal response will happen promptly.  AFAIK there's
no portable way to ensure that we don't end up sitting and waiting for a
new client message before we get past the select().  If we could release
the signal mask just as an atomic part of the select operation, then
it'd work, but only some platforms support that.  This has been
discussed before, look in the archives.

SIGHUP is okay because we don't really care whether the postmaster
rereads the config file right away or only when it's about to process a
new request, anyhow.  But for other sorts of signals we need to be sure
that there can't be an indefinite delay before the signal is acted on.
        regards, tom lane

Re: What (not) to do in signal handlers

From

ncm@zembu.com (Nathan Myers)

Date:

14 June 2001, 15:57:20

On Thu, Jun 14, 2001 at 02:18:40PM -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > I notice that the signal handlers in postmaster.c do quite a lot of work,
> > much more than what they teach you in school they should do.
> 
> Yes, they're pretty ugly.  However, we have not recently heard any
> complaints suggesting problems with it.  Since we block signals
> everywhere except just around the select() for new input, there's not
> really any risk of recursive resource use AFAICS.
> 
> > ISTM that most of these, esp. pmdie(), can be written more like the SIGHUP
> > handler, i.e., set a global variable and evaluate right after the
> > select().
> 
> I would love to see it done that way, *if* you can show me a way to
> guarantee that the signal response will happen promptly.  AFAIK there's
> no portable way to ensure that we don't end up sitting and waiting for a
> new client message before we get past the select().  

It could open a pipe, and write(2) a byte to it in the signal handler, 
and then have select(2) watch that pipe.  (SIGHUP could use the same pipe.)
Writing to and reading from your own pipe can be a recipe for deadlock, 
but here it would be safe if the signal handler knows not to get too far
ahead of select.  (The easy way would be to allow no more than one byte
in the pipe per signal handler.)

Of course this is still a system call in a signal handler, but it can't
(modulo coding bugs) fail.  See Stevens, "Unix Network Programming, 
Vol. 2, Interprocess Communication", p. 91, Figure 5.10, "Functions 
that are async-signal-safe".  The figure lists write() among others.
Sample code implementing the above appears on page 94.  Examples using 
other techniques (sigwait, nonblocking mq_receive) are presented also.

A pipe per backend might be considered pretty expensive.  Does UNIX 
allocate a pipe buffer before there's anything to put in it?

Nathan Myers
ncm@zembu.com

Re: What (not) to do in signal handlers

From

Tom Lane

Date:

14 June 2001, 16:36:12

ncm@zembu.com (Nathan Myers) writes:
> It could open a pipe, and write(2) a byte to it in the signal handler, 
> and then have select(2) watch that pipe.  (SIGHUP could use the same pipe.)
> Of course this is still a system call in a signal handler, but it can't
> (modulo coding bugs) fail.

Hm.  That's one way, but is it really any cleaner than our existing
technique?  Since you still need to assume you can do a system call
in a signal handler, it doesn't seem like a real gain in
bulletproofness to me.

> A pipe per backend might be considered pretty expensive.

Pipe per postmaster, no?  That doesn't seem like a huge cost.  I'd be
more concerned about the two extra kernel calls (write and read) per
signal received, actually.
        regards, tom lane

Re: What (not) to do in signal handlers

From

Doug McNaught

Date:

14 June 2001, 16:58:58

Tom Lane <tgl@sss.pgh.pa.us> writes:

> ncm@zembu.com (Nathan Myers) writes:
> > It could open a pipe, and write(2) a byte to it in the signal handler, 
> > and then have select(2) watch that pipe.  (SIGHUP could use the same pipe.)
> > Of course this is still a system call in a signal handler, but it can't
> > (modulo coding bugs) fail.
> 
> Hm.  That's one way, but is it really any cleaner than our existing
> technique?  Since you still need to assume you can do a system call
> in a signal handler, it doesn't seem like a real gain in
> bulletproofness to me.

Doing write() in a signal handler is safe; doing fprintf() (and
friends) is not.  POSIX specifies async-signal-safe functions, of
which write() is one.  I'd be very surprised if any Unix that people
are using today had reentrancy problems with write().  All bets are
off with fprintf() and the other stdio functions--they are not
guaranteed async-signal-safe, and production code should never call
them in a signal handler.  Stevens is very thorough on this subject.

> > A pipe per backend might be considered pretty expensive.
> 
> Pipe per postmaster, no?  That doesn't seem like a huge cost.  I'd be
> more concerned about the two extra kernel calls (write and read) per
> signal received, actually.

That's definitely an issue, but signals are (generally) pretty rare
beasts, unless you're doing signal-driven I/O or some such.

-Doug
-- 
The rain man gave me two cures; he said jump right in,
The first was Texas medicine--the second was just railroad gin,
And like a fool I mixed them, and it strangled up my mind,
Now people just get uglier, and I got no sense of time...          --Dylan

Re: What (not) to do in signal handlers

From

Tom Lane

Date:

14 June 2001, 17:11:14

Doug McNaught <doug@wireboard.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Hm.  That's one way, but is it really any cleaner than our existing
>> technique?  Since you still need to assume you can do a system call
>> in a signal handler, it doesn't seem like a real gain in
>> bulletproofness to me.

> Doing write() in a signal handler is safe; doing fprintf() (and
> friends) is not.

If we were calling the signal handlers from random places, then I'd
agree.  But we're not: we use sigblock to ensure that signals are only
serviced at the place in the postmaster main loop where select() is
called.  So there's no actual risk of reentrant use of non-reentrant
library functions.

Please recall that in practice the postmaster is extremely reliable.
The single bug we have seen with the signal handlers in recent releases
was the problem that they were clobbering errno, which was easily fixed
by saving/restoring errno.  This same bug would have arisen (though at
such low probability we'd likely never have solved it) in a signal
handler that only invoked write().  So I find it difficult to buy the
argument that there's any net gain in robustness to be had here.

In short: this code isn't broken, and so I'm not convinced we should
"fix" it.
        regards, tom lane

Re: What (not) to do in signal handlers

From

ncm@zembu.com (Nathan Myers)

Date:

14 June 2001, 17:12:04

On Thu, Jun 14, 2001 at 04:27:14PM -0400, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> > It could open a pipe, and write(2) a byte to it in the signal handler, 
> > and then have select(2) watch that pipe.  (SIGHUP could use the same pipe.)
> > Of course this is still a system call in a signal handler, but it can't
> > (modulo coding bugs) fail.
> 
> Hm.  That's one way, but is it really any cleaner than our existing
> technique?  Since you still need to assume you can do a system call
> in a signal handler, it doesn't seem like a real gain in
> bulletproofness to me.

Quoting Stevens (UNPv2, p. 90),
 Posix uses the term *async-signal-safe* to describe the functions that may be called from a signal handler.  Figure
5.10lists these Posix functions, along with a few that were added by Unix98.

 Functions not listed may not be called from a signal andler.  Note that none of the standard I/O functions ... are
listed. Of call the IPC functions covered in this text, only sem_post, read, and write are listed (we are assuming the
lattertwo would be used with pipes and FIFOs).

Restricting the handler to use those in the approved list seems like an 
automatic improvement to me, even in the apparent absence of evidence 
of problems on those platforms that happen to get tested most.  

> > A pipe per backend might be considered pretty expensive.
> 
> Pipe per postmaster, no?  That doesn't seem like a huge cost.  

I haven't looked at how complex the signal handling in the backends is;
maybe they don't need anything this fancy.  (OTOH, maybe they should be 
using a pipe to communicate with postmaster, instead of using signals.)

> I'd be
> more concerned about the two extra kernel calls (write and read) per
> signal received, actually.

Are there so many signals flying around?  The signal handler would check 
a flag before writing, so a storm of signals would result in only one 
call to write, and one call to read, per select loop.

Nathan Myers
ncm@zembu.com

Re: What (not) to do in signal handlers

From

Doug McNaught

Date:

14 June 2001, 17:32:41

Tom Lane <tgl@sss.pgh.pa.us> writes:

> In short: this code isn't broken, and so I'm not convinced we should
> "fix" it.

Point well taken.  I was basically arguing from general principles. ;)

-Doug
-- 
The rain man gave me two cures; he said jump right in,
The first was Texas medicine--the second was just railroad gin,
And like a fool I mixed them, and it strangled up my mind,
Now people just get uglier, and I got no sense of time...          --Dylan

Re: What (not) to do in signal handlers

From

ncm@zembu.com (Nathan Myers)

Date:

14 June 2001, 19:21:36

On Thu, Jun 14, 2001 at 05:10:58PM -0400, Tom Lane wrote:
> Doug McNaught <doug@wireboard.com> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> >> Hm.  That's one way, but is it really any cleaner than our existing
> >> technique?  Since you still need to assume you can do a system call
> >> in a signal handler, it doesn't seem like a real gain in
> >> bulletproofness to me.
> 
> > Doing write() in a signal handler is safe; doing fprintf() (and
> > friends) is not.
> 
> If we were calling the signal handlers from random places, then I'd
> agree.  But we're not: we use sigblock to ensure that signals are only
> serviced at the place in the postmaster main loop where select() is
> called.  So there's no actual risk of reentrant use of non-reentrant
> library functions.
> 
> Please recall that in practice the postmaster is extremely reliable.
> The single bug we have seen with the signal handlers in recent releases
> was the problem that they were clobbering errno, which was easily fixed
> by saving/restoring errno.  This same bug would have arisen (though at
> such low probability we'd likely never have solved it) in a signal
> handler that only invoked write().  So I find it difficult to buy the
> argument that there's any net gain in robustness to be had here.
> 
> In short: this code isn't broken, and so I'm not convinced we should
> "fix" it.
Formally speaking, it *is* broken: we depend on semantics that are
documented as unportable and undefined.  In a sense, we have been so 
unlucky as not to have perceived, thus far, the undefined effects.  

This is no different from depending on finding a NUL at *(char*)0, or 
on being able to say "free(p); p = p->next;".  Yes, it appears to work,
at the moment, on some platforms, but that doesn't make it correct.

It may not be terribly urgent to fix it right now, but that's far from
"isn't broken".  It at least merits a TODO entry.

Nathan Myers
ncm@zembu.com