Thread: What (not) to do in signal handlers
I notice that the signal handlers in postmaster.c do quite a lot of work, much more than what they teach you in school they should do. While fprintf, elog, and ctime may simply lead to annoyances, forking off the WAL helper processes seems to be quite a lot. ISTM that most of these, esp. pmdie(), can be written more like the SIGHUP handler, i.e., set a global variable and evaluate right after the select(). This would at least give me a better feeling when I send "Fast Shutdown request at %s" etc. through elog(), which is what they should do for consistent message formatting. Comments? -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
> I notice that the signal handlers in postmaster.c do quite a lot of work, > much more than what they teach you in school they should do. While > fprintf, elog, and ctime may simply lead to annoyances, forking off the > WAL helper processes seems to be quite a lot. > > ISTM that most of these, esp. pmdie(), can be written more like the SIGHUP > handler, i.e., set a global variable and evaluate right after the > select(). This would at least give me a better feeling when I send "Fast > Shutdown request at %s" etc. through elog(), which is what they should do > for consistent message formatting. Agreed. If we don't loop around to check the variable soon we have bigger problems that the signal handlers. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Peter Eisentraut <peter_e@gmx.net> writes: > I notice that the signal handlers in postmaster.c do quite a lot of work, > much more than what they teach you in school they should do. Yes, they're pretty ugly. However, we have not recently heard any complaints suggesting problems with it. Since we block signals everywhere except just around the select() for new input, there's not really any risk of recursive resource use AFAICS. > ISTM that most of these, esp. pmdie(), can be written more like the SIGHUP > handler, i.e., set a global variable and evaluate right after the > select(). I would love to see it done that way, *if* you can show me a way to guarantee that the signal response will happen promptly. AFAIK there's no portable way to ensure that we don't end up sitting and waiting for a new client message before we get past the select(). If we could release the signal mask just as an atomic part of the select operation, then it'd work, but only some platforms support that. This has been discussed before, look in the archives. SIGHUP is okay because we don't really care whether the postmaster rereads the config file right away or only when it's about to process a new request, anyhow. But for other sorts of signals we need to be sure that there can't be an indefinite delay before the signal is acted on. regards, tom lane
On Thu, Jun 14, 2001 at 02:18:40PM -0400, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > I notice that the signal handlers in postmaster.c do quite a lot of work, > > much more than what they teach you in school they should do. > > Yes, they're pretty ugly. However, we have not recently heard any > complaints suggesting problems with it. Since we block signals > everywhere except just around the select() for new input, there's not > really any risk of recursive resource use AFAICS. > > > ISTM that most of these, esp. pmdie(), can be written more like the SIGHUP > > handler, i.e., set a global variable and evaluate right after the > > select(). > > I would love to see it done that way, *if* you can show me a way to > guarantee that the signal response will happen promptly. AFAIK there's > no portable way to ensure that we don't end up sitting and waiting for a > new client message before we get past the select(). It could open a pipe, and write(2) a byte to it in the signal handler, and then have select(2) watch that pipe. (SIGHUP could use the same pipe.) Writing to and reading from your own pipe can be a recipe for deadlock, but here it would be safe if the signal handler knows not to get too far ahead of select. (The easy way would be to allow no more than one byte in the pipe per signal handler.) Of course this is still a system call in a signal handler, but it can't (modulo coding bugs) fail. See Stevens, "Unix Network Programming, Vol. 2, Interprocess Communication", p. 91, Figure 5.10, "Functions that are async-signal-safe". The figure lists write() among others. Sample code implementing the above appears on page 94. Examples using other techniques (sigwait, nonblocking mq_receive) are presented also. A pipe per backend might be considered pretty expensive. Does UNIX allocate a pipe buffer before there's anything to put in it? Nathan Myers ncm@zembu.com
ncm@zembu.com (Nathan Myers) writes: > It could open a pipe, and write(2) a byte to it in the signal handler, > and then have select(2) watch that pipe. (SIGHUP could use the same pipe.) > Of course this is still a system call in a signal handler, but it can't > (modulo coding bugs) fail. Hm. That's one way, but is it really any cleaner than our existing technique? Since you still need to assume you can do a system call in a signal handler, it doesn't seem like a real gain in bulletproofness to me. > A pipe per backend might be considered pretty expensive. Pipe per postmaster, no? That doesn't seem like a huge cost. I'd be more concerned about the two extra kernel calls (write and read) per signal received, actually. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > ncm@zembu.com (Nathan Myers) writes: > > It could open a pipe, and write(2) a byte to it in the signal handler, > > and then have select(2) watch that pipe. (SIGHUP could use the same pipe.) > > Of course this is still a system call in a signal handler, but it can't > > (modulo coding bugs) fail. > > Hm. That's one way, but is it really any cleaner than our existing > technique? Since you still need to assume you can do a system call > in a signal handler, it doesn't seem like a real gain in > bulletproofness to me. Doing write() in a signal handler is safe; doing fprintf() (and friends) is not. POSIX specifies async-signal-safe functions, of which write() is one. I'd be very surprised if any Unix that people are using today had reentrancy problems with write(). All bets are off with fprintf() and the other stdio functions--they are not guaranteed async-signal-safe, and production code should never call them in a signal handler. Stevens is very thorough on this subject. > > A pipe per backend might be considered pretty expensive. > > Pipe per postmaster, no? That doesn't seem like a huge cost. I'd be > more concerned about the two extra kernel calls (write and read) per > signal received, actually. That's definitely an issue, but signals are (generally) pretty rare beasts, unless you're doing signal-driven I/O or some such. -Doug -- The rain man gave me two cures; he said jump right in, The first was Texas medicine--the second was just railroad gin, And like a fool I mixed them, and it strangled up my mind, Now people just get uglier, and I got no sense of time... --Dylan
Doug McNaught <doug@wireboard.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> Hm. That's one way, but is it really any cleaner than our existing >> technique? Since you still need to assume you can do a system call >> in a signal handler, it doesn't seem like a real gain in >> bulletproofness to me. > Doing write() in a signal handler is safe; doing fprintf() (and > friends) is not. If we were calling the signal handlers from random places, then I'd agree. But we're not: we use sigblock to ensure that signals are only serviced at the place in the postmaster main loop where select() is called. So there's no actual risk of reentrant use of non-reentrant library functions. Please recall that in practice the postmaster is extremely reliable. The single bug we have seen with the signal handlers in recent releases was the problem that they were clobbering errno, which was easily fixed by saving/restoring errno. This same bug would have arisen (though at such low probability we'd likely never have solved it) in a signal handler that only invoked write(). So I find it difficult to buy the argument that there's any net gain in robustness to be had here. In short: this code isn't broken, and so I'm not convinced we should "fix" it. regards, tom lane
On Thu, Jun 14, 2001 at 04:27:14PM -0400, Tom Lane wrote: > ncm@zembu.com (Nathan Myers) writes: > > It could open a pipe, and write(2) a byte to it in the signal handler, > > and then have select(2) watch that pipe. (SIGHUP could use the same pipe.) > > Of course this is still a system call in a signal handler, but it can't > > (modulo coding bugs) fail. > > Hm. That's one way, but is it really any cleaner than our existing > technique? Since you still need to assume you can do a system call > in a signal handler, it doesn't seem like a real gain in > bulletproofness to me. Quoting Stevens (UNPv2, p. 90), Posix uses the term *async-signal-safe* to describe the functions that may be called from a signal handler. Figure 5.10lists these Posix functions, along with a few that were added by Unix98. Functions not listed may not be called from a signal andler. Note that none of the standard I/O functions ... are listed. Of call the IPC functions covered in this text, only sem_post, read, and write are listed (we are assuming the lattertwo would be used with pipes and FIFOs). Restricting the handler to use those in the approved list seems like an automatic improvement to me, even in the apparent absence of evidence of problems on those platforms that happen to get tested most. > > A pipe per backend might be considered pretty expensive. > > Pipe per postmaster, no? That doesn't seem like a huge cost. I haven't looked at how complex the signal handling in the backends is; maybe they don't need anything this fancy. (OTOH, maybe they should be using a pipe to communicate with postmaster, instead of using signals.) > I'd be > more concerned about the two extra kernel calls (write and read) per > signal received, actually. Are there so many signals flying around? The signal handler would check a flag before writing, so a storm of signals would result in only one call to write, and one call to read, per select loop. Nathan Myers ncm@zembu.com
Tom Lane <tgl@sss.pgh.pa.us> writes: > In short: this code isn't broken, and so I'm not convinced we should > "fix" it. Point well taken. I was basically arguing from general principles. ;) -Doug -- The rain man gave me two cures; he said jump right in, The first was Texas medicine--the second was just railroad gin, And like a fool I mixed them, and it strangled up my mind, Now people just get uglier, and I got no sense of time... --Dylan
On Thu, Jun 14, 2001 at 05:10:58PM -0400, Tom Lane wrote: > Doug McNaught <doug@wireboard.com> writes: > > Tom Lane <tgl@sss.pgh.pa.us> writes: > >> Hm. That's one way, but is it really any cleaner than our existing > >> technique? Since you still need to assume you can do a system call > >> in a signal handler, it doesn't seem like a real gain in > >> bulletproofness to me. > > > Doing write() in a signal handler is safe; doing fprintf() (and > > friends) is not. > > If we were calling the signal handlers from random places, then I'd > agree. But we're not: we use sigblock to ensure that signals are only > serviced at the place in the postmaster main loop where select() is > called. So there's no actual risk of reentrant use of non-reentrant > library functions. > > Please recall that in practice the postmaster is extremely reliable. > The single bug we have seen with the signal handlers in recent releases > was the problem that they were clobbering errno, which was easily fixed > by saving/restoring errno. This same bug would have arisen (though at > such low probability we'd likely never have solved it) in a signal > handler that only invoked write(). So I find it difficult to buy the > argument that there's any net gain in robustness to be had here. > > In short: this code isn't broken, and so I'm not convinced we should > "fix" it. Formally speaking, it *is* broken: we depend on semantics that are documented as unportable and undefined. In a sense, we have been so unlucky as not to have perceived, thus far, the undefined effects. This is no different from depending on finding a NUL at *(char*)0, or on being able to say "free(p); p = p->next;". Yes, it appears to work, at the moment, on some platforms, but that doesn't make it correct. It may not be terribly urgent to fix it right now, but that's far from "isn't broken". It at least merits a TODO entry. Nathan Myers ncm@zembu.com