Thread: Re: [SQL] PostgreSQL crashes on me :(
Mathijs Brands <mathijs@ilse.nl> writes: > We recently installed a small server for an external party to develop > websites on. This machine, a K6-233 with 256 MB, is running FreeBSD 3.3 > and PostgreSQL 7.0.2 (maybe I'll upgrade to 7.0.3 tonight). The database > it's running is about 2 MB in size and gets to process an estimated 10000 > to 25000 queries per day. Nothing special, I'd say. > However, pgsql keeps crashing. It can take days, but pgsql will crash. > It spits out the following error: > ServerLoop: select failed: No child processes Hm. It seems fairly unlikely that select() would return error ECHILD, which is what this message *appears* to imply. The code is if (select(nSockets, &rmask, &wmask, (fd_set *) NULL, (struct timeval *) NULL) < 0) { if (errno == EINTR) continue; fprintf(stderr, "%s: ServerLoop: select failed: %s\n", progname, strerror(errno)); return STATUS_ERROR; } which seems pretty straightforward. BUT: I think there's a race condition here, at least on systems where errno is not saved and restored around a signal handler. Consider the following scenario: Postmaster is waiting at the select() --- its normal state. Postmaster receives a SIGCHLD signal due to backend exit, soit goes off and does the reaper() thing. On return fromreaper()the system arranges to return EINTR error fromthe select(). Before control can reach the "if (errno..." test, anotherSIGCHLD comes in. reaper() is invoked again and does itsthing. The normal exit condition from reaper() will be errno == ECHILD, because that's what the waitpid() or wait3() call will return after all children are dealt with. If the signal-handling mechanism allows that to be returned to the mainline code, we have a failure. Can any FreeBSD hackers comment on the plausibility of this theory? A quick-and-dirty workaround would be to save and restore errno in reaper() and the other postmaster signal handlers. It might be a better idea in the long run to avoid doing system calls in the signal handlers --- but that would take a more substantial rewrite. I seem to recall previous pghackers discussions in which saving/restoring errno looked like a good idea. Not sure why it hasn't been done already. regards, tom lane
On Sun, Dec 17, 2000 at 10:47:55PM -0500, Tom Lane wrote: > > A quick-and-dirty workaround would be to save and restore errno in > reaper() and the other postmaster signal handlers. It might be > a better idea in the long run to avoid doing system calls in the > signal handlers --- but that would take a more substantial rewrite. > > I seem to recall previous pghackers discussions in which > saving/restoring errno looked like a good idea. Not sure why > it hasn't been done already. Syscalls in signal handlers open doors for lots of mischief, not least because it's impractical to test all the combinations of half-run mainline syscalls with all the other syscalls that may occur in signal handlers. (select() and wait(), by themselves, get exercised enough, but bugs in other pairs may be too hard to reproduce ever to be identified.) Maybe somebody noted that patching errno handling would only close one hole, and make it less likely the real fix would be done. That the signal handler got an error on its syscall might be cause for concern as well. Sounds like a TODO list item: eliminate syscalls from signal handlers. Nathan Myers ncm@zembu.com
ncm@zembu.com (Nathan Myers) writes: > Sounds like a TODO list item: eliminate syscalls from signal handlers. After looking at it some more, I think that's a lot easier said than done. We could try writing the postmaster's SIGCHLD routine in the same style currently used for SIGHUP --- ie, signal handler just sets a flag that's examined by the main loop in ServerLoop --- but I don't see any way to guarantee timely response if we do that. If the SIGCHLD happens just before we reach the select() then the select() will block, and we won't respond to the dying child until the next connection request arrives or some other signal happens. That's OK under normal scenarios, but highly not OK when a backend has crashed. Any thoughts on a cleaner solution? BTW, we do block signals except at the select(), so the fear of random syscalls being interrupted by random other syscalls is overstated. regards, tom lane
Date: Sun, 17 Dec 2000 22:47:55 -0500 From: Tom Lane <tgl@sss.pgh.pa.us> BUT: I think there's a race condition here, at least on systems where errno is not saved and restored around a signalhandler. Consider the following scenario: Postmaster is waiting at the select() --- its normal state. Postmaster receives a SIGCHLD signal due to backend exit, so it goes off and does the reaper() thing. On return from reaper() the system arranges to return EINTR error from the select(). Before control can reach the "if (errno..." test, another SIGCHLD comes in. reaper() is invoked again and does its thing. The normal exit condition from reaper() will be errno == ECHILD, because that's what the waitpid() or wait3() call willreturn after all children are dealt with. If the signal-handling mechanism allows that to be returned to the mainlinecode, we have a failure. Can any FreeBSD hackers comment on the plausibility of this theory? I'm not a FreeBSD hacker, but I do know how the BSD kernel works unless FreeBSD has changed things. The important facts are: 1) The kernel only delivers signals when a process moves from kernel mode to user mode, after a system call or an interrupt(including a timer interrupt). 2) The errno variable is set in user space after the process has returned to user mode. Therefore, the scenario you describe is possible, but only if there happens to be both a timer interrupt and a SIGCHLD signal within a couple of instructions after the select returns. (I suppose that a page fault instead of a timer interrupt could have the same effect as well, although a page fault here seems quite unlikely unless the system is extremely overloaded.) A quick-and-dirty workaround would be to save and restore errno in reaper() and the other postmaster signal handlers. It might be a better idea in the long run to avoid doing system calls in the signal handlers --- but that wouldtake a more substantial rewrite. Ideally, signal handlers should not make system calls. However, if this is impossible, then signal handlers must save and restore errno. Ian
Ian Lance Taylor <ian@airs.com> writes: > Therefore, the scenario you describe is possible, but only if there > happens to be both a timer interrupt and a SIGCHLD signal within a > couple of instructions after the select returns. Right. A small failure window would explain the infrequency of the symptom. > Ideally, signal handlers should not make system calls. However, if > this is impossible, then signal handlers must save and restore errno. I have just committed fixes to do that in current sources (7.1 branch). I doubt we're going to make a 7.0.4 release, but if we do then this ought to be back-patched. regards, tom lane
Date: Mon, 18 Dec 2000 10:41:47 -0500 From: Tom Lane <tgl@sss.pgh.pa.us> ncm@zembu.com (Nathan Myers) writes: > Sounds like a TODO list item: eliminate syscalls from signal handlers. After looking at it some more, I think that's a lot easier said than done. We could try writing the postmaster's SIGCHLDroutine in the same style currently used for SIGHUP --- ie, signal handler just sets a flag that's examined bythe main loop in ServerLoop --- but I don't see any way to guarantee timely response if we do that. If the SIGCHLD happensjust before we reach the select() then the select() will block, and we won't respond to the dying child until thenext connection request arrives or some other signal happens. That's OK under normal scenarios, but highly not OK whena backend has crashed. Any thoughts on a cleaner solution? One way to avoid this race condition is to set a timeout on the select. What is the maximum acceptable time for a timely response? Would executing a few instructions at that time interval impose an unacceptable system load? (Naturally no timeout should be used if there are no child processes.) Ian
Ian Lance Taylor <ian@airs.com> writes: > Any thoughts on a cleaner solution? > One way to avoid this race condition is to set a timeout on the > select. What is the maximum acceptable time for a timely response? I thought about that, but it doesn't seem like a cleaner solution. Basically you'd have to figure a tradeoff between wasted cycles in the postmaster and time delay to respond to a crashed backend. And there's no good tradeoff there. If you have a backend crash, you want to shut down the other backends ASAP, before they have a chance to propagate any shared-memory corruption that the failed backend might've created. The entire exercise is probably pointless if the postmaster twiddles its thumbs for awhile before killing the other backends. regards, tom lane
Date: Mon, 18 Dec 2000 13:18:26 -0500 From: Tom Lane <tgl@sss.pgh.pa.us> Ian Lance Taylor <ian@airs.com> writes: > Any thoughts on a cleaner solution? > One way to avoid this race condition is to set a timeout on the > select. What is the maximum acceptable time for atimely response? I thought about that, but it doesn't seem like a cleaner solution. Basically you'd have to figure a tradeoff between wastedcycles in the postmaster and time delay to respond to a crashed backend. And there's no good tradeoff there. Ifyou have a backend crash, you want to shut down the other backends ASAP, before they have a chance to propagate any shared-memorycorruption that the failed backend might've created. The entire exercise is probably pointless if the postmastertwiddles its thumbs for awhile before killing the other backends. The timeout is only to catch the case where the child dies between checking the flag and calling select. What you really want, of course, is a version of select which lets you atomically control the signal blocking mask. That function is actually specified by the most recent version of POSIX; it's called pselect; it's just like select, but the last argument is a const sigset_t * which is atomically passed to sigprocmask(SIG_SETMASK) around the select (the original signal mask is restored before pselect returns). But I don't know which kernels implement it. There is an implementation on GNU/Linux, but if the kernel doesn't support it then it is not truly atomic. Ian
Ian Lance Taylor <ian@airs.com> writes: > What you really want, of course, is a version of select which lets you > atomically control the signal blocking mask. Yeah, that would be a much cleaner solution. Someday pselect() may even be portable enough ... regards, tom lane
Mathijs Brands <mathijs@ilse.nl> writes: > I've worked around the situation by running a small script that continually > monitors postgres and takes appropriate action if postgres shuts down. I'm > assuming this problem won't lead to any data corruption. Hm. The problem here is that when the postmaster crashes, it probably doesn't take down the old backends with it. So if you auto-restart the postmaster, you may have two sets of backends that don't know about each other. That would be A Bad Thing. If your script kills off all the backends it can find before starting the new postmaster, that should work OK. A better solution would be to back-patch my errno fix into 7.0.3. You can see the diff as it stands for current sources at http://www.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/postmaster/postmaster.c.diff?r1=1.198&r2=1.199&f=c The additions to reaper() are probably the only part you really need. regards, tom lane
Hello all, I would like to thank Tom, Ian and the other pgsql wizards for their prompt response. This must surely be open source at it's best :) I've worked around the situation by running a small script that continually monitors postgres and takes appropriate action if postgres shuts down. I'm assuming this problem won't lead to any data corruption. Mathijs -- "Borrowers of books -- those mutilators of collections, spoilers of thesymmetry of shelves, and creators of odd volumes." Charles Lamb (1775-1834)