Thread: Syslogger tries to write to /dev/null on Windows
A customer is facing a problem on PostgreSQL 8.3.10 on Windows where the syslogger process dies mysteriously every few hours under load. I haven't yet figured out why, but when postmaster tries to respawn syslogger, it doesn't start up but dies immediately. The reason the relaunch fails is that in SysLoggerMain we have this: > /* > * If we restarted, our stderr is already redirected into our own input > * pipe. This is of course pretty useless, not to mention that it > * interferes with detecting pipe EOF. Point stderr to /dev/null. This > * assumes that all interesting messages generated in the syslogger will > * come through elog.c and will be sent to write_syslogger_file. > */ > if (redirection_done) > { > int fd = open(NULL_DEV, O_WRONLY, 0); > > /* > * The closes might look redundant, but they are not: we want to be > * darn sure the pipe gets closed even if the open failed. We can > * survive running with stderr pointing nowhere, but we can't afford > * to have extra pipe input descriptors hanging around. > */ > close(fileno(stdout)); > close(fileno(stderr)); > dup2(fd, fileno(stdout)); > dup2(fd, fileno(stderr)); > close(fd); > } NULL_DEV is defined in c.h as "/dev/null", which doesn't work on windows. We have a port-specific #define DEVNULL which does work, we should be using that. Peter Eisentraut inadvertently fixed this for 8.4: http://archives.postgresql.org/pgsql-committers/2008-12/msg00095.php by removing NULL_DEV and using always DEVNULL, but back-branches need that too. I'll leave NULL_DEV as it is just in case it's used by 3rd party modules, but change the two uses of it to use DEVNULL. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
2010/4/1 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>: > NULL_DEV is defined in c.h as "/dev/null", which doesn't work on > windows. We have a port-specific #define DEVNULL which does work, we > should be using that. Wow. Looking at the code around NULL_DEV, that actually looks like a merge mistake from the old windows branch or something - it has a comment stating that it should be different on NT, but it isn't... > Peter Eisentraut inadvertently fixed this for 8.4: > > http://archives.postgresql.org/pgsql-committers/2008-12/msg00095.php > > by removing NULL_DEV and using always DEVNULL, but back-branches need > that too. I'll leave NULL_DEV as it is just in case it's used by 3rd > party modules, but change the two uses of it to use DEVNULL. Sounds good to me. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > A customer is facing a problem on PostgreSQL 8.3.10 on Windows where the > syslogger process dies mysteriously every few hours under load. I > haven't yet figured out why, but when postmaster tries to respawn > syslogger, it doesn't start up but dies immediately. > The reason the relaunch fails is that in SysLoggerMain we have this: >> /* >> * If we restarted, our stderr is already redirected into our own input >> * pipe. This is of course pretty useless, not to mention that it >> * interferes with detecting pipe EOF. Point stderr to /dev/null. This >> * assumes that all interesting messages generated in the syslogger will >> * come through elog.c and will be sent to write_syslogger_file. >> */ >> if (redirection_done) >> { >> int fd = open(NULL_DEV, O_WRONLY, 0); >> >> /* >> * The closes might look redundant, but they are not: we want to be >> * darn sure the pipe gets closed even if the open failed. We can >> * survive running with stderr pointing nowhere, but we can't afford >> * to have extra pipe input descriptors hanging around. >> */ >> close(fileno(stdout)); >> close(fileno(stderr)); >> dup2(fd, fileno(stdout)); >> dup2(fd, fileno(stderr)); >> close(fd); >> } > NULL_DEV is defined in c.h as "/dev/null", which doesn't work on > windows. We have a port-specific #define DEVNULL which does work, we > should be using that. Hmm. I agree with your proposed change, but it seems to me that there is still another mystery here: why does the mistaken open() argument lead to a crash? Per the second comment, this code is supposed to keep working even if the open() fails. If it fails because of that, we have robustness problems everywhere not only on Windows --- consider ENFILE or some such. regards, tom lane
Tom Lane wrote: > Hmm. I agree with your proposed change, but it seems to me that there > is still another mystery here: why does the mistaken open() argument > lead to a crash? Per the second comment, this code is supposed to keep > working even if the open() fails. If it fails because of that, we have > robustness problems everywhere not only on Windows --- consider ENFILE > or some such. Hmm, good question. The open() fails and returns a return code as you would expect. But the dup2() call crashes when passed an invalid file descriptor, I just tested that with a small test program on Windows. Googling around, this seems to be because of a feature called Parameter Validation: http://msdn.microsoft.com/en-us/library/ksazx244.aspx Following the example there to set a custom Invalid Parameter Handler Routine makes dup2() not crash on an invalid file handle. We could do that in PostgreSQL, setting a handler that reports a warning in the log and continues. Or we could just be more careful when passing parameters to system functions; this is the first time we hear about this so it doesn't seem to be a very widespread issue. In this case we should do this: *** a/src/backend/postmaster/syslogger.c --- b/src/backend/postmaster/syslogger.c *************** *** 194,202 **** */ close(fileno(stdout)); close(fileno(stderr)); ! dup2(fd, fileno(stdout)); ! dup2(fd, fileno(stderr)); ! close(fd); } /* --- 194,205 ---- */ close(fileno(stdout)); close(fileno(stderr)); ! if (fd != -1) ! { ! dup2(fd, fileno(stdout)); ! dup2(fd, fileno(stderr)); ! close(fd); ! } } /* -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > The open() fails and returns a return code as you would expect. But the > dup2() call crashes when passed an invalid file descriptor, I just > tested that with a small test program on Windows. Ah, thanks Windows :-( > ! if (fd != -1) > ! { > ! dup2(fd, fileno(stdout)); > ! dup2(fd, fileno(stderr)); > ! close(fd); > ! } +1 for fixing it like this. It's cleaner anyway. Is that actually the cause of the original bug report, or is there another issue yet to solve? regards, tom lane
Tom Lane wrote: > Is that actually the cause of the original bug report, or is there > another issue yet to solve? I still don't know what caused syslogger to die in the first place, this bug only affects its respawning. It might not be a PostgreSQL issue at all, but we'll see. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com