Thread: Postmaster crashes with "Serverloop: select failed" message
Hello all,

We've recently completed a Postgres-based project. The application uses 2 Linux servers, both running Postgres, to allow database replication. From our application we write to both databases within a transaction, and if either update fails we roll back. We use the C API (libpq), the PHP API and also some Tcl/Tk stuff (pgtksh).

The problem is that the Postmaster has crashed twice now, and both times the last message in the Postmaster's log was:

    /usr/local/pgsql/bin/postmaster: ServerLoop: select failed: No child processes

We also seem to get a large number of the following message in our Postmaster's log:

    pq_recvbuf: unexpected EOF on client connection

Are these connected? What do the messages mean?

This is what we are using:

    System:          Linux FGD 2.2.13 #1 Thu Dec 16 13:55:58 GMT 1999 i586 unknown
    Database Server: PostgreSQL 6.5.3 on i586-pc-linux-gnu, compiled by gcc egcs-2.91.66
    Web Server:      Apache/1.3.9 (Unix) (SuSE/Linux) PHP/3.0.12
    PHP:             3.0.12
    Tcl/Tk:          8.2

Any help with this problem would be greatly appreciated. Originally the application was well received, but these crashes have dented users' belief in it! More information can be supplied if necessary.

Thanks in advance.

Paul M. Breen, Software Engineer - Computer Park Ltd.
Tel: (01536) 417155
Email: pbreen@computerpark.co.uk
Paul Breen <paulb@computerpark.co.uk> writes:
> The problem is that the Postmaster has crashed twice now and both times
> the last message in the Postmaster's log was:
> /usr/local/pgsql/bin/postmaster: ServerLoop: select failed: No child
> processes

This sounds like the bug we recently recognized that the SIGCHLD signal processor has to save and restore errno. There is a fix in current sources. I do not have a patch for 7.0.* handy, but you could probably adapt the change that was applied:

http://www.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/postmaster/postmaster.c.diff?r1=1.198&r2=1.199&f=c

The additions to reaper() are the only critical part, I think.

> We also seem to get a large number of the following message in our
> Postmaster's log:
> pq_recvbuf: unexpected EOF on client connection
> Are these connected? What do the messages mean?

No, those just mean that some client is disconnecting without bothering to send the "I'm done" message. It's pretty harmless from the DB's point of view. Do you have a client that crashes a lot?

regards, tom lane
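For anyone reading this in the archives, here is a minimal sketch of the errno save/restore idea described above. It is not the actual postmaster.c change; only the name reaper() comes from the thread, while errno_after_sigchld() and the restore_errno flag are made up for illustration. It assumes a POSIX system and a process with no live children. The sketch pretends that select() has just failed with EINTR, delivers SIGCHLD with raise(), and reports which errno the interrupted code would then see. Without the restore, the handler's final waitpid() leaves ECHILD behind, which strerror() on Linux renders as "No child processes", the text in the crash message.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical switch so one handler can show both behaviours. */
static volatile sig_atomic_t restore_errno = 0;

/* SIGCHLD handler in the style of a child reaper. */
static void reaper(int signo)
{
    int save_errno = errno;   /* what the interrupted code last saw */
    int status;

    (void) signo;

    /* Collect every exited child. Once none are left, waitpid()
     * returns -1 and sets errno to ECHILD. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        ;

    if (restore_errno)
        errno = save_errno;   /* the fix: undo the handler's side effect */
}

/* Returns the errno value visible to the main code after the handler
 * has run, or -1 if the handler could not be installed. */
int errno_after_sigchld(int restore)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reaper;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;

    restore_errno = restore;

    errno = EINTR;            /* pretend select() was just interrupted */
    raise(SIGCHLD);           /* handler completes before raise() returns */
    return errno;
}
```

Calling errno_after_sigchld(0) shows the clobbered value and errno_after_sigchld(1) shows the preserved one. In a real server loop the same thing can happen between a failed select() and the code that inspects errno, so a harmless EINTR gets reported as a fatal error.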
Thanks Tom,

We applied a modified version of the patch by hand. Looking through the code, it does seem that the patch should fix the problem we were having. However, because we've never been able to forcibly reproduce the problem ourselves for testing, it's just a case of keeping an eye on things to ensure the problem is fixed!

Once again, thanks very much!

Paul M. Breen, Software Engineer - Computer Park Ltd.
Tel: (01536) 417155
Email: pbreen@computerpark.co.uk

On Fri, 5 Jan 2001, Tom Lane wrote:

> Paul Breen <paulb@computerpark.co.uk> writes:
> > The problem is that the Postmaster has crashed twice now and both times
> > the last message in the Postmaster's log was:
> > /usr/local/pgsql/bin/postmaster: ServerLoop: select failed: No child
> > processes
>
> This sounds like the bug we recently recognized that the SIGCHLD signal
> processor has to save and restore errno. There is a fix in current
> sources. I do not have a patch for 7.0.* handy, but you could probably
> adapt the change that was applied:
>
> http://www.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/postmaster/postmaster.c.diff?r1=1.198&r2=1.199&f=c
>
> The additions to reaper() are the only critical part, I think.
>
> > We also seem to get a large number of the following message in our
> > Postmaster's log:
> > pq_recvbuf: unexpected EOF on client connection
> > Are these connected? What do the messages mean?
>
> No, those just mean that some client is disconnecting without bothering
> to send the "I'm done" message. It's pretty harmless from the DB's
> point of view. Do you have a client that crashes a lot?
>
> regards, tom lane