Thread: Postmaster crashes with "Serverloop: select failed" message

From
Paul Breen
Date:
Hello all,

We've recently completed a Postgres-based project.  The application uses 2
Linux servers, both running Postgres, to allow database replication.
From our application we write to both databases within a transaction and
if either update fails we rollback.  We use the C API (libpq), the PHP API
and also some Tcl/Tk stuff (pgtksh).

The problem is that the Postmaster has crashed twice now and both times
the last message in the Postmaster's log was:

    /usr/local/pgsql/bin/postmaster: ServerLoop: select failed: No child
    processes

We also seem to get a large number of the following message in our
Postmaster's log:

    pq_recvbuf: unexpected EOF on client connection

Are these connected?  What do the messages mean? This is what we are
using:

System: Linux FGD 2.2.13 #1 Thu Dec 16 13:55:58 GMT 1999 i586 unknown
Database Server: PostgreSQL 6.5.3 on i586-pc-linux-gnu,
compiled by gcc egcs-2.91.66
Web Server: Apache/1.3.9 (Unix) (SuSE/Linux) PHP/3.0.12
PHP: 3.0.12
Tcl/Tk: 8.2

Any help with this problem would be greatly appreciated.  Originally the
application was well received, but these crashes have dented users' belief
in it!  More information can be supplied if necessary.

Thanks in advance.

Paul M. Breen, Software Engineer - Computer Park Ltd.

Tel:   (01536) 417155
Email: pbreen@computerpark.co.uk




Re: Postmaster crashes with "Serverloop: select failed" message

From
Tom Lane
Date:
Paul Breen <paulb@computerpark.co.uk> writes:
> The problem is that the Postmaster has crashed twice now and both times
> the last message in the Postmaster's log was:
>     /usr/local/pgsql/bin/postmaster: ServerLoop: select failed: No child
>     processes

This sounds like the bug we recently recognized that the SIGCHLD signal
processor has to save and restore errno.  There is a fix in current
sources.  I do not have a patch for 7.0.* handy, but you could probably
adapt the change that was applied:

http://www.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/postmaster/postmaster.c.diff?r1=1.198&r2=1.199&f=c

The additions to reaper() are the only critical part, I think.

> We also seem to get a large number of the following message in our
> Postmaster's log:
>     pq_recvbuf: unexpected EOF on client connection
> Are these connected?  What do the messages mean?

No, those just mean that some client is disconnecting without bothering
to send the "I'm done" message.  It's pretty harmless from the DB's
point of view.  Do you have a client that crashes a lot?

            regards, tom lane
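
[Editor's note: a minimal sketch of the errno save/restore idea described
above.  This is not the actual postmaster patch; the function names
reaper_unsafe, reaper_safe and errno_after_handler are hypothetical.  The
likely failure mode is that select() in the server loop is interrupted by
SIGCHLD and returns with errno set to EINTR, the handler then calls a wait
function that fails with ECHILD ("No child processes") once no children are
left, and the loop reports that clobbered value instead of retrying.]

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical handler WITHOUT the fix: reaps every exited child.
 * When no children remain, waitpid() fails and sets errno to ECHILD,
 * overwriting whatever the interrupted code had stored there. */
void reaper_unsafe(int signo)
{
    int status;

    (void) signo;
    while (waitpid(-1, &status, WNOHANG) > 0)
        ;                       /* keep reaping until nothing is left */
}

/* Hypothetical handler WITH the fix: same reaping loop, but errno is
 * saved on entry and restored on exit, so the interrupted system call
 * still sees its own error code (for example EINTR from select()). */
void reaper_safe(int signo)
{
    int save_errno = errno;
    int status;

    (void) signo;
    while (waitpid(-1, &status, WNOHANG) > 0)
        ;

    errno = save_errno;
}

/* Simulates a signal arriving just after select() has failed with EINTR:
 * sets errno, runs the handler directly, and returns the errno value the
 * main loop would then inspect.  Assumes the process has no children. */
int errno_after_handler(void (*handler)(int))
{
    errno = EINTR;
    handler(SIGCHLD);
    return errno;
}
```

In a process with no child processes, errno_after_handler(reaper_unsafe)
should come back as ECHILD, while errno_after_handler(reaper_safe) should
still be EINTR.  A real server would install the handler with sigaction()
rather than call it directly.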

Re: Postmaster crashes with "Serverloop: select failed" message

From
Paul Breen
Date:
Thanks Tom,

We applied a modified version of the patch by hand.  Looking through the
code, it does seem that the patch should fix the problem we were having.
However, because we've never been able to reproduce the problem on demand
for testing, it's just a case of keeping an eye on things to confirm that
the problem is fixed!

Once again, thanks very much!


Paul M. Breen, Software Engineer - Computer Park Ltd.

Tel:   (01536) 417155
Email: pbreen@computerpark.co.uk

On Fri, 5 Jan 2001, Tom Lane wrote:

> Paul Breen <paulb@computerpark.co.uk> writes:
> > The problem is that the Postmaster has crashed twice now and both times
> > the last message in the Postmaster's log was:
> >     /usr/local/pgsql/bin/postmaster: ServerLoop: select failed: No child
> >     processes
>
> This sounds like the bug we recently recognized that the SIGCHLD signal
> processor has to save and restore errno.  There is a fix in current
> sources.  I do not have a patch for 7.0.* handy, but you could probably
> adapt the change that was applied:
>
> http://www.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/postmaster/postmaster.c.diff?r1=1.198&r2=1.199&f=c
>
> The additions to reaper() are the only critical part, I think.
>
> > We also seem to get a large number of the following message in our
> > Postmaster's log:
> >     pq_recvbuf: unexpected EOF on client connection
> > Are these connected?  What do the messages mean?
>
> No, those just mean that some client is disconnecting without bothering
> to send the "I'm done" message.  It's pretty harmless from the DB's
> point of view.  Do you have a client that crashes a lot?
>
>             regards, tom lane
>