Re: It happened again: Server hung up solid - Mailing list pgsql-hackers

From Tom Lane
Subject Re: It happened again: Server hung up solid
Msg-id 25204.957746133@sss.pgh.pa.us
In response to It happened again: Server hung up solid  (The Hermit Hacker <scrappy@hub.org>)
Responses Re: It happened again: Server hung up solid  (The Hermit Hacker <scrappy@hub.org>)
List pgsql-hackers
The Hermit Hacker <scrappy@hub.org> writes:
> Okay, this is with code of ~May 4th ... a 'psql' connection to the
> database hangs solid.

Do you mean you can't make a connection at all?  Is there any indication
that the postmaster is lighting off a backend for you?  Since you show
a couple of zombie backends hanging around, it would seem like a good
bet that the postmaster itself is wedged and not responding to events,
but I'm not sure.

> errout is dated:

> pgsql% !ls
> ls -lt
> total 13324
> -rw-------   1 pgsql  pgsql  4842715 May  7 10:57 errout.5432

> and the last few lines contain:

> ERROR:  parser: parse error at or near "vpti"
> pq_recvbuf: unexpected EOF on client connection
> pq_flush: send() failed: Broken pipe
> pq_recvbuf: recv() failed: Connection reset by peer
> pq_recvbuf: unexpected EOF on client connection
> pq_recvbuf: unexpected EOF on client connection
> pq_flush: send() failed: Broken pipe
> pq_recvbuf: recv() failed: Connection reset by peer

> But, of course, no date/time ...

Given that the file mod time is considerably before the hang (right?)
the messages in it are probably unrelated.  It does seem odd that you
have so many clients disconnecting ungracefully; what client apps are
you running?

> Since this is a production server, I can't just leave it there hung like
> that, but if someone wants to give some instructions on what to do the
> next time this happens, please feel free to do so, and I'll add that to my
> list ... maybe run a gdb command on it, since truss doesn't appear to
> help?

Try killing the postmaster itself in such a way as to produce a coredump
(kill -ABRT ought to do) and get a backtrace from that.  It might also
be worth running the postmaster with connection tracing turned on (I
forget the incantation for that, but it should be in TFM).
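
For illustration only, here is a rough sketch of the coredump step, run
against a harmless stand-in process rather than a real postmaster.  The
"sleep" child, the variable names, and the paths in the final comment are
all assumptions made for the example; nothing here comes from the server
in question.  It relies on SIGABRT's default action (terminate and dump
core) and on the shell reporting 128 plus the signal number for a child
killed by a signal.

```shell
#!/bin/sh
# Allow core files for this shell and its children; the limit is often 0.
ulimit -c unlimited 2>/dev/null || echo "could not raise core size limit"

# Hypothetical stand-in for the wedged postmaster: a harmless child process.
sleep 300 &
target_pid=$!

# SIGABRT's default action terminates the process and requests a core dump.
kill -ABRT "$target_pid"

# For a child killed by a signal, wait returns 128 + the signal number.
wait "$target_pid"
status=$?
echo "exit status: $status"

# With a real core file in hand, a backtrace could then be pulled with
# something like the following (both paths are placeholders):
#   gdb /path/to/postmaster /path/to/core
#   (gdb) bt
```

On a real server the PID would be the postmaster's own, and the core file
location depends on the platform and on the directory the postmaster was
started from.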

> At this time, I consider this to be a show-stopper on the release ... this
> is what happened the last time when the result appeared to be the index
> corruption

If the postmaster is hanging then it's almost certainly unrelated to
index corruption...
        regards, tom lane

