Re: It happened again: Server hung up solid - Mailing list pgsql-hackers

From The Hermit Hacker
Subject Re: It happened again: Server hung up solid
Date
Msg-id Pine.BSF.4.21.0005072152090.87721-100000@thelab.hub.org
In response to Re: It happened again: Server hung up solid  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: It happened again: Server hung up solid  (Vince Vielhaber <vev@michvhf.com>)
Re: It happened again: Server hung up solid  (The Hermit Hacker <scrappy@hub.org>)
List pgsql-hackers

Okay, just happened again ... no postgres backend is being started:

USER    PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
pgsql 34611  0.0  0.0     0    0  ??  Z     8:43PM   0:00.00  (postgres)
pgsql 93757  0.0  0.2  1456 1104  p0  S    Wed03PM   0:01.16 -su (tcsh)
pgsql 33683  0.0  0.6 38356 3024  ??  Is    7:38PM   0:03.54 /pgsql/bin/postmaster -B 4096 -N 128 -S -o -F -o /pgsql/errout.5432
pgsql 34677  0.0  0.2  1408 1048  p2  S     8:50PM   0:00.07 -su (tcsh)
pgsql 34685  0.0  0.2  1652 1032  p0  S+    8:51PM   0:00.01 psql udmsearch
pgsql 34687  0.0  0.0   400  232  p2  R+    8:51PM   0:00.00 ps ux

Going to look at the connection tracing option now and see what I can come
up with ...


On Sun, 7 May 2000, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > Okay, this is with code of ~May 4th ... a 'psql' connection to the
> > database hangs solid.
> 
> Do you mean you can't make a connection at all?  Is there any indication
> that the postmaster is lighting off a backend for you?  Since you show
> a couple of zombie backends hanging around, it would seem like a good
> bet that the postmaster itself is wedged and not responding to events,
> but I'm not sure.
> 
> > errout is dated:
> 
> > pgsql% !ls
> > ls -lt
> > total 13324
> > -rw-------   1 pgsql  pgsql  4842715 May  7 10:57 errout.5432
> 
> > and the last few lines contain:
> 
> > ERROR:  parser: parse error at or near "vpti"
> > pq_recvbuf: unexpected EOF on client connection
> > pq_flush: send() failed: Broken pipe
> > pq_recvbuf: recv() failed: Connection reset by peer
> > pq_recvbuf: unexpected EOF on client connection
> > pq_recvbuf: unexpected EOF on client connection
> > pq_flush: send() failed: Broken pipe
> > pq_recvbuf: recv() failed: Connection reset by peer
> 
> > But, of course, no date/time ...
> 
> Given that the file mod time is considerably before the hang (right?)
> the messages in it are probably unrelated.  It does seem odd that you
> have so many clients disconnecting ungracefully; what client apps are
> you running?
> 
> > Since this is a production server, I can't just leave it there hung like
> > that, but if someone wants to give some instructions on what to do the
> > next time this happens, please feel free to do so, and I'll add that to my
> > list ... maybe run a gdb command on it, since truss doesn't appear to
> > help?
> 
> Try killing the postmaster itself in such a way as to produce a coredump
> (kill -ABORT ought to do) and get a backtrace from that.  It might also
> be worth running the postmaster with connection tracing turned on (I
> forget the incantation for that, but it should be in TFM).
> 
> > At this time, I consider this to be a show-stopper on the release ... this
> > is what happened the last time when the result appeared to be the index
> > corruption
> 
> If the postmaster is hanging then it's almost certainly unrelated to
> index corruption...
> 
>             regards, tom lane
> 
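
For the next hang, the core-dump suggestion quoted above could be scripted roughly as follows. This is only a sketch under assumptions, not a tested procedure: the helper names (postmaster_pid, zombie_count, abort_postmaster) are made up for illustration, the column positions assume BSD-style `ps ux` output like the listing earlier in this message, and the core file path is a placeholder. The signal is spelled ABRT here because that is the POSIX name for SIGABRT. A core file only appears if core dumps were enabled in the environment that started the postmaster.

```shell
#!/bin/sh
# Sketch only: helpers for collecting evidence from a wedged postmaster.
# Column positions assume BSD-style `ps ux` output (COMMAND in column 11).

# Read `ps ux` output on stdin and print the PID of the first process
# whose command name ends in "postmaster".
postmaster_pid() {
    awk '$11 ~ /postmaster$/ { print $2; exit }'
}

# Read `ps ux` output on stdin and count zombie processes
# (STAT, column 8, starting with "Z").
zombie_count() {
    awk '$8 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Send SIGABRT to the given PID so the kernel writes a core file,
# provided core dumps are enabled for that process.
abort_postmaster() {
    kill -s ABRT "$1"
}

# Typical use (nothing here runs automatically):
#   pid=$(ps ux | postmaster_pid)
#   ps ux | zombie_count
#   abort_postmaster "$pid"
#   gdb /pgsql/bin/postmaster /path/to/postmaster.core   # then type: bt
```

Saving the `ps ux` output and the zombie count before sending the signal keeps a record of the state at the time of the hang, since aborting the postmaster destroys it.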

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 


