Re: It happened again: Server hung up solid - Mailing list pgsql-hackers

From The Hermit Hacker
Subject Re: It happened again: Server hung up solid
Msg-id Pine.BSF.4.21.0005072157060.87721-100000@thelab.hub.org
In response to Re: It happened again: Server hung up solid  (The Hermit Hacker <scrappy@hub.org>)
Responses Re: It happened again: Server hung up solid  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

kill -ABRT does nothing:

pgsql% kill -ABRT 33683
pgsql% !ps
ps ux
USER    PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
pgsql 34611  0.0  0.0     0    0  ??  Z     8:43PM   0:00.00  (postgres)
pgsql 93757  0.0  0.2  1456 1104  p0  S    Wed03PM   0:01.17 -su (tcsh)
pgsql 33683  0.0  0.6 38356 3024  ??  Is    7:38PM   0:03.54 /pgsql/bin/postmas
pgsql 34677  0.0  0.2  1408 1048  p2  S+    8:50PM   0:00.08 -su (tcsh)
pgsql 34696  0.0  0.0   396  232  p0  R+    8:56PM   0:00.00 ps ux
pgsql% !ps
ps ux
USER    PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
pgsql 34611  0.0  0.0     0    0  ??  Z     8:43PM   0:00.00  (postgres)
pgsql 93757  0.0  0.2  1456 1104  p0  S    Wed03PM   0:01.17 -su (tcsh)
pgsql 33683  0.0  0.6 38356 3024  ??  Is    7:38PM   0:03.54 /pgsql/bin/postmas
pgsql 34677  0.0  0.2  1408 1048  p2  S+    8:50PM   0:00.08 -su (tcsh)
pgsql 34697  0.0  0.0   396  232  p0  R+    8:56PM   0:00.00 ps ux
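
For picking the defunct backend out of listings like the two above without eyeballing them, a small filter on the STAT column is enough. This is only a minimal sketch: it assumes BSD-style `ps ux` columns exactly as shown here (no embedded spaces in any column before COMMAND), and `find_zombies` and `SAMPLE` are made-up names for illustration, not anything shipped with PostgreSQL.

```python
# Sketch only: parse `ps ux` text and report rows whose STAT starts with "Z".
# Assumes the header names the columns and no column before COMMAND contains spaces.

SAMPLE = """\
USER    PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
pgsql 34611  0.0  0.0     0    0  ??  Z     8:43PM   0:00.00  (postgres)
pgsql 93757  0.0  0.2  1456 1104  p0  S    Wed03PM   0:01.17 -su (tcsh)
pgsql 33683  0.0  0.6 38356 3024  ??  Is    7:38PM   0:03.54 /pgsql/bin/postmas
"""


def find_zombies(ps_output):
    """Return (pid, command) pairs for every process in zombie state."""
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    cmd_col = header.index("COMMAND")

    zombies = []
    for line in lines[1:]:
        # Split at most cmd_col times so a command containing spaces stays whole.
        fields = line.split(None, cmd_col)
        if len(fields) <= stat_col:
            continue  # short or malformed row, skip it
        if fields[stat_col].startswith("Z"):
            command = fields[cmd_col].strip() if len(fields) > cmd_col else ""
            zombies.append((int(fields[pid_col]), command))
    return zombies


if __name__ == "__main__":
    for pid, command in find_zombies(SAMPLE):
        print(pid, command)
```

Feeding it the live output of `ps ux` instead of `SAMPLE` would list any backend the postmaster has not yet reaped; it says nothing about why the postmaster is not reaping it.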


On Sun, 7 May 2000, The Hermit Hacker wrote:

> 
> 
> Okay, just happened again ... no postgres backend is being started:
> 
> USER    PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
> pgsql 34611  0.0  0.0     0    0  ??  Z     8:43PM   0:00.00  (postgres)
> pgsql 93757  0.0  0.2  1456 1104  p0  S    Wed03PM   0:01.16 -su (tcsh)
> pgsql 33683  0.0  0.6 38356 3024  ??  Is    7:38PM   0:03.54 /pgsql/bin/postmaster -B 4096 -N 128 -S -o -F -o /pgsql/errout.5432
> pgsql 34677  0.0  0.2  1408 1048  p2  S     8:50PM   0:00.07 -su (tcsh)
> pgsql 34685  0.0  0.2  1652 1032  p0  S+    8:51PM   0:00.01 psql udmsearch
> pgsql 34687  0.0  0.0   400  232  p2  R+    8:51PM   0:00.00 ps ux
> 
> Going to look at the connection tracing option now and see what I can come
> up with ...
> 
> 
> On Sun, 7 May 2000, Tom Lane wrote:
> 
> > The Hermit Hacker <scrappy@hub.org> writes:
> > > Okay, this is with code of ~May 4th ... a 'psql' connection to the
> > > database hangs solid.
> > 
> > Do you mean you can't make a connection at all?  Is there any indication
> > that the postmaster is lighting off a backend for you?  Since you show
> > a couple of zombie backends hanging around, it would seem like a good
> > bet that the postmaster itself is wedged and not responding to events,
> > but I'm not sure.
> > 
> > > errout is dated:
> > 
> > > pgsql% !ls
> > > ls -lt
> > > total 13324
> > > -rw-------   1 pgsql  pgsql  4842715 May  7 10:57 errout.5432
> > 
> > > and the last few lines contain:
> > 
> > > ERROR:  parser: parse error at or near "vpti"
> > > pq_recvbuf: unexpected EOF on client connection
> > > pq_flush: send() failed: Broken pipe
> > > pq_recvbuf: recv() failed: Connection reset by peer
> > > pq_recvbuf: unexpected EOF on client connection
> > > pq_recvbuf: unexpected EOF on client connection
> > > pq_flush: send() failed: Broken pipe
> > > pq_recvbuf: recv() failed: Connection reset by peer
> > 
> > > But, of course, no date/time ...
> > 
> > Given that the file mod time is considerably before the hang (right?)
> > the messages in it are probably unrelated.  It does seem odd that you
> > have so many clients disconnecting ungracefully; what client apps are
> > you running?
> > 
> > > Since this is a production server, I can't just leave it there hung like
> > > that, but if someone wants to give some instructions on what to do the
> > > next time this happens, please feel free to do so, and I'll add that to my
> > > list ... maybe run a gdb command on it, since truss doesn't appear to
> > > help?
> > 
> > Try killing the postmaster itself in such a way as to produce a coredump
> > (kill -ABORT ought to do) and get a backtrace from that.  It might also
> > be worth running the postmaster with connection tracing turned on (I
> > forget the incantation for that, but it should be in TFM).
> > 
> > > At this time, I consider this to be a show-stopper on the release ... this
> > > is what happened the last time when the result appeared to be the index
> > > corruption
> > 
> > If the postmaster is hanging then it's almost certainly unrelated to
> > index corruption...
> > 
> >             regards, tom lane
> > 
> 
> 

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 


