hung postmaster? - Mailing list pgsql-general

From Ed L.
Subject hung postmaster?
Date
Msg-id 200502152251.59613.pgsql@bluepolka.net
Responses Re: hung postmaster?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
I'm seeing some unpleasant database cluster seizures.  After
running fine for hours, days, even weeks, all of a sudden new
connections via psql, DBI, libpq, all completely hang with no
log message or error, while existing connections can continue to
execute queries, log messages, etc.  Postmaster is totally
unresponsive to SIGTERM, SIGINT, and SIGQUIT (SIGKILL is the
only thing that shuts it down).  Memory, CPU, and available
connections are plentiful.  Top, ps, glance, ls, netstat all are
very responsive.  I/O load has been pretty high, averaging
700-1100 physical IOs/second.  The first time this occurred,
last Friday, it was a mix of our 7.3.4 and 7.4.6 clusters.
Yesterday, it was all of the 7.4.6 64-bit clusters, none of the
32-bit 7.3.4 clusters.  Today it was two of the 7.4.6 clusters
and no others.
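
A minimal sketch of a watchdog probe that could timestamp the moment new connections start hanging. It is not part of the original report. It assumes the postmaster accepts TCP connections (the report does not say whether TCP or Unix sockets are in use), and the function name, host, port, and timeout are all hypothetical. It sends the wire protocol's 8-byte SSLRequest message and waits for any reply byte; a listener that completes the TCP handshake but never answers is reported as "hung".

```python
import socket
import struct

# SSLRequest message: Int32 length (8) followed by Int32 request code 80877103.
SSL_REQUEST = struct.pack("!II", 8, 80877103)


def probe_postmaster(host="localhost", port=5432, timeout=5.0):
    """Classify a postmaster as 'responsive', 'hung', 'closed', or 'unreachable'.

    host, port, and timeout are illustrative defaults, not values from the report.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(SSL_REQUEST)
            # Any single byte back means the server side is processing
            # new connections.
            reply = sock.recv(1)
    except socket.timeout:
        # Connected (or tried to) but nothing answered within the timeout.
        return "hung"
    except OSError:
        # Connection refused, host unreachable, and similar failures.
        return "unreachable"
    # An empty read means the peer closed the connection without replying.
    return "responsive" if reply else "closed"
```

Run from cron or a loop against each cluster's port, the first "hung" result gives a timestamp to line up against I/O statistics and the logs of the existing connections.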

I'm trying to get gdb installed so I can attach to postmaster and
get a backtrace.  Other troubleshooting ideas appreciated.
Details below...

CONFIG:
========
Hardware:  One HP 64-bit Itanium rx4640, 16 GB RAM, 4 CPUs, SAN
(Cisco switches, HP EVA-5000 disk array and FC HBAs).
OS:  HP-UX B.11.23
Pgsql:  9 clusters installed and concurrently running.  4/9 are
32-bit PostgreSQL 7.3.4 on ia64-hp-hpux11.22, compiled by cc
-Ae.  The other 5 clusters are 64-bit PostgreSQL 7.4.6 on
ia64-hp-hpux11.23, compiled by gcc 3.3.2.

(We have 2 other identical boxes running without incident with
similar cluster mixes of 7.3.4, 7.3.7, 7.4.6.)

