On Wednesday February 16 2005 3:38, Martijn van Oosterhout wrote:
> On Wed, Feb 16, 2005 at 11:41:35AM -0700, Ed L. wrote:
> > Question: Am I doing all I can to avoid corruption with the
> > following procedure to shutdown a 7.4.6 cluster with a hung
> > postmaster? Suggestions?
>
> What is the state of the processes in ps? D, S, R, ?? That
> should at least give a hint as to what it *is* doing...

The postmasters were sleeping. I was able to trace a local hung
psql client through lsof-->netstat to see the connection was in
tcp "ESTABLISHED" state.
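
For anyone wanting to repeat that check, here is a rough Python
sketch of the netstat half of it: filter saved `netstat -an` output
for ESTABLISHED tcp connections that touch a given port. The sample
lines, the port (5432, the PostgreSQL default) and the function
name are made up for illustration; they are not output from the box
in question, and the column layout is assumed to be the usual
proto/recv-q/send-q/local/remote/state.

```python
import re


def established_on_port(netstat_text, port):
    """Return (local, remote) pairs for ESTABLISHED tcp lines touching `port`."""
    matches = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local remote state
        if len(fields) < 6 or not fields[0].lower().startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        # The port is whatever follows the last '.' or ':' in the address,
        # which covers both "127.0.0.1.5432" and "127.0.0.1:5432" styles.
        ports = {re.split(r"[.:]", addr)[-1] for addr in (local, remote)}
        if str(port) in ports:
            matches.append((local, remote))
    return matches


# Hypothetical sample input, not captured from a real system.
SAMPLE = """\
tcp        0      0  127.0.0.1.5432      127.0.0.1.51234     ESTABLISHED
tcp        0      0  127.0.0.1.51234     127.0.0.1.5432      ESTABLISHED
tcp        0      0  *.5432              *.*                 LISTEN
tcp        0      0  127.0.0.1.22        127.0.0.1.40000     TIME_WAIT
"""

if __name__ == "__main__":
    for local, remote in established_on_port(SAMPLE, 5432):
        print(local, "<->", remote)
```

Both ends of a local connection show up, one line per socket, so a
hung local client appears as a pair of ESTABLISHED entries.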

Anyway, before we installed gdb (wdb) from the HP DVD to get a
backtrace, I'd already guessed it could be a problem with the
build, since the 7.4.6 build that was hanging was built with gcc
3.2.2, while the 7.4.6 builds on two other identical boxes were
built with gcc 3.4.2. So we shut down all the 7.4.6 clusters,
rebuilt using gcc 3.4.2, identical build steps AFAICS, and this
nasty problem has apparently gone away (knock on wood).

I have saved a copy of the problematic gcc-3.2.2-built 7.4.6
installation executables, libraries, etc. in case anyone is
interested. Time permitting, I may fire them up with a new
cluster and see if I can recreate the problem to get a
backtrace.

Thanks.
Ed