Re: Attempt to stop dead instance can stop a random process? - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Attempt to stop dead instance can stop a random process?
Date
Msg-id 46D828AB.EE98.0025.0@wicourts.gov
In response to Re: Attempt to stop dead instance can stop a random process?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Attempt to stop dead instance can stop a random process?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Attempt to stop dead instance can stop a random process?  (tomas@tuxteam.de)
List pgsql-hackers
>>> On Fri, Aug 31, 2007 at  2:18 PM, in message <381.1188587883@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> It appears that when pg_ctl gets a stop request for a given directory, it
>> looks for a pid file in that directory and signals that pid to stop.  It
>> doesn't appear to check that the pid is for a PostgreSQL postmaster running
>> out of the given directory.  I think it should, although on a quick scan of
>> the code, I didn't see a convenient way to do that.
>
> [ shrug... ]  AFAICS there is no way to know that.
I sure couldn't see a way, but I was hoping that was just a matter of my own
ignorance.
>> I have some evidence that when we attempted to stop a PostgreSQL instance
>> which (it turned out) had died without cleaning up the pid file, it actually
>> stopped another instance which was using a different data directory but had
>> wrapped around to the same pid.
>
> The real question there is how come the postmaster died without removing
> the pidfile.  It's not that easy to crash the postmaster ...
Well, that's not due to a bug in PostgreSQL.  We're using a buggy LDAP
implementation (not my call) which can crash things.  The machine totally
locked up after logging distress messages from that daemon, and they cycled
power to get out of it.
The PostgreSQL issue here was a secondary problem in trying to get the
server back to normal.  So really, what I was suggesting was something to
improve the robustness of PostgreSQL in the face of severe challenges posed
by other issues.  I realize it's a very low volume issue; if it's not easy
to fix, probably not worth it.
Now to bug the people on the list of authorized contacts for Novell to open
a support case on the LDAP problems, and see how many of the 40 core dumps
I have from their daemon they want to see.
-Kevin
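
For illustration only, here is a rough sketch of the kind of check discussed above. It is not how pg_ctl works and it is not portable: it assumes a Linux-style /proc filesystem, assumes the first line of postmaster.pid holds the PID, and assumes the postmaster's current working directory is its data directory. The function names and the proc_root parameter are made up for this example.

```python
import os


def read_pidfile(data_dir):
    """Return the PID on the first line of postmaster.pid, or None."""
    path = os.path.join(data_dir, "postmaster.pid")
    try:
        with open(path) as f:
            first = f.readline().strip()
    except OSError:
        return None  # no pid file, or it is unreadable
    try:
        pid = int(first)
    except ValueError:
        return None  # first line is not a number
    # Only a positive value is treated as a signalable postmaster PID here.
    return pid if pid > 0 else None


def pid_runs_from(pid, data_dir, proc_root="/proc"):
    """True only if the process's cwd resolves to data_dir.

    Assumption: <proc_root>/<pid>/cwd is a symlink to the process's
    working directory, as on Linux.
    """
    link = os.path.join(proc_root, str(pid), "cwd")
    try:
        cwd = os.readlink(link)
    except OSError:
        return False  # no such process, or no permission to inspect it
    return os.path.realpath(cwd) == os.path.realpath(data_dir)


def safe_to_signal(data_dir, proc_root="/proc"):
    """Refuse to signal unless the recorded PID runs out of data_dir."""
    pid = read_pidfile(data_dir)
    return pid is not None and pid_runs_from(pid, data_dir, proc_root)
```

Even where these assumptions hold, the check only narrows the window: the PID could still be reused between the check and the signal, and an unrelated process could happen to have the same working directory.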


