Re: kill -KILL: What happens? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: kill -KILL: What happens?
Msg-id AANLkTimqrUtme9L2TDQr-gBViT5pG=kCjNJ+us=_w22Z@mail.gmail.com
In response to Re: kill -KILL: What happens?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: kill -KILL: What happens?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: kill -KILL: What happens?  (Aidan Van Dyk <aidan@highrise.ca>)
List pgsql-hackers
On Thu, Jan 13, 2011 at 2:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jan 13, 2011 at 2:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Frankly I'd prefer to get rid of PostmasterIsAlive, not extend its use.
>>> It sucks because you don't get a signal on parent death.  With the
>>> arrival of the latch code, having to check for PostmasterIsAlive
>>> frequently is the only reason for an idle background process to consume
>>> CPU at all.
>
>> What we really need is SIGPARENT.  I wonder if the Linux folks would
>> consider adding such a thing.  Might be useful to others as well.
>
> That's pretty much a dead-end idea unfortunately; it would never be
> portable enough to let us change our system structure to rely on it.
> Even more to the point, "go away when the postmaster does" isn't
> really the behavior we want anyway.  "Go away when the last backend
> does" is what we want.

I'm not convinced.  I was thinking that we could simply treat it like
SIGQUIT, if it's available.  I doubt there's a real use case for
continuing to run queries after the postmaster and all the background
processes are dead.  Expedited death seems like much better behavior.
Even checking PostmasterIsAlive() once per query would be reasonable,
except that it'd add a system call to check for a condition that
almost never holds, which I'm not eager to do.
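
To make the trade-off concrete, here is a rough sketch (not PostgreSQL's
actual code; every function name is made up for illustration). It assumes
the child is a direct child of the postmaster. On Linux, prctl() with
PR_SET_PDEATHSIG is the closest existing thing to a "SIGPARENT": the kernel
delivers the chosen signal when the parent goes away. Elsewhere the only
option is to poll getppid(), which is the one-system-call-per-check cost
mentioned above.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Hypothetical helpers; these names do not exist in PostgreSQL. */
static pid_t saved_parent_pid;
static bool  parent_recorded = false;

/* Record the parent's PID once, right after the child starts. */
void
remember_parent(void)
{
    saved_parent_pid = getppid();
    parent_recorded = true;
}

/*
 * Polling check: costs one getppid() system call per invocation.
 * When the parent dies the child is reparented, so getppid() changes.
 */
bool
parent_is_alive(void)
{
    assert(parent_recorded);
    return getppid() == saved_parent_pid;
}

/*
 * Ask the kernel to send SIGQUIT when the parent dies, if supported.
 * Returns 0 when armed, -1 when unavailable.  The caller should still
 * call parent_is_alive() once afterwards, because the parent may have
 * died before the request was registered.
 */
int
arm_parent_death_signal(void)
{
#ifdef __linux__
    if (prctl(PR_SET_PDEATHSIG, (unsigned long) SIGQUIT) != 0)
        return -1;
    return 0;
#else
    return -1;      /* no equivalent here; fall back to polling */
#endif
}
```

A child would call remember_parent() and arm_parent_death_signal() at
startup, and fall back to calling parent_is_alive() at some convenient
interval only when arming fails.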

> I wonder whether we could have some sort of latch-like counter that
> would count the number of active backends and deliver signals when the
> count went to zero.  However, if the goal is to defend against random
> applications of SIGKILL, there's probably no way to make this reliable
> in userspace.

I don't think you can get there 100%.  We could, however, make a rule
that when a background process fails a PostmasterIsAlive() check, it
sends SIGQUIT to everyone it can find in the ProcArray, which would at
least ensure a timely exit in most real-world cases.
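
Roughly, the rule could look like the sketch below. It is only an
illustration: a plain array of PIDs stands in for the ProcArray (the real
one lives in shared memory and needs locking), and signal_all_backends() is
a made-up name, not an existing function.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>   /* for callers that reap children with waitpid() */
#include <unistd.h>

/*
 * Send signo to every PID in the array, skipping empty slots and
 * ourselves.  Returns the number of processes successfully signalled.
 * Hypothetical stand-in for walking the ProcArray.
 */
int
signal_all_backends(const pid_t *pids, size_t npids, int signo)
{
    int     delivered = 0;
    pid_t   self = getpid();

    assert(pids != NULL || npids == 0);

    for (size_t i = 0; i < npids; i++)
    {
        /* kill() with pid <= 0 targets process groups; never do that. */
        if (pids[i] <= 0 || pids[i] == self)
            continue;

        /* A failure (e.g. ESRCH: already gone) is simply not counted. */
        if (kill(pids[i], signo) == 0)
            delivered++;
    }
    return delivered;
}
```

The background process that notices the postmaster is gone would call
signal_all_backends(pids, n, SIGQUIT) and then exit itself. A backend that
starts after the sweep is missed, which is one reason this cannot be 100%
reliable.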

> Another idea is to have a "postmaster minder" process that respawns the
> postmaster when it's killed.  The hard part of that is that the minder
> can't be connected to shared memory (else its OOM cross-section is just
> as big as the postmaster's), and that makes it difficult for it to tell
> when all the children have gone away.  I suppose it could be coded to
> just retry every few seconds until success.  This doesn't improve the
> behavior of background processes at all, though.

It hardly seems worth it.  Given a reliable interlock against multiple
postmasters, the real concern is making sure that a half-dead
postmaster gets itself all-dead quickly so that the DBA can start up a
new one before he gets fired.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

