On Fri, Jun 21, 2013 at 5:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Andres Freund wrote:
>>> What we could do to improve the robustness a bit - at least on linux -
>>> is to prctl(PR_SET_PDEATHSIG, SIGKILL) which would cause children to be
>>> killed if the postmaster goes away...
>
>> This is an interesting idea (even if it has no relationship to the patch
>> at hand).
>
> The traditional theory has been that that would be less robust, not
> more so. Child backends are (mostly) able to carry out queries whether
> or not the postmaster is around.

I think that's the Tom Lane theory. The Robert Haas theory is that if
the postmaster has died, there's no reason to suppose that it hasn't
corrupted shared memory on the way down, or that the system isn't
otherwise heavily fuxxored in some way.

> True, you can't make new connections,
> but how does killing the existing children make that better?

It allows you to start a new postmaster in a timely fashion, instead
of waiting on idle connections that may never terminate without
operator intervention.

Even if it were possible to start a new postmaster that attached to
the existing shared memory segment and began spawning new children, I
think I'd be heavily in favor of killing the old ones off first and
doing a full system reset just for safety. But it isn't possible, so
what you get is a crippled system that never recovers without operator
intervention. And note that I'm not talking about "pg_ctl restart";
that fails because the children still have the shmem segment attached
and the postmaster, which is the only thing listed in the PID file, is
already dead. I'm talking about "killall -9 postgres", at least.
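
For anyone who wants to see what the prctl() suggestion quoted above
amounts to, here is a minimal, Linux-only sketch. It is not PostgreSQL
code: die_with_parent() and demo_backend_dies_with_postmaster() are
names made up for illustration, and the "postmaster" and "backend" are
just plain forked processes. The helper has to run in each child after
fork(), and it rechecks getppid() in case the parent died before
prctl() took effect.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose POSIX declarations under strict -std= modes */
#endif

#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: call in the child right after fork().
 * expected_parent is the parent's PID, recorded by the parent itself.
 * Returns 0 on success, -1 if prctl() fails.
 */
static int
die_with_parent(pid_t expected_parent)
{
    assert(expected_parent > 0);

    /* Ask the kernel to send us SIGKILL when our parent terminates. */
    if (prctl(PR_SET_PDEATHSIG, SIGKILL) == -1)
        return -1;

    /* Close the race: the parent may already have died before prctl(). */
    if (getppid() != expected_parent)
        _exit(1);

    return 0;
}

/*
 * Hypothetical demo.  Returns 1 if the "backend" went away once its
 * "postmaster" was killed, 0 if it was still alive after 3 seconds,
 * and -1 on setup errors.
 */
static int
demo_backend_dies_with_postmaster(void)
{
    int           fds[2];
    pid_t         postmaster;
    char          byte;
    struct pollfd pfd;
    int           rc;

    if (pipe(fds) == -1)
        return -1;

    postmaster = fork();
    if (postmaster == -1)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (postmaster == 0)
    {
        /* Stand-in for the postmaster: spawn one backend, then idle. */
        pid_t self = getpid();
        pid_t backend = fork();

        if (backend == 0)
        {
            /* Stand-in for a backend that would otherwise linger. */
            if (die_with_parent(self) == -1)
                _exit(2);
            if (write(fds[1], "x", 1) != 1)     /* report "armed" */
                _exit(2);
            sleep(10);
            _exit(0);
        }

        close(fds[1]);
        if (backend == -1)
            _exit(3);
        sleep(10);
        _exit(0);
    }

    /* Observer: wait until the backend is armed, then kill the postmaster. */
    close(fds[1]);
    rc = (read(fds[0], &byte, 1) == 1);
    kill(postmaster, SIGKILL);
    waitpid(postmaster, NULL, 0);
    if (!rc)
    {
        close(fds[0]);
        return -1;
    }

    /* The backend holds the last write end; hangup means it is gone. */
    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, 3000);
    close(fds[0]);

    if (rc == -1)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

Calling demo_backend_dies_with_postmaster() from a small main() should
return 1 on Linux. Without the prctl() call, the backend would keep
sleeping after the kill, which is the lingering-child situation
described above, only in miniature.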
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company