On Wed, 2009-09-23 at 10:04 -0400, Tom Lane wrote:
> I'd prefer not to go there, at least not without a demonstration that
> this will solve a bug that's unsolvable otherwise. If a child is
> really stuck in a state that doesn't accept SIGQUIT, it probably
> won't accept SIGKILL either (eg, uninterruptible disk wait). Or maybe
> we just have some errant code that is blocking SIGQUIT; but that's
> a garden variety bug IMO, not something that needs major new postmaster
> logic to work around.

Running strace on the stuck backend processes showed every one of them
waiting at:
futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
Notably, the first argument was the same for all of them.
I gather that a futex is a Linux kernel primitive, which glibc presumably
uses to implement pthreads locking. Does anyone know more?
But yes, sending SIGKILL to these processes works without a problem.
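
For anyone who wants to reproduce the symptom outside of PostgreSQL, here
is a minimal sketch in C. It is not taken from the backend code. It forks
a child that has SIGQUIT blocked and then sleeps in a FUTEX_WAIT_PRIVATE
call, sends it SIGQUIT, checks that it is still alive, and then sends
SIGKILL. The names lock_word, stuck_child and sigkill_reaps_futex_waiter
are made up for illustration, and the assumption that the real backends
had SIGQUIT blocked (rather than being stuck for some other reason) is
exactly that, an assumption. The value 2 just mirrors the third argument
in the strace output; as far as I know glibc's internal locks use 2 to
mean "locked, with waiters", but I have not verified that against the
version in question.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <signal.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for a contended lock word; 2 mirrors the strace output. */
static uint32_t lock_word = 2;

/* Child: sleep in futex(FUTEX_WAIT_PRIVATE) forever; nobody ever wakes it. */
static void stuck_child(void)
{
    for (;;)
        syscall(SYS_futex, &lock_word, FUTEX_WAIT_PRIVATE, 2, NULL, NULL, 0);
}

/*
 * Returns 1 if the child survived SIGQUIT and was then terminated by
 * SIGKILL, 0 if it behaved differently, -1 if fork() failed.
 */
int sigkill_reaps_futex_waiter(void)
{
    sigset_t quit, old;
    struct timespec pause = {0, 200 * 1000 * 1000};   /* 200 ms */
    int status, survived;
    pid_t pid;

    /* Block SIGQUIT before forking so the child inherits the blocked mask. */
    sigemptyset(&quit);
    sigaddset(&quit, SIGQUIT);
    sigprocmask(SIG_BLOCK, &quit, &old);

    pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0) {
        stuck_child();
        _exit(0);                                     /* not reached */
    }
    sigprocmask(SIG_SETMASK, &old, NULL);             /* parent: restore mask */

    nanosleep(&pause, NULL);                          /* let the child enter the wait */
    kill(pid, SIGQUIT);                               /* stays pending in the child */
    nanosleep(&pause, NULL);
    survived = (waitpid(pid, &status, WNOHANG) == 0); /* 0 means still running */

    kill(pid, SIGKILL);                               /* cannot be blocked */
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return survived && WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

Calling sigkill_reaps_futex_waiter() from a small main() and running
strace -p on the child during the pause should show a futex line of the
same shape as the one above, though the address will differ.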