Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> I'm not sure whether there's anything much we can do to prevent such
>> problems in future. Maybe it'd be reasonable for pg_regress to do a
>> kill -9 on its postmaster child process if it gives up waiting for the
>> postmaster to accept connections.
> I'm trying to think how we could harden the buildfarm script to avoid
> such situations, although I am so far without any great revelations.
> The idea of getting pg_regress to send a signal isn't bad - what if the
> PID gets reused, since we know not all systems allocate PIDs in a
> cyclical fashion?
I think it'd be OK on Unix --- even if the PID has been reused by the
time pg_regress tries to kill the child, presumably the reuse would be
under a different userid and pg_regress wouldn't have permission to kill
it.
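
Roughly what I have in mind for the Unix side, as an untested sketch
only. kill_stuck_postmaster is a made-up name, not an existing
pg_regress function, and the return-code convention is just an
assumption for illustration:

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not real pg_regress code): called after we give up
 * waiting for the postmaster to accept connections.
 *
 * Returns 0 if SIGKILL was sent, 1 if there was nothing we could kill
 * (no such process, or not permitted), -1 on any other failure.
 */
static int
kill_stuck_postmaster(pid_t pid)
{
	int			status;

	/* kill() treats pid <= 0 as a process group or broadcast; refuse */
	if (pid <= 0)
		return -1;

	if (kill(pid, SIGKILL) != 0)
	{
		/* ESRCH: already gone.  EPERM: PID reused under another userid. */
		if (errno == ESRCH || errno == EPERM)
			return 1;
		fprintf(stderr, "could not kill process %d: %s\n",
				(int) pid, strerror(errno));
		return -1;
	}

	/* Reap the child if it is ours; ECHILD means it was not our child. */
	while (waitpid(pid, &status, 0) < 0)
	{
		if (errno == EINTR)
			continue;
		break;
	}
	return 0;
}
```

The EPERM branch is the reused-PID case described above: the kill is
refused and we simply carry on.
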
I am not clear on how to do something equivalent under Windows though.
We'd have a HANDLE not a PID coming back from spawn_process, so I
suppose there should not be a confusion-of-identity problem, but I don't
know what the syscall equivalent to "kill(pid, SIGKILL)" would be.
Another problem is that under Unix we will have the exact postmaster PID
to try to kill(), because (a) spawn_process uses execl() not system() to
invoke the sub-shell and (b) we tell the sub-shell to exec not just call
the postmaster. I think under Windows we probably have a HANDLE for an
instance of the command line processor, not the postmaster as such, and
so I'm worried that killing it would not kill the postmaster anyway.
Does Windows have a syscall that would say "kill this process and all
its children too"?
It may be worth doing the SIGKILL on Unix even if we don't have a
solution for Windows, but it'd be nice to have a solution for
the Windows port too.
regards, tom lane