Hi,
On 2022-09-30 11:17:00 +0200, Alvaro Herrera wrote:
> But on testing, some nodes linger after being sent a shutdown signal.
> I'm not clear why this is -- I think it's due to the fact that we send
> the signal just as the node is starting up, which means the signal
> doesn't reach the process.
I suspect it's when a test gets interrupt while pg_ctl is starting the
backend. The start() routine only does _update_pid() after pg_ctl finished,
and terminate()->stop() returns before doing anything if pid isn't defined.
Perhaps the END{} routine should call $node->_update_pid(-1); if $exit_code !=
0 and _pid is undefined?
That does seem to reduce the incidence of "leftover" postgres
instances. 001_start_stop.pl leaves some behind, but that makes sense, because
it's bypassing the whole node management. But I still occasionally see some
remaining processes if I crank up test concurrency.
Ah! At least part of the problem is that sub stop() does BAIL_OUT, and of
course it can fail as part of the shutdown.
But there's still some that survive, where your perl.trace doesn't contain the
node getting shut down...
Greetings,
Andres Freund