Re: interrupted tap tests leave postgres instances around - Mailing list pgsql-hackers

From Andres Freund
Subject Re: interrupted tap tests leave postgres instances around
Msg-id 20221001203821.dpme5lsystrnhe5t@awork3.anarazel.de
In response to Re: interrupted tap tests leave postgres instances around  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: interrupted tap tests leave postgres instances around
List pgsql-hackers
Hi,

On 2022-09-30 11:17:00 +0200, Alvaro Herrera wrote:
> But on testing, some nodes linger after being sent a shutdown signal.
> I'm not clear why this is -- I think it's due to the fact that we send
> the signal just as the node is starting up, which means the signal
> doesn't reach the process.

I suspect it's when a test gets interrupted while pg_ctl is starting the
backend. The start() routine only does _update_pid() after pg_ctl has finished,
and terminate()->stop() returns before doing anything if pid isn't defined.

Perhaps the END{} routine should call $node->_update_pid(-1); if $exit_code !=
0 and _pid is undefined?
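
To make that concrete, here is a rough, self-contained sketch of the idea. It
does not use the real test module: ToyNode, pid_on_disk, stopped and
cleanup_nodes() are made-up names for illustration, the toy _update_pid()
ignores its argument, and the exit code is passed in explicitly instead of
being read inside an actual END block.

```perl
use strict;
use warnings;

# Toy stand-in for the node class; NOT the real test module.
package ToyNode;

sub new {
    my ($class, %args) = @_;
    # pid_on_disk models what postmaster.pid would contain; _pid is what
    # the harness believes, and is only set once start() has completed.
    return bless {
        name        => $args{name},
        pid_on_disk => $args{pid_on_disk},
        _pid        => undef,
        stopped     => 0,
    }, $class;
}

# Refresh _pid from "disk". The argument is accepted but ignored here.
sub _update_pid {
    my ($self, $is_running) = @_;
    $self->{_pid} = $self->{pid_on_disk};
    return;
}

# Mirrors the behavior described above: a no-op when no pid is known.
sub stop {
    my ($self) = @_;
    return 0 unless defined $self->{_pid};
    $self->{stopped}     = 1;
    $self->{pid_on_disk} = undef;
    $self->{_pid}        = undef;
    return 1;
}

package main;

# What the cleanup at exit could do (hypothetical helper).
sub cleanup_nodes {
    my ($exit_code, @nodes) = @_;
    foreach my $node (@nodes) {
        # Proposed addition: on abnormal exit, look up an unknown pid
        # before trying to stop the node.
        $node->_update_pid(-1)
          if $exit_code != 0 && !defined $node->{_pid};
        $node->stop;
    }
    return;
}

# A node whose server is up, but whose start() was interrupted before
# the pid was recorded.
my $node = ToyNode->new(name => 'primary', pid_on_disk => 12345);
cleanup_nodes(1, $node);
print "stopped: $node->{stopped}\n";    # prints "stopped: 1"
```

Without the extra _update_pid() call, stop() would return early for this node
and the server would be left running.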

That does seem to reduce the incidence of "leftover" postgres
instances. 001_start_stop.pl leaves some behind, but that makes sense, because
it's bypassing the whole node management. But I still occasionally see some
remaining processes if I crank up test concurrency.

Ah! At least part of the problem is that sub stop() does BAIL_OUT, and of
course it can fail as part of the shutdown.

But there are still some that survive, where your perl.trace doesn't contain
the node getting shut down...


Greetings,

Andres Freund


