Hi,
14.03.2023 01:20, Andres Freund wrote:
>> I am yet to construct a reproduction of the case, but it seems to me that
>> the race condition is not impossible here.
> I suspect the issue could be made much more likely by adding a sleep before
> the pg_queue_signal(SIGCHLD) in pgwin32_deadchild_callback().
Thanks for the tip! With pg_usleep(50000) added there, I can reproduce the issue
reliably, within a minute on average, with the 099_check_pids.pl I posted before:
...
2023-03-15 07:26:14.301 GMT|[unknown]|[unknown]|3748|64117316.ea4|LOG:
connection received: host=127.0.0.1 port=49902
2023-03-15 07:26:14.302 GMT|postgres|postgres|3748|64117316.ea4|LOG: connection
authorized: user=postgres database=postgres application_name=099_check-pids.pl
2023-03-15 07:26:14.304 GMT|postgres|postgres|3748|64117316.ea4|LOG: statement:
SELECT pg_backend_pid()
2023-03-15 07:26:14.305 GMT|postgres|postgres|3748|64117316.ea4|LOG:
disconnection: session time: 0:00:00.005 user=postgres database=postgres
host=127.0.0.1 port=49902
...
2023-03-15 07:26:25.592 GMT|[unknown]|[unknown]|3748|64117321.ea4|LOG:
connection received: host=127.0.0.1 port=50407
TRAP: failed Assert("PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED"),
File: "C:\src\postgresql\src\backend\storage\ipc\pmsignal.c", Line: 329, PID: 3748
abort() has been called2023-03-15 07:26:25.608
GMT|[unknown]|[unknown]|3524|64117321.dc4|LOG: connection received:
host=127.0.0.1 port=50408
The result depends on some OS conditions (it reproduced quite readily
immediately after a VM reboot), but that's enough to test the proposed patch.
And I can confirm that with the patch the Assert is no longer observed (with
the sleep added after CloseHandle(childinfo->procHandle)).
Best regards,
Alexander