Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED |
Date | |
Msg-id | 4245b0de-9367-c443-8145-7b62b0ab3c41@gmail.com Whole thread Raw |
In response to | Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED (Andres Freund <andres@anarazel.de>) |
Responses |
Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
|
List | pgsql-hackers |
18.02.2023 23:09, Andres Freund wrote: > Hi, > > On 2023-02-18 18:00:00 +0300, Alexander Lakhin wrote: >> 18.02.2023 04:06, Andres Freund wrote: >> Maybe it's just the backend started for the money test has got >> the same PID (5948) that the backend for the name test had? > I somehow mashed name and money into one test in my head... So forget what I > wrote. > > That doesn't really explain the assertion though. > ,,, > It's definitely very aggressive in reusing pids - and it seems to > intentionally do work to keep pids small. I wonder if it'd be worth trying to > exercise this path aggressively by configuring a very low max pid on linux, in > an EXEC_BACKEND build. For information, I performed the simple analysis of the postmaster.log after ordinary `vcregress check`: grep -o -E 'client backend\[[0-9]+\] pg_regress/\w+' ..\..\..\src\test\regress\log\postmaster.log | sort | uniq | wc -l grep -o -E 'client backend\[[0-9]+\] pg_regress' ..\..\..\src\test\regress\log\postmaster.log | sort | uniq | wc -l and got the following numbers: ... iteration 88 196 174 iteration 89 196 170 iteration 90 196 176 iteration 91 196 175 Then I performed some more experiments and came to the conclusion that `vcregress check` is not very suitable for catching duplicate pids. So I've wrote a simple TAP test and caught the condition sought: t/099_check_pids.pl .. # 0 # 1 # 2 # 3 # 4 # 5 # 6 # Thread 15 failed: # Got two equal pids in a row: 3552 on iteration 30 # 1 t/099_check_pids.pl .. 1/1 # Failed test at t/099_check_pids.pl line 107. ... server log contains: ... 2023-02-20 06:33:16.142 PST|[unknown]|[unknown]|5704|63f384ac.1648|LOG: connection received: host=127.0.0.1 port=55134 2023-02-20 06:33:16.142 PST|[unknown]|[unknown]|3552|63f384ac.de0|LOG: connection received: host=127.0.0.1 port=55138 2023-02-20 06:33:16.144 PST|postgres|postgres|5704|63f384ac.1648|LOG: connection authorized: user=postgres database=postgres application_name=099_check_pids.pl 2023-02-20 06:33:16.144 PST|postgres|postgres|3552|63f384ac.de0|LOG: connection authorized: user=postgres database=postgres application_name=099_check_pids.pl 2023-02-20 06:33:16.147 PST|postgres|postgres|5704|63f384ac.1648|LOG: statement: SELECT pg_backend_pid() 2023-02-20 06:33:16.147 PST|postgres|postgres|3552|63f384ac.de0|LOG: statement: SELECT pg_backend_pid() 2023-02-20 06:33:16.147 PST|postgres|postgres|5704|63f384ac.1648|LOG: disconnection: session time: 0:00:00.009 user=postgres database=postgres host=127.0.0.1 port=55134 2023-02-20 06:33:16.147 PST|postgres|postgres|3552|63f384ac.de0|LOG: disconnection: session time: 0:00:00.008 user=postgres database=postgres host=127.0.0.1 port=55138 2023-02-20 06:33:16.158 PST|[unknown]|[unknown]|1672|63f384ac.688|LOG: connection received: host=127.0.0.1 port=55139 ... 2023-02-20 06:33:16.485 PST|postgres|postgres|2748|63f384ac.abc|LOG: connection authorized: user=postgres database=postgres application_name=099_check_pids.pl 2023-02-20 06:33:16.486 PST|[unknown]|[unknown]|3552|63f384ac.de0|LOG: connection received: host=127.0.0.1 port=55164 2023-02-20 06:33:16.487 PST|postgres|postgres|3552|63f384ac.de0|LOG: connection authorized: user=postgres database=postgres application_name=099_check_pids.pl 2023-02-20 06:33:16.488 PST|postgres|postgres|2748|63f384ac.abc|LOG: statement: SELECT pg_backend_pid() 2023-02-20 06:33:16.489 PST|postgres|postgres|2748|63f384ac.abc|LOG: disconnection: session time: 0:00:00.007 user=postgres database=postgres host=127.0.0.1 port=55163 2023-02-20 06:33:16.490 PST|postgres|postgres|3552|63f384ac.de0|LOG: statement: SELECT pg_backend_pid() 2023-02-20 06:33:16.491 PST|postgres|postgres|3552|63f384ac.de0|LOG: disconnection: session time: 0:00:00.008 user=postgres database=postgres host=127.0.0.1 port=55164 2023-02-20 06:33:16.503 PST|[unknown]|[unknown]|244|63f384ac.f4|LOG: connection received: host=127.0.0.1 port=55162 ... (note that even session IDs are the same) Though I've got no that assert yet, thus maybe as you said, the pid duplication is not the (only) condition that triggered it (or maybe something is wrong on my side). Interestingly, but I can get the expected duplicates only if I start the VS prompt and run the test immediately after a logon (or VM reboot). After some activity (or may be some time), the test can run for 30 minutes without success (maybe it depends on OS cache...). Also with a similar TAP test I discovered that on Linux (Debian 11 32-bit) pids are generated sequentially: # pid: 329 # pid: 331 # pid: 333 # pid: 336 # pid: 338 # pid: 340 # pid: 343 # pid: 345 # pid: 349 # pid: 353 So with an extra small max_pid (<360) I just get "could not fork new process for connection: Resource temporarily unavailable'" before two processes, that started one after another, get the same pid. But on Windows the sequence looks random: # pid: 1736 # pid: 8168 # pid: 3764 # pid: 7180 # pid: 3724 # pid: 5372 # pid: 1588 # pid: 4188 # pid: 1404 # pid: 5280 Best regards, Alexander
Attachment
pgsql-hackers by date: