Re: Flaky 003_start_stop.pl test - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Flaky 003_start_stop.pl test |
Date | |
Msg-id | rm3wawhaebopcncwwadnenypl7mwwqacoqj2u35qi36d5zcms2@wbbcb6t6ukf4 Whole thread Raw |
In response to | Re: Flaky 003_start_stop.pl test (Thomas Munro <thomas.munro@gmail.com>) |
List | pgsql-hackers |
Hi, On 2025-03-04 09:48:45 +1300, Thomas Munro wrote: > On Tue, Mar 4, 2025 at 5:48 AM Andres Freund <andres@anarazel.de> wrote: > > On 2025-02-24 11:26:56 +0100, Jelte Fennema-Nio wrote: > > > [1]: https://cirrus-ci.com/task/5571017969500160?logs=test_world#L256 > > > > A possibly related test failure is: > > > > https://cirrus-ci.com/task/5005337046024192?logs=check_world#L259 > > They look different to me: Thanks for checking! > (1) In Jelte's report for 003_start_stop, we see: > > [03:40:42.347](2.022s) ok 1 - dead-end connection 0 > [03:40:42.348](0.001s) ok 2 - dead-end connection 1 > [03:40:42.349](0.001s) ok 3 - dead-end connection 2 > ... up to 20 ... > > We see in the server log that the first one had pid 31660 and slot 1, > but it never said "SSLRequest rejected" like all the others. The Perl > code must have received "N", or it would not have moved onto opening > the next connection. Then just as it is about to try to connect, > hoping to see "too many clients", that one exits, the "invalid_role" > connection gets slot 1 instead of the dead-end backend we expected: > > 03:40:42.375 UTC [31645][postmaster] DEBUG: client backend (PID > 31660) exited with exit code 0 <-- goodbye, silent backend > 03:40:42.480 UTC [31681][not initialized] [[unknown]][:0] LOG: > connection received: host=[local] > 03:40:42.480 UTC [31681][not initialized] [[unknown]][:0] LOG: > SSLRequest rejected > 03:40:42.511 UTC [31645][postmaster] DEBUG: assigned pm child slot 1 > for backend <-- we get silent backend's slot > 03:40:42.512 UTC [31645][postmaster] DEBUG: forked new client > backend, pid=31690 socket=7 > 03:40:42.542 UTC [31690][not initialized] [[unknown]][:0] LOG: > connection received: host=[local] > 03:40:42.543 UTC [31690][client backend] [[unknown]][0/1:0] LOG: > connection authenticated: user="invalid_user" method=trust > (/tmp/cirrus-ci-build/build/testrun/postmaster/003_start_stop/data/t_003_start_stop_main_data/pgdata/pg_hba.conf:117) > 03:40:42.543 UTC [31690][client backend] [[unknown]][0/1:0] LOG: > connection authorized: user=invalid_user database=postgres > application_name=003_start_stop.pl > 03:40:42.543 UTC [31690][client backend] [[unknown]][0/1:0] FATAL: > role "invalid_user" does not exist Good catch. The pattern is similar in other failed tests. There has been a spate of errors on postgres/postgres' CI, which made me look at this again. > Could the backend have sent an "N" (NoticeResponse) unrelated to SSL > negotiation? I don't think that'd be legal protocol flow from the server side. And, given that the test uses log_min_messages=debug2, should have been logged in the server log. All the recent failures were on FreeBSD, interestingly. I wonder if we should, at least temporarily, increase the test's log level? I also suspect that it'd be worthwhile to add a few more if (Trace_connection_negotiation) messages. Greetings, Andres Freund
pgsql-hackers by date: