Re: Improve error reporting in 027_stream_regress test - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Improve error reporting in 027_stream_regress test |
Date | |
Msg-id | 869277.1753793389@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Improve error reporting in 027_stream_regress test (Michael Paquier <michael@paquier.xyz>) |
List | pgsql-hackers |
Michael Paquier <michael@paquier.xyz> writes: > On Tue, Jul 29, 2025 at 12:41:39AM -0400, Tom Lane wrote: >> There is something strange happening on mamba --- not sure what, >> but its cycle time for the past week has been a lot more than normal. > We are getting PQPING_NO_RESPONSE meaning a lack of report activity > from the postmaster. An increase in timeout may help, but the host > seems like it's facing a high workload so it's not really possible to > come with a perfect number, just an estimation. How about adding a > --timeout to pg_isready based on PGCONNECT_TIMEOUT, like in the > attached? Yeah, looking closely at the failure logs [1], the standby postmaster reported pg_isready's connection request in good order at about 08:01:47, but then nothing happened for a few seconds: 2025-07-28 08:01:47.884 EDT [6985:1] [unknown] LOG: connection received: host=[local] 2025-07-28 08:01:48.261 EDT [6985:2] [unknown] LOG: connection authenticated: user="buildfarm" method=trust (/home/buildfarm/bf-data/HEAD/pgsql.build/src/test/recovery/tmp_check/t_027_stream_regress_standby_1_data/pgdata/pg_hba.conf:117) 2025-07-28 08:01:48.261 EDT [6985:3] [unknown] LOG: connection authorized: user=buildfarm database=postgres application_name=027_stream_regress.pl 2025-07-28 08:01:51.552 EDT [6985:4] 027_stream_regress.pl LOG: could not send data to client: Broken pipe 2025-07-28 08:01:51.552 EDT [6985:5] 027_stream_regress.pl FATAL: connection to client lost 2025-07-28 08:01:51.552 EDT [6985:6] 027_stream_regress.pl LOG: disconnection: session time: 0:00:03.670 user=buildfarmdatabase=postgres host=[local] The client side reported failure at 08:01:50 and a bit, matching pg_isready's default 3-second timeout: # Running: pg_isready --host /home/buildfarm/bf-data/tmp/ep8AOH4m7l --port 24780 /home/buildfarm/bf-data/tmp/ep8AOH4m7l:24780 - accepting connections # Running: pg_isready --host /home/buildfarm/bf-data/tmp/ep8AOH4m7l --port 24781 /home/buildfarm/bf-data/tmp/ep8AOH4m7l:24781 - no response [08:01:50.899](1505.313s) ok 2 - regression tests pass [08:01:50.902](0.003s) ok 3 - primary alive after regression test run [08:01:50.905](0.003s) not ok 4 - standby alive after regression test run [08:01:50.908](0.003s) [08:01:50.908](0.000s) # Failed test 'standby alive after regression test run' # at t/027_stream_regress.pl line 104. [08:01:50.909](0.001s) # got: '0' # expected: '1' So I agree that this looks a lot like pg_isready timed out. It's not too surprising if the machine was heavily loaded at that point, since the standby would probably still be replaying the last actions of the regression tests. +1 for lengthening the timeout, as I see you already did. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2025-07-28%2007%3A46%3A26
pgsql-hackers by date: