Re: race condition in pg_class - Mailing list pgsql-hackers
| From | Noah Misch |
|---|---|
| Subject | Re: race condition in pg_class |
| Date | |
| Msg-id | 20260216170326.af@rfd.leadboat.com |
| In response to | Re: race condition in pg_class (Alexander Lakhin <exclusion@gmail.com>) |
| Responses | Re: race condition in pg_class |
| List | pgsql-hackers |
On Mon, Feb 16, 2026 at 08:00:00AM +0200, Alexander Lakhin wrote:
> 20.07.2024 11:00, Alexander Lakhin wrote:
> > 28.06.2024 08:13, Noah Misch wrote:
> > > Pushed.
> Could you please look at one more interesting failure produced by
> 001_pgbench_with_server.pl [1]?
> regress_log_001_pgbench_with_server:
> [13:11:27.325](0.001s) ok 3 - concurrent OID generation stderr /(?^:^$)/
> # Running: pgbench ...
> [13:11:29.481](2.156s) not ok 4 - concurrent GRANT/VACUUM status (got 2 vs expected 0) # TODO PROC_IN_VACUUM scan breakage
> [13:11:29.483](0.002s) # Failed (TODO) test 'concurrent GRANT/VACUUM status (got 2 vs expected 0)'
> # at C:/tools/xmsys64/home/pgrunner/bf/root/REL_18_STABLE/pgsql/src/bin/pgbench/t/001_pgbench_with_server.pl line 77.
> [13:11:29.484](0.001s) not ok 5 - concurrent GRANT/VACUUM stdout
> /(?^:processed: 250/250)/ # TODO PROC_IN_VACUUM scan breakage
> [13:11:29.485](0.001s) # Failed (TODO) test 'concurrent GRANT/VACUUM stdout /(?^:processed: 250/250)/'
> ...
> [13:11:29.486](0.001s) not ok 6 - concurrent GRANT/VACUUM stderr /(?^:^$)/ # TODO PROC_IN_VACUUM scan breakage
> [13:11:29.486](0.000s) # Failed (TODO) test 'concurrent GRANT/VACUUM stderr /(?^:^$)/'
> # at C:/tools/xmsys64/home/pgrunner/bf/root/REL_18_STABLE/pgsql/src/bin/pgbench/t/001_pgbench_with_server.pl line 77.
> [13:11:29.487](0.001s) # 'pgbench: error: client 1 script
> 1 aborted in command 0 query 0: ERROR: relation 266643 deleted while still
> in use
> # pgbench: error: Run was aborted; the above results are incomplete.
> # '
> # doesn't match '(?^:^$)'
>
> 001_pgbench_with_server_main.log contains:
> 2026-02-12 13:11:28.603 UTC [6012:36] 001_pgbench_with_server.pl ERROR: relation 266643 deleted while still in use
> 2026-02-12 13:11:28.603 UTC [6012:37] 001_pgbench_with_server.pl STATEMENT: VACUUM ddl_target;
>
> I'm able to reproduce this error with:
> numcouples=40
> for ((j=1;j<=numcouples;j++)); do
>   createdb db$j
>   echo "CREATE TABLE t(i int);" | psql -d db$j
> done
>
> for ((i=1;i<=1000;i++)); do
>   echo "iteration $i"
>   for ((j=1;j<=numcouples;j++)); do
>     for ((k=1;k<=100;k++)); do echo "GRANT SELECT ON t TO public /* $k */;"; done | psql -d db$j >psql-grant-$j.log 2>&1&
>     for ((k=1;k<=10;k++)); do echo "VACUUM t /* $k */;"; done | psql -d db$j >psql-vacuum-$j.log 2>&1 &
>   done
>   wait
>   grep -E 'ERROR: ' server.log && break;
> done
>
> This fails for me as below:
> ...
> iteration 47
> 2026-02-16 07:13:14.855 EET|law|db13|6992a76a.a6983|ERROR: relation 16434 deleted while still in use
>
> ...
> iteration 6
> 2026-02-16 07:13:42.537 EET|law|db20|6992a786.ab3bc|ERROR: pg_class entry for relid 16462 vanished during vacuuming
>
> ...
> iteration 7
> 2026-02-16 07:14:01.182 EET|law|db2|6992a799.ad54f|ERROR: could not open relation with OID 16390
>
> ...
> iteration 9
> 2026-02-16 07:14:26.160 EET|law|db12|6992a7b2.aedf8|ERROR: relation 16430 deleted while still in use
> ...

These symptoms are consistent with the "PROC_IN_VACUUM scan breakage" bug.
It's good to have an additional recipe for reproducing that bug, so I've
linked to your message from the PROC_IN_VACUUM entry at
https://wiki.postgresql.org/wiki/User:Nmisch/Wanted

> [1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=fairywren&dt=2026-02-12%2012%3A17%3A56&stg=pgbench-check

Since the PROC_IN_VACUUM failures are in tests marked TODO, they don't make
the test fail.
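For triage, a hypothetical POSIX shell helper can tally how often each of the three ERROR messages from the reproduction quoted above appears in a server log. The name `count_vacuum_race_errors` is illustrative and not part of PostgreSQL. The sketch assumes the log path is passed as the first argument (the recipe greps `server.log`) and that each message follows `ERROR:` with one or more spaces. It only counts matching lines. It does not diagnose the cause.

```shell
# Hypothetical triage helper: count server-log lines matching each of the
# three ERROR messages seen in the GRANT/VACUUM reproduction.
# Usage: count_vacuum_race_errors server.log
count_vacuum_race_errors() {
    logfile=$1
    for pat in \
        'ERROR: +relation [0-9]+ deleted while still in use' \
        'ERROR: +pg_class entry for relid [0-9]+ vanished during vacuuming' \
        'ERROR: +could not open relation with OID [0-9]+'
    do
        # grep -c prints 0 and exits 1 when nothing matches; "|| true"
        # keeps that non-zero status from mattering under "set -e".
        n=$(grep -cE "$pat" "$logfile" || true)
        # One line per signature: count, a tab, then the pattern.
        printf '%s\t%s\n' "$n" "$pat"
    done
}
```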
I think this particular buildfarm link is a case of:
https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#Miscellaneous_tests_fail_on_Windows_due_to_a_connection_closed_before_receiving_a_final_error_message

Here's the non-TODO failure in that log:

    [13:11:30.050](0.002s) not ok 10 - no such database stderr /(?^:FATAL: database "no-such-database" does not exist)/
    [13:11:30.051](0.001s) # Failed test 'no such database stderr /(?^:FATAL: database "no-such-database" does not exist)/'
    # at C:/tools/xmsys64/home/pgrunner/bf/root/REL_18_STABLE/pgsql/src/bin/pgbench/t/001_pgbench_with_server.pl line 100.
    [13:11:30.051](0.000s) # 'pgbench: error: connection to server on socket "C:/tools/xmsys64/tmp/P5YfRmVxhI/.s.PGSQL.10766" failed: server closed the connection unexpectedly
    # This probably means the server terminated abnormally
    # before or while processing the request.
    # pgbench: error: could not create connection for setup
    # '
    # doesn't match '(?^:FATAL: database "no-such-database" does not exist)'

The server log has the expected message that pgbench didn't receive:

    2026-02-12 13:11:29.662 UTC [8948:1] [unknown] LOG: connection received: host=[local]
    2026-02-12 13:11:29.664 UTC [8948:2] [unknown] LOG: connection authenticated: user="pgrunner" method=trust (C:/tools/xmsys64/home/pgrunner/bf/root/REL_18_STABLE/pgsql.build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/pgdata/pg_hba.conf:117)
    2026-02-12 13:11:29.664 UTC [8948:3] [unknown] LOG: connection authorized: user=pgrunner database=no-such-database application_name=001_pgbench_with_server.pl
    2026-02-12 13:11:29.664 UTC [8948:4] [unknown] FATAL: database "no-such-database" does not exist
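A TAP log such as regress_log_001_pgbench_with_server mixes real failures with TODO ones. One minimal way to separate them is to keep only `not ok` lines that lack the `# TODO` directive. The helper name `non_todo_failures` is hypothetical. The sketch assumes each TAP line may carry the `[HH:MM:SS.mmm](N.NNNs) ` prefix seen in the excerpts above. It also assumes the directive is spelled `# TODO` exactly, which is what Test::More emits.

```shell
# Hypothetical helper: print TAP "not ok" lines that are not TODO tests.
# Accepts lines with or without the "[time](elapsed) " prefix used in
# PostgreSQL regress_log_* files.
# Usage: non_todo_failures regress_log_001_pgbench_with_server
non_todo_failures() {
    # First grep: failed test points. Second grep: drop those whose
    # directive marks them as TODO (expected failures).
    grep -E '^(\[[^]]*]\([^)]*\) )?not ok ' "$1" | grep -v '# TODO'
}
```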