Re: race condition in pg_class - Mailing list pgsql-hackers

From Noah Misch
Subject Re: race condition in pg_class
Date
Msg-id 20260216170326.af@rfd.leadboat.com
In response to Re: race condition in pg_class  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: race condition in pg_class
List pgsql-hackers
On Mon, Feb 16, 2026 at 08:00:00AM +0200, Alexander Lakhin wrote:
> 20.07.2024 11:00, Alexander Lakhin wrote:
> > 28.06.2024 08:13, Noah Misch wrote:
> > > Pushed.

> Could you please look at one more interesting failure produced by
> 001_pgbench_with_server.pl [1]?
> regress_log_001_pgbench_with_server:
> [13:11:27.325](0.001s) ok 3 - concurrent OID generation stderr /(?^:^$)/
> # Running: pgbench ...
> [13:11:29.481](2.156s) not ok 4 - concurrent GRANT/VACUUM status (got 2 vs expected 0) # TODO PROC_IN_VACUUM scan breakage
> [13:11:29.483](0.002s) #   Failed (TODO) test 'concurrent GRANT/VACUUM status (got 2 vs expected 0)'
> #   at C:/tools/xmsys64/home/pgrunner/bf/root/REL_18_STABLE/pgsql/src/bin/pgbench/t/001_pgbench_with_server.pl line 77.
> [13:11:29.484](0.001s) not ok 5 - concurrent GRANT/VACUUM stdout
> /(?^:processed: 250/250)/ # TODO PROC_IN_VACUUM scan breakage
> [13:11:29.485](0.001s) #   Failed (TODO) test 'concurrent GRANT/VACUUM stdout /(?^:processed: 250/250)/'
> ...
> [13:11:29.486](0.001s) not ok 6 - concurrent GRANT/VACUUM stderr /(?^:^$)/ # TODO PROC_IN_VACUUM scan breakage
> [13:11:29.486](0.000s) #   Failed (TODO) test 'concurrent GRANT/VACUUM stderr /(?^:^$)/'
> #   at C:/tools/xmsys64/home/pgrunner/bf/root/REL_18_STABLE/pgsql/src/bin/pgbench/t/001_pgbench_with_server.pl line 77.
> [13:11:29.487](0.001s) #                   'pgbench: error: client 1 script
> 1 aborted in command 0 query 0: ERROR:  relation 266643 deleted while still
> in use
> # pgbench: error: Run was aborted; the above results are incomplete.
> # '
> #     doesn't match '(?^:^$)'
> 
> 001_pgbench_with_server_main.log contains:
> 2026-02-12 13:11:28.603 UTC [6012:36] 001_pgbench_with_server.pl ERROR:  relation 266643 deleted while still in use
> 2026-02-12 13:11:28.603 UTC [6012:37] 001_pgbench_with_server.pl STATEMENT:  VACUUM ddl_target;
> 
> I'm able to reproduce this error with:
> numcouples=40
> for ((j=1;j<=numcouples;j++)); do
> createdb db$j
> echo "CREATE TABLE t(i int);" | psql -d db$j
> done
> 
> for ((i=1;i<=1000;i++)); do
>   echo "iteration $i"
>   for ((j=1;j<=numcouples;j++)); do
>     for ((k=1;k<=100;k++)); do echo "GRANT SELECT ON t TO public /* $k */;"; done | psql -d db$j >psql-grant-$j.log 2>&1 &
>     for ((k=1;k<=10;k++)); do echo "VACUUM t /* $k */;"; done | psql -d db$j >psql-vacuum-$j.log 2>&1 &
>   done
>   wait
>   grep -E 'ERROR: ' server.log && break;
> done
> 
> This fails for me as below:
> ...
> iteration 47
> 2026-02-16 07:13:14.855 EET|law|db13|6992a76a.a6983|ERROR:  relation 16434 deleted while still in use
> 
> ...
> iteration 6
> 2026-02-16 07:13:42.537 EET|law|db20|6992a786.ab3bc|ERROR:  pg_class entry for relid 16462 vanished during vacuuming
> 
> ...
> iteration 7
> 2026-02-16 07:14:01.182 EET|law|db2|6992a799.ad54f|ERROR:  could not open relation with OID 16390
> 
> ...
> iteration 9
> 2026-02-16 07:14:26.160 EET|law|db12|6992a7b2.aedf8|ERROR:  relation 16430 deleted while still in use
> ...

These symptoms are consistent with the "PROC_IN_VACUUM scan breakage" bug.
It's good to have an additional recipe for reproducing that bug, so I've
linked to your message from the PROC_IN_VACUUM entry at
https://wiki.postgresql.org/wiki/User:Nmisch/Wanted
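
For anyone running the quoted recipe, here is a rough sketch (not part of the original recipe) of tallying the distinct symptoms it produces. The function name tally_errors is hypothetical, and collapsing every number to N is an illustrative assumption so that the same symptom on different relation OIDs is counted as one kind:

```shell
# Summarize ERROR messages in a server log.
# Numbers (such as relation OIDs) are collapsed to "N" so that the same
# symptom hitting different relations is counted as one kind.
tally_errors() {
  grep -oE 'ERROR:  .*' "$1" | sed -E 's/[0-9]+/N/g' | sort | uniq -c | sort -rn
}
```

Usage would be along the lines of `tally_errors server.log`, where server.log is the log file the quoted script greps.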

> [1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=fairywren&dt=2026-02-12%2012%3A17%3A56&stg=pgbench-check

Since the PROC_IN_VACUUM failures are in tests marked TODO, they don't make
the test fail.  I think this particular buildfarm link is a case of:

https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#Miscellaneous_tests_fail_on_Windows_due_to_a_connection_closed_before_receiving_a_final_error_message

Here's the non-TODO failure in that log:

[13:11:30.050](0.002s) not ok 10 - no such database stderr /(?^:FATAL:  database "no-such-database" does not exist)/
[13:11:30.051](0.001s) #   Failed test 'no such database stderr /(?^:FATAL:  database "no-such-database" does not exist)/'
#   at C:/tools/xmsys64/home/pgrunner/bf/root/REL_18_STABLE/pgsql/src/bin/pgbench/t/001_pgbench_with_server.pl line 100.
[13:11:30.051](0.000s) #                   'pgbench: error: connection to server on socket "C:/tools/xmsys64/tmp/P5YfRmVxhI/.s.PGSQL.10766" failed: server closed the connection unexpectedly
#     This probably means the server terminated abnormally
#     before or while processing the request.
# pgbench: error: could not create connection for setup
# '
#     doesn't match '(?^:FATAL:  database "no-such-database" does not exist)'
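
As a sketch of how to separate such non-TODO failures from the TODO ones in a regress log: the function name non_todo_failures is hypothetical, and the pattern assumes each result line starts with a "[time](elapsed)" prefix and carries its "# TODO" directive on the same line, as in the excerpts above.

```shell
# Print "not ok" result lines that are not marked TODO.
# Assumes the "[hh:mm:ss.mmm](N.NNNs) not ok ..." line shape seen in
# regress_log_* files, with any "# TODO" directive on the same line.
non_todo_failures() {
  grep -E '^\[[^]]*\]\([^)]*\) not ok ' "$1" | grep -v '# TODO'
}
```

Run against regress_log_001_pgbench_with_server, this would be expected to skip tests 4 through 6 and report test 10.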


The server log has the expected message that pgbench didn't receive:

2026-02-12 13:11:29.662 UTC [8948:1] [unknown] LOG:  connection received: host=[local]
2026-02-12 13:11:29.664 UTC [8948:2] [unknown] LOG:  connection authenticated: user="pgrunner" method=trust (C:/tools/xmsys64/home/pgrunner/bf/root/REL_18_STABLE/pgsql.build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/pgdata/pg_hba.conf:117)
2026-02-12 13:11:29.664 UTC [8948:3] [unknown] LOG:  connection authorized: user=pgrunner database=no-such-database application_name=001_pgbench_with_server.pl
2026-02-12 13:11:29.664 UTC [8948:4] [unknown] FATAL:  database "no-such-database" does not exist


