BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used
Date
Msg-id 18014-28c81cb79d44295d@postgresql.org
Whole thread Raw
Responses Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18014
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16beta2
Operating system:   Ubuntu 22.04
Description:

Yesterday's test failure on prion:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2023-07-03%2010%3A13%3A03
made me wonder, what's going on there and whether it's yet another issue
with invalidating relcache (bug #17994).
(
SELECT schema_to_xmlschema('testxmlschema', false, true, '');
ERROR:  relation with OID 29598 does not exist
CONTEXT:  SQL statement "SELECT oid FROM pg_catalog.pg_class WHERE
relnamespace = 29597 AND relkind IN ('r','m','v') AND
pg_catalog.has_table_privilege (oid, 'SELECT') ORDER BY relname;"

Other failures of that kind:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prion&dt=2023-06-20%2001%3A56%3A04&stg=check
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prion&dt=2023-04-15%2017%3A17%3A09&stg=check
)

I managed to construct a simple reproducer for the error:
for ((n=1;n<=30;n++)); do
echo "ITERATION $n"

numclients=30
for ((c=1;c<=$numclients;c++)); do
cat << EOF | psql >psql_$c.log &
CREATE SCHEMA testxmlschema_$c;

SELECT format('CREATE TABLE testxmlschema_$c.test_%s (a int);', g) FROM
generate_series(1, 30) g
\\gexec

SET parallel_setup_cost = 1;
SET min_parallel_table_scan_size = '1kB';
SELECT schema_to_xmlschema('testxmlschema_$c', true, false, '');

SELECT format('DROP TABLE testxmlschema_$c.test_%s', g) FROM
generate_series(1, 30) g
\\gexec

DROP SCHEMA testxmlschema_$c;
EOF
done
wait
grep 'ERROR:' server.log && break;
done

With a server compiled as follows:
CPPFLAGS="-O0 -DCATCACHE_FORCE_RELEASE" ./configure -q --enable-debug
--enable-cassert --enable-tap-tests --with-libxml && make ...
(More precisely, "#ifndef CATCACHE_FORCE_RELEASE" in ReleaseCatCache()
does matter here.)

I get errors as in the test in question:
...
ITERATION 9
ITERATION 10
ERROR:  relation with OID 59777 does not exist
CONTEXT:  parallel worker
SQL statement "SELECT oid FROM pg_catalog.pg_class WHERE relnamespace =
57162 AND relkind IN ('r','m','v') AND pg_catalog.has_table_privilege (oid,
'SELECT') ORDER BY relname;"
2023-07-04 12:48:14.205 MSK [3111661] ERROR:  relation with OID 59777 does
not exist
2023-07-04 12:48:14.206 MSK [3111598] ERROR:  relation with OID 59777 does
not exist

With a debug logging added in src/backend/utils/adt/acl.c, I see that
SearchSysCacheExists1(RELOID, ObjectIdGetDatum(tableoid) returns true in
has_table_privilege_id(), but later, in
pg_class_aclcheck()/pg_class_aclmask_ext(), 
SearchSysCache1(RELOID, ObjectIdGetDatum(table_oid)) returns NULL.

This is reproduced on REL_10_STABLE .. master. 
The first commit that demonstrates the issue is 61c2e1a95 (it improved
access to parallelism for SPI users, one of which is
schema_to_xmlschema_internal() (see also schema_get_xml_visible_tables())).


pgsql-bugs by date:

Previous
From: Sandeep Thakkar
Date:
Subject: Re: BUG #18012: Installer fails to run .bat files when they are registered to Notepad++
Next
From: Tomas Vondra
Date:
Subject: Re: Backend handling replication slot stuck using 100% cpu, unkillable