I noticed that prion recently failed [1] with
SELECT schema_to_xml_and_xmlschema('testxmlschema', true, true, 'foo');
...
+ERROR: relation with OID 30019 does not exist
+CONTEXT: SQL statement "SELECT oid FROM pg_catalog.pg_class WHERE relnamespace = 29949 AND relkind IN ('r','m','v')
ANDpg_catalog.has_table_privilege (oid, 'SELECT') ORDER BY relname;"
What evidently happened here is that has_table_privilege got applied
to some relation (in a schema different from 'testxmlschema', which
should be stable here) that was in the middle of getting dropped.
This'd require the relnamespace condition to get applied after the
has_table_privilege condition, which is quite possible (although it seems
to require that auto-analyze update pg_class's statistics while this
test runs). Even then, has_table_privilege is supposed to survive the
situation, but it has a race condition:
if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(tableoid)))
PG_RETURN_NULL();
aclresult = pg_class_aclcheck(tableoid, roleid, mode);
The fact that SearchSysCacheExists1 succeeds doesn't guarantee that
when pg_class_aclcheck fetches the same row a moment later, it'll
still be there. The race-condition window is pretty narrow (and
indeed I don't think we've seen this buildfarm symptom before),
but it exists.
Now, it looks to me like this case is trivial to fix, using the
pg_class_aclcheck_ext function introduced in b12bd4869 (in v14).
We just need to drop the separate SearchSysCacheExists1 call and do
aclresult = pg_class_aclcheck_ext(tableoid, roleid, mode, &is_missing);
which should be a trifle faster as well as safer.
However, to cover the remaining risk spots in acl.c, it looks like
we'd need "_ext" versions of pg_attribute_aclcheck_all and
object_aclcheck, which were not introduced by b12bd4869.
object_aclcheck_ext in particular looks like it'd be a bit invasive.
What I'm inclined to do for now is clean up the table-related cases
and leave the code paths using object_aclcheck for another day.
We've always been much more concerned about DDL race conditions for
tables than other kinds of objects, so this approach seems to fit
with past decisions. I haven't written any code yet, but this looks
like it might amount to a couple hundred lines of fairly simple
changes.
I wonder if we should consider this a bug and back-patch to v14,
or maybe HEAD only; or is it okay to let it slide to v17?
regards, tom lane
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2023-04-15%2017%3A17%3A09