Race conditions in has_table_privilege() and friends - Mailing list pgsql-hackers

From Tom Lane
Subject Race conditions in has_table_privilege() and friends
Date
Msg-id 2051732.1681588069@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
I noticed that prion recently failed [1] with

 SELECT schema_to_xml_and_xmlschema('testxmlschema', true, true, 'foo');
...
+ERROR:  relation with OID 30019 does not exist
+CONTEXT:  SQL statement "SELECT oid FROM pg_catalog.pg_class WHERE relnamespace = 29949 AND relkind IN ('r','m','v')
ANDpg_catalog.has_table_privilege (oid, 'SELECT') ORDER BY relname;" 

What evidently happened here is that has_table_privilege got applied
to some relation (in a schema different from 'testxmlschema', which
should be stable here) that was in the middle of getting dropped.
This'd require the relnamespace condition to get applied after the
has_table_privilege condition, which is quite possible (although it seems
to require that auto-analyze update pg_class's statistics while this
test runs).  Even then, has_table_privilege is supposed to survive the
situation, but it has a race condition:

    if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(tableoid)))
        PG_RETURN_NULL();

    aclresult = pg_class_aclcheck(tableoid, roleid, mode);

The fact that SearchSysCacheExists1 succeeds doesn't guarantee that
when pg_class_aclcheck fetches the same row a moment later, it'll
still be there.  The race-condition window is pretty narrow (and
indeed I don't think we've seen this buildfarm symptom before),
but it exists.

Now, it looks to me like this case is trivial to fix, using the
pg_class_aclcheck_ext function introduced in b12bd4869 (in v14).
We just need to drop the separate SearchSysCacheExists1 call and do

    aclresult = pg_class_aclcheck_ext(tableoid, roleid, mode, &is_missing);

which should be a trifle faster as well as safer.

However, to cover the remaining risk spots in acl.c, it looks like
we'd need "_ext" versions of pg_attribute_aclcheck_all and
object_aclcheck, which were not introduced by b12bd4869.
object_aclcheck_ext in particular looks like it'd be a bit invasive.

What I'm inclined to do for now is clean up the table-related cases
and leave the code paths using object_aclcheck for another day.
We've always been much more concerned about DDL race conditions for
tables than other kinds of objects, so this approach seems to fit
with past decisions.  I haven't written any code yet, but this looks
like it might amount to a couple hundred lines of fairly simple
changes.

I wonder if we should consider this a bug and back-patch to v14,
or maybe HEAD only; or is it okay to let it slide to v17?

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2023-04-15%2017%3A17%3A09



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Direct I/O
Next
From: Daniel Gustafsson
Date:
Subject: Re: Should vacuum process config file reload more often