Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used - Mailing list pgsql-bugs

From Alexander Lakhin
Subject Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used
Date
Msg-id 5df89a35-a985-b287-7210-6ed7d2c31803@gmail.com
Whole thread Raw
In response to Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used
List pgsql-bugs
Hello Tom,

13.10.2023 00:01, Tom Lane wrote:
> I came back to this question today, and after more thought I'm going
> to cast my vote for "let's not trust SearchSysCacheExists()+lookup".
> It just seems like that's too fragile.  The case reported in this
> thread of it failing in parallel workers but not main should scare
> anybody, and the future course of development seems far more likely
> to introduce new problems than remove them.

Thanks for digging into this!

> I spent some time looking through existing SearchSysCacheExists calls,
> and I could only find two sets of routines where we seem to be
> depending on SearchSysCacheExists to protect a subsequent lookup
> somewhere else, and there isn't any lock on the object in question.
> Those are the has_foo_privilege functions discussed here, and the
> foo_is_visible functions near the bottom of namespace.c.  I'm not
> sure why we've not heard complaints traceable to the foo_is_visible
> family.  Maybe nobody has tried hard to break them, or maybe they
> are just less likely to be used in ways that are at risk.

I'll try to research/break xxx_is_visible and share my findings tomorrow.

> It turns out to not be that hard to get rid of the problem in the
> has_foo_privilege family, as per attached patches.  I've not looked
> at fixing the foo_is_visible family, but that probably ought to be a
> separate patch anyway.

Yeah, the attached patches greatly improve consistency. The only
inconsistency I've found in the patches is a missing comma here:
+ * Exported generic routine for checking a user's access privileges to an object
+ * with is_missing

You removed "this case is not a user-facing error, so elog not ereport",
and I think it's good for consistency too, as all aclmask/aclcheck
functions now use ereport().

> BTW, while nosing around I found what seems like a very nasty related
> bug.  Suppose that a catalog tuple being loaded into syscache contains
> some toasted fields.  CatalogCacheCreateEntry will flatten the tuple,
> involving fetches from toast tables that will certainly cause
> AcceptInvalidationMessages calls.  What if one of those should have
> invalidated this tuple?  We will not notice, because it's not in
> the hashtable yet.  When we do add it, we will mark it not-dead,
> meaning that the stale entry looks fine and could persist for a long
> while.

Yeah, it's an interesting case that needs investigation, IMO.
I'll try to look into this and construct the test case in the background.

> 0001's changes in acl.c are straightforward, but it's worth noting
> that the has_sequence_privilege functions hadn't gotten the memo
> about not failing when a bogus relation OID is passed.

I've looked through all functions has_*priv*_id and all they have
"if (is_missing) PG_RETURN_NULL()" now (with the patches applied).

Best regards,
Alexander



pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #18156: Self-referential foreign key in partitioned table not enforced on deletes
Next
From: Andres Freund
Date:
Subject: Re: BUG #18130: \copy fails with "could not read block" or "page should be empty but not" errors due to triggers