Re: race condition in pg_class - Mailing list pgsql-hackers

From Robert Haas
Subject Re: race condition in pg_class
Date
Msg-id CA+TgmobfMU5pdXP36D5iAwxV5WKE_vuDLtp_1QyH+H5jMMt21g@mail.gmail.com
Whole thread Raw
In response to Re: race condition in pg_class  (Noah Misch <noah@leadboat.com>)
Responses Re: race condition in pg_class
Re: race condition in pg_class
List pgsql-hackers
On Wed, Jun 5, 2024 at 2:17 PM Noah Misch <noah@leadboat.com> wrote:
> Starting 2024-06-10, I plan to push the first seven of the ten patches:
>
> inplace005-UNEXPECTEDPASS-tap-meson-v1.patch
> inplace010-tests-v1.patch
> inplace040-waitfuncs-v1.patch
> inplace050-tests-inj-v1.patch
> inplace060-nodeModifyTable-comments-v1.patch
>   Those five just deal in tests, test infrastructure, and comments.
> inplace070-rel-locks-missing-v1.patch
>   Main risk is new DDL deadlocks.
> inplace080-catcache-detoast-inplace-stale-v1.patch
>   If it fails to fix the bug it targets, I expect it's a no-op rather than
>   breaking things.
>
> I'll leave the last three of the ten needing review.  Those three are beyond
> my skill to self-certify.

It's not this patch set's fault, but I'm not very pleased to see that
the injection point wait events have been shoehorned into the
"Extension" category - which they are not - instead of being a new
wait_event_type. That would have avoided the ugly wait-event naming
pattern, inconsistent with everything else, introduced by
inplace050-tests-inj-v1.patch.

I think that the comments and commit messages in this patch set could,
in some places, use improvement. For instance,
inplace060-nodeModifyTable-comments-v1.patch reflows a bunch of
comments, which makes it hard to see what actually changed, and the
commit message doesn't tell you, either. A good bit of it seems to be
changing "a view" to "a view INSTEAD OF trigger" or "a view having an
INSTEAD OF trigger," but the reasoning behind that change is not
spelled out anywhere. The reader is left to guess what the other case
is and why the same principles don't apply to it. I don't doubt that
the new comments are more correct than the old ones, but I expect
future patch authors to have difficulty maintaining that state of
affairs.

Similarly, inplace070-rel-locks-missing-v1.patch adds no comments.
IMHO, the commit message also isn't very informative. It disclaims
knowledge of what bug it's fixing, while at the same time leaving the
reader to figure out for themselves how the behavior has changed.
Consequently, I expect writing the release notes for a release
including this patch to be difficult: "We added some locks that block
... something ... in some circumstances ... to prevent ... something."
It's not really the job of the release note author to fill in those
blanks, but rather of the patch author or committer. I don't want to
overburden the act of fixing bugs, but I just feel like more
explanation is needed here. When I see for example that we're adding a
lock acquisition to the end of heap_create(), I can't help but wonder
if it's really true that we don't take a lock on a just-created
relation today. I'm certainly under the impression that we lock
newly-created, uncommitted relations, and a quick test seems to
confirm that. I don't quite know whether that happens, but evidently
this call is guarding against something more subtle than a categorical
failure to lock a relation on creation so I think there should be a
comment explaining what that thing is.

It's also quite surprising that SetRelationHasSubclass() says "take X
lock before calling" and 2 of 4 callers just don't. I guess that's how
it is. But shouldn't we then have an assertion inside that function to
guard against future mistakes? If the reason why we failed to add this
initially is discernible from the commit messages that introduced the
bug, it would be nice to mention what it seems to have been; if not,
it would at least be nice to mention the offending commit(s). I'm also
a bit worried that this is going to cause deadlocks, but I suppose if
it does, that's still better than the status quo.

IsInplaceUpdateOid's header comment says IsInplaceUpdateRelation
instead of IsInplaceUpdateOid.

inplace080-catcache-detoast-inplace-stale-v1.patch seems like another
place where spelling out the rationale in more detail would be helpful
to future readers; for instance, the commit message says that
PgDatabaseToastTable is the only one affected, but it doesn't say why
the others are not, or why this one is. The lengthy comment in
CatalogCacheCreateEntry is also difficult to correlate with the code
which follows. I can't guess whether the two cases called out in the
comment always needed to be handled and were handled save only for
in-place updates, and thus the comment changes were simply taking the
opportunity to elaborate on the existing comments; or whether one of
those cases is preexisting and the other arises from the desire to
handle inplace updates. It could be helpful to mention relevant
identifiers from the code in the comment text e.g.
"systable_recheck_tuple detects ordinary updates by noting changes to
the tuple's visibility information, while the equalTuple() case
detects inplace updates."

IMHO, this patch set underscores the desirability of removing in-place
update altogether. That sounds difficult and not back-patchable, but I
can't classify what this patch set does as anything better than grotty
hacks to work around serious design deficiencies. That is not a vote
against these patches: I see no better way forward. Nonetheless, I
dislike the lack of better options.

I have done only cursory review of the last two patches and don't feel
I'm in a place to certify them, at least not now.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Compress ReorderBuffer spill files using LZ4
Next
From: Ashutosh Sharma
Date:
Subject: Re: How about using dirty snapshots to locate dependent objects?