Re: race condition in pg_class - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: race condition in pg_class |
Date | |
Msg-id | CA+TgmobfMU5pdXP36D5iAwxV5WKE_vuDLtp_1QyH+H5jMMt21g@mail.gmail.com Whole thread Raw |
In response to | Re: race condition in pg_class (Noah Misch <noah@leadboat.com>) |
Responses |
Re: race condition in pg_class
Re: race condition in pg_class |
List | pgsql-hackers |
On Wed, Jun 5, 2024 at 2:17 PM Noah Misch <noah@leadboat.com> wrote: > Starting 2024-06-10, I plan to push the first seven of the ten patches: > > inplace005-UNEXPECTEDPASS-tap-meson-v1.patch > inplace010-tests-v1.patch > inplace040-waitfuncs-v1.patch > inplace050-tests-inj-v1.patch > inplace060-nodeModifyTable-comments-v1.patch > Those five just deal in tests, test infrastructure, and comments. > inplace070-rel-locks-missing-v1.patch > Main risk is new DDL deadlocks. > inplace080-catcache-detoast-inplace-stale-v1.patch > If it fails to fix the bug it targets, I expect it's a no-op rather than > breaking things. > > I'll leave the last three of the ten needing review. Those three are beyond > my skill to self-certify. It's not this patch set's fault, but I'm not very pleased to see that the injection point wait events have been shoehorned into the "Extension" category - which they are not - instead of being a new wait_event_type. That would have avoided the ugly wait-event naming pattern, inconsistent with everything else, introduced by inplace050-tests-inj-v1.patch. I think that the comments and commit messages in this patch set could, in some places, use improvement. For instance, inplace060-nodeModifyTable-comments-v1.patch reflows a bunch of comments, which makes it hard to see what actually changed, and the commit message doesn't tell you, either. A good bit of it seems to be changing "a view" to "a view INSTEAD OF trigger" or "a view having an INSTEAD OF trigger," but the reasoning behind that change is not spelled out anywhere. The reader is left to guess what the other case is and why the same principles don't apply to it. I don't doubt that the new comments are more correct than the old ones, but I expect future patch authors to have difficulty maintaining that state of affairs. Similarly, inplace070-rel-locks-missing-v1.patch adds no comments. IMHO, the commit message also isn't very informative. It disclaims knowledge of what bug it's fixing, while at the same time leaving the reader to figure out for themselves how the behavior has changed. Consequently, I expect writing the release notes for a release including this patch to be difficult: "We added some locks that block ... something ... in some circumstances ... to prevent ... something." It's not really the job of the release note author to fill in those blanks, but rather of the patch author or committer. I don't want to overburden the act of fixing bugs, but I just feel like more explanation is needed here. When I see for example that we're adding a lock acquisition to the end of heap_create(), I can't help but wonder if it's really true that we don't take a lock on a just-created relation today. I'm certainly under the impression that we lock newly-created, uncommitted relations, and a quick test seems to confirm that. I don't quite know whether that happens, but evidently this call is guarding against something more subtle than a categorical failure to lock a relation on creation so I think there should be a comment explaining what that thing is. It's also quite surprising that SetRelationHasSubclass() says "take X lock before calling" and 2 of 4 callers just don't. I guess that's how it is. But shouldn't we then have an assertion inside that function to guard against future mistakes? If the reason why we failed to add this initially is discernible from the commit messages that introduced the bug, it would be nice to mention what it seems to have been; if not, it would at least be nice to mention the offending commit(s). I'm also a bit worried that this is going to cause deadlocks, but I suppose if it does, that's still better than the status quo. IsInplaceUpdateOid's header comment says IsInplaceUpdateRelation instead of IsInplaceUpdateOid. inplace080-catcache-detoast-inplace-stale-v1.patch seems like another place where spelling out the rationale in more detail would be helpful to future readers; for instance, the commit message says that PgDatabaseToastTable is the only one affected, but it doesn't say why the others are not, or why this one is. The lengthy comment in CatalogCacheCreateEntry is also difficult to correlate with the code which follows. I can't guess whether the two cases called out in the comment always needed to be handled and were handled save only for in-place updates, and thus the comment changes were simply taking the opportunity to elaborate on the existing comments; or whether one of those cases is preexisting and the other arises from the desire to handle inplace updates. It could be helpful to mention relevant identifiers from the code in the comment text e.g. "systable_recheck_tuple detects ordinary updates by noting changes to the tuple's visibility information, while the equalTuple() case detects inplace updates." IMHO, this patch set underscores the desirability of removing in-place update altogether. That sounds difficult and not back-patchable, but I can't classify what this patch set does as anything better than grotty hacks to work around serious design deficiencies. That is not a vote against these patches: I see no better way forward. Nonetheless, I dislike the lack of better options. I have done only cursory review of the last two patches and don't feel I'm in a place to certify them, at least not now. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: