Re: race condition in pg_class - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: race condition in pg_class |
Date | |
Msg-id | 20240611024525.9f.nmisch@google.com Whole thread Raw |
In response to | Re: race condition in pg_class (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: race condition in pg_class
|
List | pgsql-hackers |
On Thu, Jun 06, 2024 at 09:48:51AM -0400, Robert Haas wrote: > On Wed, Jun 5, 2024 at 2:17 PM Noah Misch <noah@leadboat.com> wrote: > > Starting 2024-06-10, I plan to push the first seven of the ten patches: > > > > inplace005-UNEXPECTEDPASS-tap-meson-v1.patch > > inplace010-tests-v1.patch > > inplace040-waitfuncs-v1.patch > > inplace050-tests-inj-v1.patch > > inplace060-nodeModifyTable-comments-v1.patch > > Those five just deal in tests, test infrastructure, and comments. > > inplace070-rel-locks-missing-v1.patch > > Main risk is new DDL deadlocks. > > inplace080-catcache-detoast-inplace-stale-v1.patch > > If it fails to fix the bug it targets, I expect it's a no-op rather than > > breaking things. > > > > I'll leave the last three of the ten needing review. Those three are beyond > > my skill to self-certify. > > It's not this patch set's fault, but I'm not very pleased to see that > the injection point wait events have been shoehorned into the > "Extension" category I've replied on that branch of the thread. > I think that the comments and commit messages in this patch set could, > in some places, use improvement. For instance, > inplace060-nodeModifyTable-comments-v1.patch reflows a bunch of > comments, which makes it hard to see what actually changed, and the > commit message doesn't tell you, either. A good bit of it seems to be > changing "a view" to "a view INSTEAD OF trigger" or "a view having an > INSTEAD OF trigger," but the reasoning behind that change is not > spelled out anywhere. The reader is left to guess what the other case > is and why the same principles don't apply to it. I don't doubt that > the new comments are more correct than the old ones, but I expect > future patch authors to have difficulty maintaining that state of > affairs. The two kinds are trigger-updatable views and auto-updatable views. I've added sentences about that to the nodeModifyTable.c header comment. One could argue for dropping the INSTEAD OF comment changes outside of the header. > Similarly, inplace070-rel-locks-missing-v1.patch adds no comments. > IMHO, the commit message also isn't very informative. It disclaims > knowledge of what bug it's fixing, while at the same time leaving the > reader to figure out for themselves how the behavior has changed. > Consequently, I expect writing the release notes for a release > including this patch to be difficult: "We added some locks that block > ... something ... in some circumstances ... to prevent ... something." > It's not really the job of the release note author to fill in those > blanks, but rather of the patch author or committer. I don't want to I had been thinking release notes should just say "Add missing DDL lock acquisitions". One can cure a breach of our locking standards without proving some specific bad outcome. However, one could counter that commands like GRANT follow a different standard, and perhaps SetRelationHasSubclass() should use the GRANT standard. Hence, I researched the bugs this fixes and split inplace070-rel-locks-missing into three patches: 1. [inplace065-lock-SequenceChangePersistence] Lock in SequenceChangePersistence(), where the omission can lose nextval() increments of the sequence. 2. [inplace071-lock-SetRelationHasSubclass] Lock in SetRelationHasSubclass(). This one has only minor benefits; see the new commit message. A fair alternative would be tuple-level locking in inplace120-locktag, like that patch adds to GRANT. That might avoid some deadlocks. I feel like the minor benefits justify the way I chose, but it's a weak preference. 3. [inplace075-lock-heap_create] Add to heap creation: > overburden the act of fixing bugs, but I just feel like more > explanation is needed here. When I see for example that we're adding a > lock acquisition to the end of heap_create(), I can't help but wonder > if it's really true that we don't take a lock on a just-created > relation today. I'm certainly under the impression that we lock > newly-created, uncommitted relations, and a quick test seems to > confirm that. I don't quite know whether that happens, but evidently > this call is guarding against something more subtle than a categorical > failure to lock a relation on creation so I think there should be a > comment explaining what that thing is. I've covered that in the new log message. To lock as early as possible, I've moved this up a layer, to just after relid assignment. One could argue this change belongs in inplace120 rather than its own patch, since it's only here to eliminate a harmless exception to the rule inplace120 asserts. I've removed the update_relispartition() that appeared in inplace070-rel-locks-missing-v1.patch. Only an older, unpublished draft of the rules (that inplace110-successors adds to README.tuplock) required that lock. The lock might be worthwhile for avoiding "tuple concurrently updated", but it's out of scope for $SUBJECT. > It's also quite surprising that SetRelationHasSubclass() says "take X > lock before calling" and 2 of 4 callers just don't. I guess that's how > it is. But shouldn't we then have an assertion inside that function to > guard against future mistakes? If the reason why we failed to add this Works for me. Done. I've moved the LockHeldByMe() change from inplace110-successors to this patch, since the assertion wants it. > initially is discernible from the commit messages that introduced the > bug, it would be nice to mention what it seems to have been; if not, > it would at least be nice to mention the offending commit(s). I'm also Done. > a bit worried that this is going to cause deadlocks, but I suppose if > it does, that's still better than the status quo. > > IsInplaceUpdateOid's header comment says IsInplaceUpdateRelation > instead of IsInplaceUpdateOid. Fixed. > inplace080-catcache-detoast-inplace-stale-v1.patch seems like another > place where spelling out the rationale in more detail would be helpful > to future readers; for instance, the commit message says that > PgDatabaseToastTable is the only one affected, but it doesn't say why > the others are not, or why this one is. The lengthy comment in I've updated the commit message to answer that. > CatalogCacheCreateEntry is also difficult to correlate with the code > which follows. I can't guess whether the two cases called out in the > comment always needed to be handled and were handled save only for > in-place updates, and thus the comment changes were simply taking the > opportunity to elaborate on the existing comments; or whether one of > those cases is preexisting and the other arises from the desire to > handle inplace updates. It could be helpful to mention relevant > identifiers from the code in the comment text e.g. > "systable_recheck_tuple detects ordinary updates by noting changes to > the tuple's visibility information, while the equalTuple() case > detects inplace updates." The patch was elaborating on existing comments. Reading the patch again today, the elaboration no longer feels warranted. Hence, I've rewritten that comment addition. I've included identifiers, and the patch no longer adds comment material orthogonal to inplace updates. Thanks, nm
Attachment
- inplace005-UNEXPECTEDPASS-tap-meson-v2.patch
- inplace010-tests-v2.patch
- inplace040-waitfuncs-v2.patch
- inplace050-tests-inj-v2.patch
- inplace060-nodeModifyTable-comments-v2.patch
- inplace065-lock-SequenceChangePersistence-v2.patch
- inplace071-lock-SetRelationHasSubclass-v2.patch
- inplace075-lock-heap_create-v2.patch
- inplace080-catcache-detoast-inplace-stale-v2.patch
- inplace090-LOCKTAG_TUPLE-eoxact-v2.patch
- inplace110-successors-v2.patch
- inplace120-locktag-v2.patch
pgsql-hackers by date: