Re: race condition in pg_class - Mailing list pgsql-hackers
From: Noah Misch
Subject: Re: race condition in pg_class
Msg-id: 20240903212329.61.nmisch@google.com
In response to: Re: race condition in pg_class (Nitin Motiani <nitinmotiani@google.com>)
Responses: Re: race condition in pg_class
List: pgsql-hackers
On Tue, Sep 03, 2024 at 09:24:52PM +0530, Nitin Motiani wrote:
> On Sat, Aug 31, 2024 at 6:40 AM Noah Misch <noah@leadboat.com> wrote:
> > On Thu, Aug 29, 2024 at 09:08:43PM +0530, Nitin Motiani wrote:
> > > On Thu, Aug 29, 2024 at 8:11 PM Noah Misch <noah@leadboat.com> wrote:
> > > > - In the cancel case, call both systable_inplace_update_cancel and
> > > >   systable_inplace_update_end.  _finish or _cancel would own unlock,
> > > >   while _end would own systable_endscan().
> > > >
> > > What happens to CacheInvalidateHeapTuple() in this approach? I think
> > > it will still need to be brought to the genam.c layer if we are
> > > releasing the lock in systable_inplace_update_finish.

> understanding is that the code in this approach would look like below:
>
> if (dirty)
>     systable_inplace_update_finish(inplace_state, tup);
> else
>     systable_inplace_update_cancel(inplace_state);
> systable_inplace_update_end(inplace_state);
>
> And that in this structure, both _finish and _cancel will call
> heap_inplace_unlock and then _end will call systable_endscan. So even
> with this structure, the invalidation has to happen inside _finish
> after the unlock.

Right.

> So this also pulls the invalidation to the genam.c
> layer. Am I understanding this correctly?

Compared to the v9 patch, the "call both" alternative would just move the
systable_endscan() call to a new systable_inplace_update_end().  It wouldn't
move anything across the genam:heapam boundary.
systable_inplace_update_finish() would remain a thin wrapper around a heapam
function.

> > > > - Hoist the CacheInvalidateHeapTuple() up to the genam.c layer.  While
> > > >   tolerable now, this gets less attractive after the inplace160 patch
> > > >   from https://postgr.es/m/flat/20240523000548.58.nmisch@google.com
> > > >
> > > I skimmed through the inplace160 patch. It wasn't clear to me why this
> > > becomes less attractive with the patch. I see there is a new
> > > CacheInvalidateHeapTupleInPlace but that looks like it would be called
> > > while holding the lock. And then there is an
> > > AcceptInvalidationMessages which can perhaps be moved to the genam.c
> > > layer too. Is the concern that one invalidation call will be in the
> > > heapam layer and the other will be in the genam layer?
> >
> > That, or a critical section would start in heapam.c, then end in genam.c.
> > Current call tree at inplace160 v4:

> How about this alternative?
>
> genam.c:systable_inplace_update_finish
>     PreInplace_Inval
>     START_CRIT_SECTION
>     heapam.c:heap_inplace_update
>         BUFFER_LOCK_UNLOCK
>     AtInplace_Inval
>     END_CRIT_SECTION
>     UnlockTuple
>     AcceptInvalidationMessages
>
> Looking at inplace160, it seems that the start of the critical section
> is right after PreInplace_Inval. So why not pull START_CRIT_SECTION
> and END_CRIT_SECTION out to the genam.c layer?

heap_inplace_update() has an elog(ERROR) that needs to happen outside any
critical section.  Since the condition for that elog deals with tuple header
internals, it belongs at the heapam layer more than the systable layer.
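For reference, that check looks roughly like the following fragment of
heapam.c:heap_inplace_update() (abridged sketch, not the exact current code);
an ERROR raised inside a critical section is promoted to PANIC, which is why
the length check has to run before START_CRIT_SECTION():

    /* abridged from heapam.c:heap_inplace_update(); surrounding
     * declarations and buffer handling omitted */
    oldlen = ItemIdGetLength(lp) - htup->t_hoff;
    newlen = tuple->t_len - tuple->t_data->t_hoff;
    if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
        elog(ERROR, "wrong tuple length");  /* must stay outside any
                                             * critical section */

    /* NO EREPORT(ERROR) from here till changes are logged */
    START_CRIT_SECTION();

    memcpy((char *) htup + htup->t_hoff,
           (char *) tuple->t_data + tuple->t_data->t_hoff,
           newlen);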
> Alternatively since
> heap_inplace_update is commented as a subroutine of
> systable_inplace_update_finish, should everything just be moved to the
> genam.c layer? Although it looks like you already considered and
> rejected this approach.

Calling XLogInsert(RM_HEAP_ID) in genam.c would be a worse modularity
violation than the one that led to the changes between v8 and v9.  I think
even calling CacheInvalidateHeapTuple() in genam.c would be a worse
modularity violation than the one attributed to v8.  Modularity would have
the heap_inplace function resemble heap_update() handling of invals.

> If the above alternatives are not possible, it's probably fine to go
> ahead with the current patch with the function renamed to
> heap_inplace_update_and_unlock (or something similar) as mentioned
> earlier?

I like that name.  The next version will use it.

> > > I see that all the places we check resultRelInfo->ri_needLockTagTuple,
> > > we can just call
> > > IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
> > > big advantage of storing a separate bool field? Also there is another
> >
> > No, not a big advantage. I felt it was more in line with the typical
> > style of src/backend/executor.

> just find it a smell that there are two fields which are not
> independent and they go out of sync. And that's why my preference is
> to not have a dependent field unless there is a specific advantage.

Got it.  This check happens for every tuple of every UPDATE, so performance
may be a factor.  Some designs and their merits:

==== a. ri_needLockTagTuple
Performance: best: check one value for nonzero
Drawback: one more value lifecycle to understand
Drawback: users of ResultRelInfo w/o InitResultRelInfo() could miss this

==== b. call IsInplaceUpdateRelation
Performance: worst: two extern function calls, then compare against two values

==== c. make IsInplaceUpdateRelation() and IsInplaceUpdateOid() inline, and call
Performance: high: compare against two values
Drawback: unlike catalog.c peers
Drawback: extensions that call these must recompile if these change

==== d. add IsInplaceUpdateRelationInline() and IsInplaceUpdateOidInline(), and call
Performance: high: compare against two values
Drawback: more symbols to understand
Drawback: extensions might call these, reaching the drawback of (c)

I think my preference order is (a), (c), (d), (b).  How do you see it?

> But even for the
> ri_TrigDesc, CatalogOpenIndexes() still sets it to NULL. So shouldn't
> ri_needLockTagTuple also be set to a default value of false?

CatalogOpenIndexes() explicitly zero-initializes two fields and relies on
makeNode() zeroing for dozens of others.  Hence, omitting the initialization
fits the function's local convention better than including it.  (PostgreSQL
has no policy or dominant practice about redundant zero-initialization.)
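For context, CatalogOpenIndexes() is roughly the following (abridged sketch
of src/backend/catalog/indexing.c, not necessarily the exact current code);
since makeNode() returns zeroed memory, a field like ri_needLockTagTuple is
already false unless some caller sets it:

    CatalogIndexState
    CatalogOpenIndexes(Relation heapRel)
    {
        ResultRelInfo *resultRelInfo;

        /* makeNode() zeroes the node; unset fields default to 0/NULL/false */
        resultRelInfo = makeNode(ResultRelInfo);
        resultRelInfo->ri_RangeTableIndex = 0;  /* dummy */
        resultRelInfo->ri_RelationDesc = heapRel;
        resultRelInfo->ri_TrigDesc = NULL;      /* we don't fire triggers */

        ExecOpenIndices(resultRelInfo, false);

        return resultRelInfo;
    }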