Re: Concurrent deadlock scenario with DROP INDEX on partitioned index - Mailing list pgsql-hackers
From | Jimmy Yih |
---|---|
Subject | Re: Concurrent deadlock scenario with DROP INDEX on partitioned index |
Date | |
Msg-id | BYAPR05MB6454BA1DC1F25BD4996F4D67BD119@BYAPR05MB6454.namprd05.prod.outlook.com Whole thread Raw |
In response to | Re: Concurrent deadlock scenario with DROP INDEX on partitioned index (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Concurrent deadlock scenario with DROP INDEX on partitioned index
(Zhihong Yu <zyu@yugabyte.com>)
Re: Concurrent deadlock scenario with DROP INDEX on partitioned index (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Tom Lane <tgl@sss.pgh.pa.us> wrote: > 1. RangeVarCallbackForDropRelation can get called more than once > during a lookup (in case of concurrent rename and suchlike). > If so it needs to be prepared to drop the lock(s) it got last time. > You have not implemented any such logic. This doesn't seem hard > to fix, just store the OID list into struct DropRelationCallbackState. Fixed in attached patch. We added the OID list to the DropRelationCallbackState as you suggested. > 2. I'm not sure you have fully thought through the interaction > with the subsequent "is_partition" stanza. If the target is > an intermediate partitioned index, don't we need to acquire lock > on its parent index before starting to lock children? (It may > be that that stanza is already in the wrong place relative to > the table-locking stanza it currently follows, but not sure.) It's not required to acquire lock on a possible parent index because of the restriction where we can only run DROP INDEX on the top-most partitioned index. > 3. Calling find_all_inheritors with lockmode NoLock, and then > locking those relation OIDs later, is both unsafe and code-wasteful. > Just tell find_all_inheritors to get the locks you want. Fixed in attached patch. > 4. This code will invoke find_all_inheritors on InvalidOid if > IndexGetRelation fails. It needs to be within the if-test > just above. Fixed in attached patch. > 5. Reading classform again at this point in the function is > not merely inefficient, but outright wrong, because we > already released the syscache entry. Use the local variable. Fixed in attached patch. Added another local variable is_partitioned_index to store the classform value. The main reason we need the classform is because the existing relkind and expected_relkind local variables would only show RELKIND_INDEX whereas we needed exactly RELKIND_PARTITIONED_INDEX. > 6. You've not paid enough attention to updating existing comments, > particularly the header comment for RangeVarCallbackForDropRelation. Fixed in attached patch. Updated the header comment and aggregated our introduced comment to another relative comment slightly above the proposed locking section. > Actually though, maybe you *don't* want to do this in > RangeVarCallbackForDropRelation. Because of point 2, it might be > better to run find_all_inheritors after we've successfully > identified and locked the direct target relation, ie do it back > in RemoveRelations. I've not thought hard about that, but it's > attractive if only because it'd mean you don't have to fix point 1. We think that RangeVarCallbackForDropRelation is probably the most correct function to place the fix. It would look a bit out-of-place being in RemoveRelations seeing how there's already relative DROP INDEX code in RangeVarCallbackForDropRelation. With point 2 explained and point 1 addressed, the fix seems to look okay in RangeVarCallbackForDropRelation. Thanks for the feedback. Attached new patch with feedback addressed. -- Jimmy Yih (VMware) Gaurab Dey (VMware)
Attachment
pgsql-hackers by date: