Re: Concurrent deadlock scenario with DROP INDEX on partitioned index - Mailing list pgsql-hackers

From Zhihong Yu
Subject Re: Concurrent deadlock scenario with DROP INDEX on partitioned index
Date
Msg-id CALNJ-vT_eRWsejyAMfD6h5LPKQTHpPzJLfNxtpczePookpFK2w@mail.gmail.com
Whole thread Raw
In response to Re: Concurrent deadlock scenario with DROP INDEX on partitioned index  (Jimmy Yih <jyih@vmware.com>)
Responses Re: Concurrent deadlock scenario with DROP INDEX on partitioned index
List pgsql-hackers


On Wed, Mar 16, 2022 at 11:20 AM Jimmy Yih <jyih@vmware.com> wrote:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 1. RangeVarCallbackForDropRelation can get called more than once
> during a lookup (in case of concurrent rename and suchlike).
> If so it needs to be prepared to drop the lock(s) it got last time.
> You have not implemented any such logic.  This doesn't seem hard
> to fix, just store the OID list into struct DropRelationCallbackState.

Fixed in attached patch. We added the OID list to the
DropRelationCallbackState as you suggested.

> 2. I'm not sure you have fully thought through the interaction
> with the subsequent "is_partition" stanza.   If the target is
> an intermediate partitioned index, don't we need to acquire lock
> on its parent index before starting to lock children?  (It may
> be that that stanza is already in the wrong place relative to
> the table-locking stanza it currently follows, but not sure.)

It's not required to acquire lock on a possible parent index because
of the restriction where we can only run DROP INDEX on the top-most
partitioned index.

> 3. Calling find_all_inheritors with lockmode NoLock, and then
> locking those relation OIDs later, is both unsafe and code-wasteful.
> Just tell find_all_inheritors to get the locks you want.

Fixed in attached patch.

> 4. This code will invoke find_all_inheritors on InvalidOid if
> IndexGetRelation fails.  It needs to be within the if-test
> just above.

Fixed in attached patch.

> 5. Reading classform again at this point in the function is
> not merely inefficient, but outright wrong, because we
> already released the syscache entry.  Use the local variable.

Fixed in attached patch. Added another local variable
is_partitioned_index to store the classform value. The main reason we
need the classform is because the existing relkind and
expected_relkind local variables would only show RELKIND_INDEX whereas
we needed exactly RELKIND_PARTITIONED_INDEX.

> 6. You've not paid enough attention to updating existing comments,
> particularly the header comment for RangeVarCallbackForDropRelation.

Fixed in attached patch. Updated the header comment and aggregated our
introduced comment to another relative comment slightly above the
proposed locking section.

> Actually though, maybe you *don't* want to do this in
> RangeVarCallbackForDropRelation.  Because of point 2, it might be
> better to run find_all_inheritors after we've successfully
> identified and locked the direct target relation, ie do it back
> in RemoveRelations.  I've not thought hard about that, but it's
> attractive if only because it'd mean you don't have to fix point 1.

We think that RangeVarCallbackForDropRelation is probably the most
correct function to place the fix. It would look a bit out-of-place
being in RemoveRelations seeing how there's already relative DROP
INDEX code in RangeVarCallbackForDropRelation. With point 2 explained
and point 1 addressed, the fix seems to look okay in
RangeVarCallbackForDropRelation.

Thanks for the feedback.  Attached new patch with feedback addressed.

--
Jimmy Yih (VMware)
Gaurab Dey (VMware)

Hi,
For RangeVarCallbackForDropRelation():

+               LockRelationOid(indexRelationOid, heap_lockmode); 

Since the above is called for both if and else blocks, it can be lifted outside.

Cheers

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Proposal: Support custom authentication methods using hooks
Next
From: Tom Lane
Date:
Subject: Re: Granting SET and ALTER SYSTE privileges for GUCs