Thread: Concurrent deadlock scenario with DROP INDEX on partitioned index

Concurrent deadlock scenario with DROP INDEX on partitioned index

From
Jimmy Yih
Date:
Hello pgsql-hackers,

When dropping a partitioned index, the locking order starts with the
partitioned table, then its partitioned index, then the partition
indexes dependent on that partitioned index, and finally the dependent
partition indexes' parent tables. This order allows a deadlock
scenario to happen if for example an ANALYZE happens on one of the
partition tables which locks the partition table and then blocks when
it attempts to lock its index (the DROP INDEX has the index lock but
cannot get the partition table lock).

Deadlock Reproduction
==================================================
Initial setup:
CREATE TABLE foopart (a int) PARTITION BY RANGE (a);
CREATE TABLE foopart_child PARTITION OF foopart FOR VALUES FROM (1) TO (5);
CREATE INDEX foopart_idx ON foopart USING btree(a);

Attach debugger to session 1 and put breakpoint on function deleteObjectsInList:
1: DROP INDEX foopart_idx;

While session 1 is blocked by debugger breakpoint, run the following:
2: ANALYZE foopart_child;

After a couple seconds, detach the debugger from session 1 and see deadlock.

In order to prevent this deadlock scenario, we should find and lock
all the sub-partition and partition tables beneath the partitioned
index's partitioned table before attempting to lock the sub-partition
and partition indexes.

Attached is a patch (+isolation test) which fixes the issue. We
observed this on latest head of Postgres.

Regards,
Jimmy Yih and Gaurab Dey
Attachment

Re: Concurrent deadlock scenario with DROP INDEX on partitioned index

From
Tom Lane
Date:
Jimmy Yih <jyih@vmware.com> writes:
> When dropping a partitioned index, the locking order starts with the
> partitioned table, then its partitioned index, then the partition
> indexes dependent on that partitioned index, and finally the dependent
> partition indexes' parent tables. This order allows a deadlock
> scenario to happen if for example an ANALYZE happens on one of the
> partition tables which locks the partition table and then blocks when
> it attempts to lock its index (the DROP INDEX has the index lock but
> cannot get the partition table lock).

I agree this is a bug, and I think that you may have the right
general idea about how to fix it: acquire the necessary locks in
RangeVarCallbackForDropRelation.  However, this patch still needs
a fair amount of work.

1. RangeVarCallbackForDropRelation can get called more than once
during a lookup (in case of concurrent rename and suchlike).
If so it needs to be prepared to drop the lock(s) it got last time.
You have not implemented any such logic.  This doesn't seem hard
to fix, just store the OID list into struct DropRelationCallbackState.

2. I'm not sure you have fully thought through the interaction
with the subsequent "is_partition" stanza.   If the target is
an intermediate partitioned index, don't we need to acquire lock
on its parent index before starting to lock children?  (It may
be that that stanza is already in the wrong place relative to
the table-locking stanza it currently follows, but not sure.)

3. Calling find_all_inheritors with lockmode NoLock, and then
locking those relation OIDs later, is both unsafe and code-wasteful.
Just tell find_all_inheritors to get the locks you want.

4. This code will invoke find_all_inheritors on InvalidOid if
IndexGetRelation fails.  It needs to be within the if-test
just above.

5. Reading classform again at this point in the function is
not merely inefficient, but outright wrong, because we
already released the syscache entry.  Use the local variable.

6. You've not paid enough attention to updating existing comments,
particularly the header comment for RangeVarCallbackForDropRelation.

Actually though, maybe you *don't* want to do this in
RangeVarCallbackForDropRelation.  Because of point 2, it might be
better to run find_all_inheritors after we've successfully
identified and locked the direct target relation, ie do it back
in RemoveRelations.  I've not thought hard about that, but it's
attractive if only because it'd mean you don't have to fix point 1.

            regards, tom lane



Re: Concurrent deadlock scenario with DROP INDEX on partitioned index

From
Jimmy Yih
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 1. RangeVarCallbackForDropRelation can get called more than once
> during a lookup (in case of concurrent rename and suchlike).
> If so it needs to be prepared to drop the lock(s) it got last time.
> You have not implemented any such logic.  This doesn't seem hard
> to fix, just store the OID list into struct DropRelationCallbackState.

Fixed in attached patch. We added the OID list to the
DropRelationCallbackState as you suggested.

> 2. I'm not sure you have fully thought through the interaction
> with the subsequent "is_partition" stanza.   If the target is
> an intermediate partitioned index, don't we need to acquire lock
> on its parent index before starting to lock children?  (It may
> be that that stanza is already in the wrong place relative to
> the table-locking stanza it currently follows, but not sure.)

It's not required to acquire lock on a possible parent index because
of the restriction where we can only run DROP INDEX on the top-most
partitioned index.

> 3. Calling find_all_inheritors with lockmode NoLock, and then
> locking those relation OIDs later, is both unsafe and code-wasteful.
> Just tell find_all_inheritors to get the locks you want.

Fixed in attached patch.

> 4. This code will invoke find_all_inheritors on InvalidOid if
> IndexGetRelation fails.  It needs to be within the if-test
> just above.

Fixed in attached patch.

> 5. Reading classform again at this point in the function is
> not merely inefficient, but outright wrong, because we
> already released the syscache entry.  Use the local variable.

Fixed in attached patch. Added another local variable
is_partitioned_index to store the classform value. The main reason we
need the classform is because the existing relkind and
expected_relkind local variables would only show RELKIND_INDEX whereas
we needed exactly RELKIND_PARTITIONED_INDEX.

> 6. You've not paid enough attention to updating existing comments,
> particularly the header comment for RangeVarCallbackForDropRelation.

Fixed in attached patch. Updated the header comment and aggregated our
introduced comment to another relative comment slightly above the
proposed locking section.

> Actually though, maybe you *don't* want to do this in
> RangeVarCallbackForDropRelation.  Because of point 2, it might be
> better to run find_all_inheritors after we've successfully
> identified and locked the direct target relation, ie do it back
> in RemoveRelations.  I've not thought hard about that, but it's
> attractive if only because it'd mean you don't have to fix point 1.

We think that RangeVarCallbackForDropRelation is probably the most
correct function to place the fix. It would look a bit out-of-place
being in RemoveRelations seeing how there's already relative DROP
INDEX code in RangeVarCallbackForDropRelation. With point 2 explained
and point 1 addressed, the fix seems to look okay in
RangeVarCallbackForDropRelation.

Thanks for the feedback.  Attached new patch with feedback addressed.

--
Jimmy Yih (VMware)
Gaurab Dey (VMware)
Attachment

Re: Concurrent deadlock scenario with DROP INDEX on partitioned index

From
Zhihong Yu
Date:


On Wed, Mar 16, 2022 at 11:20 AM Jimmy Yih <jyih@vmware.com> wrote:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 1. RangeVarCallbackForDropRelation can get called more than once
> during a lookup (in case of concurrent rename and suchlike).
> If so it needs to be prepared to drop the lock(s) it got last time.
> You have not implemented any such logic.  This doesn't seem hard
> to fix, just store the OID list into struct DropRelationCallbackState.

Fixed in attached patch. We added the OID list to the
DropRelationCallbackState as you suggested.

> 2. I'm not sure you have fully thought through the interaction
> with the subsequent "is_partition" stanza.   If the target is
> an intermediate partitioned index, don't we need to acquire lock
> on its parent index before starting to lock children?  (It may
> be that that stanza is already in the wrong place relative to
> the table-locking stanza it currently follows, but not sure.)

It's not required to acquire lock on a possible parent index because
of the restriction where we can only run DROP INDEX on the top-most
partitioned index.

> 3. Calling find_all_inheritors with lockmode NoLock, and then
> locking those relation OIDs later, is both unsafe and code-wasteful.
> Just tell find_all_inheritors to get the locks you want.

Fixed in attached patch.

> 4. This code will invoke find_all_inheritors on InvalidOid if
> IndexGetRelation fails.  It needs to be within the if-test
> just above.

Fixed in attached patch.

> 5. Reading classform again at this point in the function is
> not merely inefficient, but outright wrong, because we
> already released the syscache entry.  Use the local variable.

Fixed in attached patch. Added another local variable
is_partitioned_index to store the classform value. The main reason we
need the classform is because the existing relkind and
expected_relkind local variables would only show RELKIND_INDEX whereas
we needed exactly RELKIND_PARTITIONED_INDEX.

> 6. You've not paid enough attention to updating existing comments,
> particularly the header comment for RangeVarCallbackForDropRelation.

Fixed in attached patch. Updated the header comment and aggregated our
introduced comment to another relative comment slightly above the
proposed locking section.

> Actually though, maybe you *don't* want to do this in
> RangeVarCallbackForDropRelation.  Because of point 2, it might be
> better to run find_all_inheritors after we've successfully
> identified and locked the direct target relation, ie do it back
> in RemoveRelations.  I've not thought hard about that, but it's
> attractive if only because it'd mean you don't have to fix point 1.

We think that RangeVarCallbackForDropRelation is probably the most
correct function to place the fix. It would look a bit out-of-place
being in RemoveRelations seeing how there's already relative DROP
INDEX code in RangeVarCallbackForDropRelation. With point 2 explained
and point 1 addressed, the fix seems to look okay in
RangeVarCallbackForDropRelation.

Thanks for the feedback.  Attached new patch with feedback addressed.

--
Jimmy Yih (VMware)
Gaurab Dey (VMware)

Hi,
For RangeVarCallbackForDropRelation():

+               LockRelationOid(indexRelationOid, heap_lockmode); 

Since the above is called for both if and else blocks, it can be lifted outside.

Cheers

Re: Concurrent deadlock scenario with DROP INDEX on partitioned index

From
Jimmy Yih
Date:
Zhihong Yu <zyu@yugabyte.com> wrote:
> Hi,
> For RangeVarCallbackForDropRelation():
>
> +               LockRelationOid(indexRelationOid, heap_lockmode);
>
> Since the above is called for both if and else blocks, it can be lifted outside.

Thanks for the feedback.  Attached new v3 patch with feedback addressed.

--
Jimmy Yih (VMware)
Gaurab Dey (VMware)
Attachment

Re: Concurrent deadlock scenario with DROP INDEX on partitioned index

From
Tom Lane
Date:
Jimmy Yih <jyih@vmware.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Actually though, maybe you *don't* want to do this in
>> RangeVarCallbackForDropRelation.  Because of point 2, it might be
>> better to run find_all_inheritors after we've successfully
>> identified and locked the direct target relation, ie do it back
>> in RemoveRelations.  I've not thought hard about that, but it's
>> attractive if only because it'd mean you don't have to fix point 1.

> We think that RangeVarCallbackForDropRelation is probably the most
> correct function to place the fix. It would look a bit out-of-place
> being in RemoveRelations seeing how there's already relative DROP
> INDEX code in RangeVarCallbackForDropRelation.

I really think you made the wrong choice here.  Doing the locking in
RemoveRelations leads to an extremely simple patch, as I demonstrate
below.  Moreover, since RemoveRelations also has special-case code
for partitioned indexes, it's hard to argue that it mustn't cover
this case too.

Also, I think the proposed test case isn't very good, because when
I run it without applying the code patch, it fails to demonstrate
any deadlock.  The query output is different, but not obviously
wrong.

> Fixed in attached patch. Added another local variable
> is_partitioned_index to store the classform value. The main reason we
> need the classform is because the existing relkind and
> expected_relkind local variables would only show RELKIND_INDEX whereas
> we needed exactly RELKIND_PARTITIONED_INDEX.

Yeah.  As I looked at that I realized that it was totally confusing:
at least one previous author thought that "relkind" stored the rel's
actual relkind, which it doesn't as the code stands.  In particular,
in this bit:

    if ((relkind == RELKIND_INDEX || relkind == RELKIND_PARTITIONED_INDEX) &&
        relOid != oldRelOid)
    {
        state->heapOid = IndexGetRelation(relOid, true);

the test for relkind == RELKIND_PARTITIONED_INDEX is dead code
because relkind will never be that.  It accidentally works anyway
because the other half of the || does the right thing, but somebody
was confused, and so will readers be.

Hence, I propose the attached.  0001 is pure refactoring: it hopefully
clears up the confusion about which "relkind" is which, and it also
saves a couple of redundant syscache fetches in RemoveRelations.
Then 0002 adds the actual bug fix as well as a test case that does
deadlock with unpatched code.

            regards, tom lane

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index dc5872f988..26ccd7f481 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -295,12 +295,18 @@ static const struct dropmsgstrings dropmsgstringarray[] = {
     {'\0', 0, NULL, NULL, NULL, NULL}
 };

+/* communication between RemoveRelations and RangeVarCallbackForDropRelation */
 struct DropRelationCallbackState
 {
-    char        relkind;
+    /* These fields are set by RemoveRelations: */
+    char        expected_relkind;
+    bool        concurrent;
+    /* These fields are state to track which subsidiary locks are held: */
     Oid            heapOid;
     Oid            partParentOid;
-    bool        concurrent;
+    /* These fields are passed back by RangeVarCallbackForDropRelation: */
+    char        actual_relkind;
+    char        actual_relpersistence;
 };

 /* Alter table target-type flags for ATSimplePermissions */
@@ -1416,10 +1422,12 @@ RemoveRelations(DropStmt *drop)
         AcceptInvalidationMessages();

         /* Look up the appropriate relation using namespace search. */
-        state.relkind = relkind;
+        state.expected_relkind = relkind;
+        state.concurrent = drop->concurrent;
+        /* We must initialize these fields to show that no locks are held: */
         state.heapOid = InvalidOid;
         state.partParentOid = InvalidOid;
-        state.concurrent = drop->concurrent;
+
         relOid = RangeVarGetRelidExtended(rel, lockmode, RVR_MISSING_OK,
                                           RangeVarCallbackForDropRelation,
                                           (void *) &state);
@@ -1433,10 +1441,10 @@ RemoveRelations(DropStmt *drop)

         /*
          * Decide if concurrent mode needs to be used here or not.  The
-         * relation persistence cannot be known without its OID.
+         * callback retrieved the rel's persistence for us.
          */
         if (drop->concurrent &&
-            get_rel_persistence(relOid) != RELPERSISTENCE_TEMP)
+            state.actual_relpersistence != RELPERSISTENCE_TEMP)
         {
             Assert(list_length(drop->objects) == 1 &&
                    drop->removeType == OBJECT_INDEX);
@@ -1448,7 +1456,7 @@ RemoveRelations(DropStmt *drop)
          * either.
          */
         if ((flags & PERFORM_DELETION_CONCURRENTLY) != 0 &&
-            get_rel_relkind(relOid) == RELKIND_PARTITIONED_INDEX)
+            state.actual_relkind == RELKIND_PARTITIONED_INDEX)
             ereport(ERROR,
                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                      errmsg("cannot drop partitioned index \"%s\" concurrently",
@@ -1479,7 +1487,6 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
 {
     HeapTuple    tuple;
     struct DropRelationCallbackState *state;
-    char        relkind;
     char        expected_relkind;
     bool        is_partition;
     Form_pg_class classform;
@@ -1487,7 +1494,6 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
     bool        invalid_system_index = false;

     state = (struct DropRelationCallbackState *) arg;
-    relkind = state->relkind;
     heap_lockmode = state->concurrent ?
         ShareUpdateExclusiveLock : AccessExclusiveLock;

@@ -1523,6 +1529,10 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
     classform = (Form_pg_class) GETSTRUCT(tuple);
     is_partition = classform->relispartition;

+    /* Pass back some data to save lookups in RemoveRelations */
+    state->actual_relkind = classform->relkind;
+    state->actual_relpersistence = classform->relpersistence;
+
     /*
      * Both RELKIND_RELATION and RELKIND_PARTITIONED_TABLE are OBJECT_TABLE,
      * but RemoveRelations() can only pass one relkind for a given relation.
@@ -1538,13 +1548,15 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
     else
         expected_relkind = classform->relkind;

-    if (relkind != expected_relkind)
-        DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+    if (state->expected_relkind != expected_relkind)
+        DropErrorMsgWrongType(rel->relname, classform->relkind,
+                              state->expected_relkind);

     /* Allow DROP to either table owner or schema owner */
     if (!pg_class_ownercheck(relOid, GetUserId()) &&
         !pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
-        aclcheck_error(ACLCHECK_NOT_OWNER, get_relkind_objtype(get_rel_relkind(relOid)),
+        aclcheck_error(ACLCHECK_NOT_OWNER,
+                       get_relkind_objtype(classform->relkind),
                        rel->relname);

     /*
@@ -1553,7 +1565,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
      * only concerns indexes of toast relations that became invalid during a
      * REINDEX CONCURRENTLY process.
      */
-    if (IsSystemClass(relOid, classform) && relkind == RELKIND_INDEX)
+    if (IsSystemClass(relOid, classform) && classform->relkind == RELKIND_INDEX)
     {
         HeapTuple    locTuple;
         Form_pg_index indexform;
@@ -1589,9 +1601,10 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
      * locking the index.  index_drop() will need this anyway, and since
      * regular queries lock tables before their indexes, we risk deadlock if
      * we do it the other way around.  No error if we don't find a pg_index
-     * entry, though --- the relation may have been dropped.
+     * entry, though --- the relation may have been dropped.  Note that this
+     * code will execute for either plain or partitioned indexes.
      */
-    if ((relkind == RELKIND_INDEX || relkind == RELKIND_PARTITIONED_INDEX) &&
+    if (expected_relkind == RELKIND_INDEX &&
         relOid != oldRelOid)
     {
         state->heapOid = IndexGetRelation(relOid, true);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 26ccd7f481..d8faf41c94 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1462,6 +1462,18 @@ RemoveRelations(DropStmt *drop)
                      errmsg("cannot drop partitioned index \"%s\" concurrently",
                             rel->relname)));

+        /*
+         * If we're told to drop a partitioned index, we must acquire lock on
+         * all the children of its parent partitioned table before proceeding.
+         * Otherwise we'd try to lock the child index partitions before their
+         * tables, leading to potential deadlock against other sessions that
+         * will lock those objects in the other order.
+         */
+        if (state.actual_relkind == RELKIND_PARTITIONED_INDEX)
+            (void) find_all_inheritors(state.heapOid,
+                                       AccessExclusiveLock,
+                                       NULL);
+
         /* OK, we're ready to delete this one */
         obj.classId = RelationRelationId;
         obj.objectId = relOid;
diff --git a/src/test/isolation/expected/partition-drop-index-locking.out
b/src/test/isolation/expected/partition-drop-index-locking.out
new file mode 100644
index 0000000000..9acd51dfde
--- /dev/null
+++ b/src/test/isolation/expected/partition-drop-index-locking.out
@@ -0,0 +1,100 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s1begin s1lock s2begin s2drop s1select s3getlocks s1commit s3getlocks s2commit
+step s1begin: BEGIN;
+step s1lock: LOCK TABLE part_drop_index_locking_subpart_child IN ACCESS SHARE MODE;
+step s2begin: BEGIN;
+step s2drop: DROP INDEX part_drop_index_locking_idx; <waiting ...>
+step s1select: SELECT * FROM part_drop_index_locking_subpart_child;
+id
+--
+(0 rows)
+
+step s3getlocks:
+        SELECT s.query, c.relname, l.mode, l.granted
+        FROM pg_locks l
+                JOIN pg_class c ON l.relation = c.oid
+                JOIN pg_stat_activity s ON l.pid = s.pid
+        WHERE c.relname LIKE 'part_drop_index_locking%'
+        ORDER BY s.query, c.relname, l.mode, l.granted;
+
+query                                               |relname                                      |mode
|granted

+----------------------------------------------------+---------------------------------------------+-------------------+-------
+DROP INDEX part_drop_index_locking_idx;             |part_drop_index_locking
|AccessExclusiveLock|t      
+DROP INDEX part_drop_index_locking_idx;             |part_drop_index_locking_idx
|AccessExclusiveLock|t      
+DROP INDEX part_drop_index_locking_idx;             |part_drop_index_locking_subpart
|AccessExclusiveLock|t      
+DROP INDEX part_drop_index_locking_idx;             |part_drop_index_locking_subpart_child
|AccessExclusiveLock|f      
+SELECT * FROM part_drop_index_locking_subpart_child;|part_drop_index_locking_subpart_child        |AccessShareLock
|t      
+SELECT * FROM part_drop_index_locking_subpart_child;|part_drop_index_locking_subpart_child_id_idx |AccessShareLock
|t      
+SELECT * FROM part_drop_index_locking_subpart_child;|part_drop_index_locking_subpart_child_id_idx1|AccessShareLock
|t      
+(7 rows)
+
+step s1commit: COMMIT;
+step s2drop: <... completed>
+step s3getlocks:
+        SELECT s.query, c.relname, l.mode, l.granted
+        FROM pg_locks l
+                JOIN pg_class c ON l.relation = c.oid
+                JOIN pg_stat_activity s ON l.pid = s.pid
+        WHERE c.relname LIKE 'part_drop_index_locking%'
+        ORDER BY s.query, c.relname, l.mode, l.granted;
+
+query                                  |relname                                     |mode               |granted
+---------------------------------------+--------------------------------------------+-------------------+-------
+DROP INDEX part_drop_index_locking_idx;|part_drop_index_locking                     |AccessExclusiveLock|t
+DROP INDEX part_drop_index_locking_idx;|part_drop_index_locking_idx                 |AccessExclusiveLock|t
+DROP INDEX part_drop_index_locking_idx;|part_drop_index_locking_subpart             |AccessExclusiveLock|t
+DROP INDEX part_drop_index_locking_idx;|part_drop_index_locking_subpart_child       |AccessExclusiveLock|t
+DROP INDEX part_drop_index_locking_idx;|part_drop_index_locking_subpart_child_id_idx|AccessExclusiveLock|t
+DROP INDEX part_drop_index_locking_idx;|part_drop_index_locking_subpart_id_idx      |AccessExclusiveLock|t
+(6 rows)
+
+step s2commit: COMMIT;
+
+starting permutation: s1begin s1lock s2begin s2dropsub s1select s3getlocks s1commit s3getlocks s2commit
+step s1begin: BEGIN;
+step s1lock: LOCK TABLE part_drop_index_locking_subpart_child IN ACCESS SHARE MODE;
+step s2begin: BEGIN;
+step s2dropsub: DROP INDEX part_drop_index_locking_subpart_idx; <waiting ...>
+step s1select: SELECT * FROM part_drop_index_locking_subpart_child;
+id
+--
+(0 rows)
+
+step s3getlocks:
+        SELECT s.query, c.relname, l.mode, l.granted
+        FROM pg_locks l
+                JOIN pg_class c ON l.relation = c.oid
+                JOIN pg_stat_activity s ON l.pid = s.pid
+        WHERE c.relname LIKE 'part_drop_index_locking%'
+        ORDER BY s.query, c.relname, l.mode, l.granted;
+
+query                                               |relname                                      |mode
|granted

+----------------------------------------------------+---------------------------------------------+-------------------+-------
+DROP INDEX part_drop_index_locking_subpart_idx;     |part_drop_index_locking_subpart
|AccessExclusiveLock|t      
+DROP INDEX part_drop_index_locking_subpart_idx;     |part_drop_index_locking_subpart_child
|AccessExclusiveLock|f      
+DROP INDEX part_drop_index_locking_subpart_idx;     |part_drop_index_locking_subpart_idx
|AccessExclusiveLock|t      
+SELECT * FROM part_drop_index_locking_subpart_child;|part_drop_index_locking_subpart_child        |AccessShareLock
|t      
+SELECT * FROM part_drop_index_locking_subpart_child;|part_drop_index_locking_subpart_child_id_idx |AccessShareLock
|t      
+SELECT * FROM part_drop_index_locking_subpart_child;|part_drop_index_locking_subpart_child_id_idx1|AccessShareLock
|t      
+(6 rows)
+
+step s1commit: COMMIT;
+step s2dropsub: <... completed>
+step s3getlocks:
+        SELECT s.query, c.relname, l.mode, l.granted
+        FROM pg_locks l
+                JOIN pg_class c ON l.relation = c.oid
+                JOIN pg_stat_activity s ON l.pid = s.pid
+        WHERE c.relname LIKE 'part_drop_index_locking%'
+        ORDER BY s.query, c.relname, l.mode, l.granted;
+
+query                                          |relname                                      |mode
|granted

+-----------------------------------------------+---------------------------------------------+-------------------+-------
+DROP INDEX part_drop_index_locking_subpart_idx;|part_drop_index_locking_subpart              |AccessExclusiveLock|t
  
+DROP INDEX part_drop_index_locking_subpart_idx;|part_drop_index_locking_subpart_child        |AccessExclusiveLock|t
  
+DROP INDEX part_drop_index_locking_subpart_idx;|part_drop_index_locking_subpart_child_id_idx1|AccessExclusiveLock|t
  
+DROP INDEX part_drop_index_locking_subpart_idx;|part_drop_index_locking_subpart_idx          |AccessExclusiveLock|t
  
+(4 rows)
+
+step s2commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 0dae483e82..8e87098150 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -90,6 +90,7 @@ test: predicate-hash
 test: predicate-gist
 test: predicate-gin
 test: partition-concurrent-attach
+test: partition-drop-index-locking
 test: partition-key-update-1
 test: partition-key-update-2
 test: partition-key-update-3
diff --git a/src/test/isolation/specs/partition-drop-index-locking.spec
b/src/test/isolation/specs/partition-drop-index-locking.spec
new file mode 100644
index 0000000000..99e7f6cb64
--- /dev/null
+++ b/src/test/isolation/specs/partition-drop-index-locking.spec
@@ -0,0 +1,44 @@
+# Verify that DROP INDEX properly locks all downward sub-partitions
+# and partitions before locking the indexes.
+
+setup
+{
+  CREATE TABLE part_drop_index_locking (id int) PARTITION BY RANGE(id);
+  CREATE TABLE part_drop_index_locking_subpart PARTITION OF part_drop_index_locking FOR VALUES FROM (1) TO (100)
PARTITIONBY RANGE(id); 
+  CREATE TABLE part_drop_index_locking_subpart_child PARTITION OF part_drop_index_locking_subpart FOR VALUES FROM (1)
TO(100); 
+  CREATE INDEX part_drop_index_locking_idx ON part_drop_index_locking(id);
+  CREATE INDEX part_drop_index_locking_subpart_idx ON part_drop_index_locking_subpart(id);
+}
+
+teardown
+{
+  DROP TABLE part_drop_index_locking;
+}
+
+session s1
+step s1begin    { BEGIN; }
+step s1lock     { LOCK TABLE part_drop_index_locking_subpart_child IN ACCESS SHARE MODE; }
+step s1select   { SELECT * FROM part_drop_index_locking_subpart_child; }
+step s1commit   { COMMIT; }
+
+session s2
+step s2begin    { BEGIN; }
+step s2drop     { DROP INDEX part_drop_index_locking_idx; }
+step s2dropsub  { DROP INDEX part_drop_index_locking_subpart_idx; }
+step s2commit   { COMMIT; }
+
+session s3
+step s3getlocks {
+        SELECT s.query, c.relname, l.mode, l.granted
+        FROM pg_locks l
+                JOIN pg_class c ON l.relation = c.oid
+                JOIN pg_stat_activity s ON l.pid = s.pid
+        WHERE c.relname LIKE 'part_drop_index_locking%'
+        ORDER BY s.query, c.relname, l.mode, l.granted;
+}
+
+# Run DROP INDEX on top partitioned table
+permutation s1begin s1lock s2begin s2drop(s1commit) s1select s3getlocks s1commit s3getlocks s2commit
+
+# Run DROP INDEX on top sub-partition table
+permutation s1begin s1lock s2begin s2dropsub(s1commit) s1select s3getlocks s1commit s3getlocks s2commit

Re: Concurrent deadlock scenario with DROP INDEX on partitioned index

From
Jimmy Yih
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hence, I propose the attached.  0001 is pure refactoring: it hopefully
> clears up the confusion about which "relkind" is which, and it also
> saves a couple of redundant syscache fetches in RemoveRelations.
> Then 0002 adds the actual bug fix as well as a test case that does
> deadlock with unpatched code.

The proposed patches look great and make much more sense. I see you've
already squashed and committed in
7b6ec86532c2ca585d671239bba867fe380448ed. Thanks!

--
Jimmy Yih (VMware)
Gaurab Dey (VMware)