RE: 024_add_drop_pub.pl might fail due to deadlock - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: 024_add_drop_pub.pl might fail due to deadlock
Date
Msg-id OSCPR01MB14966A3AA2A7823E8E52E0D54F55FA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: 024_add_drop_pub.pl might fail due to deadlock  (Ajin Cherian <itsajin@gmail.com>)
Responses Re: 024_add_drop_pub.pl might fail due to deadlock
RE: 024_add_drop_pub.pl might fail due to deadlock
Re: 024_add_drop_pub.pl might fail due to deadlock
List pgsql-hackers
Dear Ajin,

Thanks for the patch. Firstly let me confirm my understanding. While altering the
subscription, locks are acquired with below ordering:

Order    target                        level
1        pg_subscription                row exclusive
2        pg_subscription, my tuple    access exclusive
3        pg_subscription_rel            access exclusive
4        pg_replication_origin        excluive

In contrast, apply worker works like:

Order    target                        level
1        pg_replication_origin        excluive
2        pg_subscription, my tuple    access share
3        pg_subscrition_rel            row exclusive

Thus a backend may wait at step 4, and apply worker can stuck at step 2 or 3.

Below are my comments:

```
@@ -340,7 +341,6 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
                 * is dropped. So passing missing_ok = false.
                 */
                ReplicationSlotDropAtPubNode(LogRepWorkerWalRcvConn, syncslotname, false);
-
```
This change is not needed.

```
+                               if (!rel)
+                                       rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
```

I feel it is too strong, isn't it enough to use row exclusive as initially used?

```
 UpdateSubscriptionRelState(Oid subid, Oid relid, char state,
-                                                  XLogRecPtr sublsn)
+                                                  XLogRecPtr sublsn, bool lock_needed)
```

I feel the name `lock_needed` is bit misleading, because the function still opens
the pg_subscription_rel catalog with row exclusive. How about "lock_shared_object"?

```
@@ -1460,7 +1460,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
         * Note that the state can't change because we have already stopped both
         * the apply and tablesync workers and they can't restart because of
         * exclusive lock on the subscription.
+        *
+        * Lock pg_subscription_rel with AccessExclusiveLock to prevent any
+        * deadlock with apply workers of other subscriptions trying
+        * to drop tracking origin.
         */
+       sub_rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
```

Hmm. Per my understanding, DropSubscription acquires below locks till it reaches
replorigin_drop_by_name().

Order    target                        level
1        pg_subscription                access exclusive
2        pg_subscription, my tuple    access exclusive
3        pg_replication_origin        excluive

IIUC we must preserve the ordering, not the target of locks.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


pgsql-hackers by date:

Previous
From: Álvaro Herrera
Date:
Subject: Re: RecoveryTargetAction is left out in xlog_interna.h
Next
From: Amit Kapila
Date:
Subject: Re: 024_add_drop_pub.pl might fail due to deadlock