Re: 024_add_drop_pub.pl might fail due to deadlock - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: 024_add_drop_pub.pl might fail due to deadlock
Date
Msg-id CAA4eK1+Lu0gmt-8x2y5SYTA3tnii+RjuMqQoN64xjmwnP8w0Fw@mail.gmail.com
Whole thread Raw
In response to Re: 024_add_drop_pub.pl might fail due to deadlock  (Ajin Cherian <itsajin@gmail.com>)
List pgsql-hackers
On Mon, Jul 21, 2025 at 6:59 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Yes, this is correct. I have also verified this in my testing that
> when there is a second subscription, a deadlock can still happen with
> my previous patch. I have now modified the patch in tablesync worker
> to take locks on both SubscriptionRelationId and
> SubscriptionRelRelationId prior to taking lock on
> ReplicationOriginRelationId. I have also found that there is a similar
> problem in DropSubscription() where lock on SubscriptionRelRelationId
> is not taken before dropping the tracking origin. I've also modified
> the signature of UpdateSubscriptionRelState to take a bool
> "lock_needed" which if false, the SubscriptionRelationId is not
> locked. I've made a new version of the patch on PG_15.
>

Review comments:
================
1.
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);

Why did you acquire AccessExclusiveLock here when the current code has
RowExclusiveLock? It should be RowExclusiveLock.

2.
+ *
+ * Lock SubscriptionRelationId with AccessShareLock and
+ * take AccessExclusiveLock on SubscriptionRelRelationId to
+ * prevent any deadlocks with user concurrently performing
+ * refresh on the subscription.
  */

Try to tell in the comments that we want to keep the locking order
same as DDL commands to prevent deadlocks.

3.
+ * Also close any tables prior to the commit.
  */
+ if (rel)
+ {
+ table_close(rel, AccessExclusiveLock);
+ rel = NULL;

We don't need to explicitly release lock on table_close, it will be
done at transaction end, so use NoLock here as we do in current HEAD
code.

4.
DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 {
- Relation rel;
+ Relation rel, sub_rel;
  ObjectAddress myself;
  HeapTuple tup;
  Oid subid;
@@ -1460,7 +1460,13 @@ DropSubscription(DropSubscriptionStmt *stmt,
bool isTopLevel)
  * Note that the state can't change because we have already stopped both
  * the apply and tablesync workers and they can't restart because of
  * exclusive lock on the subscription.
+ *
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any
+ * deadlock with apply workers of other subscriptions trying
+ * to drop tracking origin.
  */
+ sub_rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);

I don't think we need AccessExclusiveLock on SubscriptionRelRelationId
in DropSubscription. Kindly test again after fixing the first comment
above.
--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: Verify predefined LWLocks tranches have entries in wait_event_names.txt
Next
From: shveta malik
Date:
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart