tablecmds: Deadlock caused by Attach Partition - Mailing list pgsql-hackers

From Chao Li
Subject tablecmds: Deadlock caused by Attach Partition
Date
Msg-id CFACA0EB-7E6F-4FAA-9ACE-1FC2226D7482@gmail.com
Whole thread Raw
List pgsql-hackers
Hi Hacker,

This issue was initially discovered while working on patch [1], which makes REPLICA IDENTITY recurse from a partitioned
tabledown to all partitions. The deadlock test itself was originally designed by Dmitry Dolgov (see [2]). After further
investigation,I confirmed that this deadlock is actually a more general problem and can be reliably reproduced on the
currentmaster branch. 

How to reproduce?
==============

1. Edit the attached script with changing the db name to your own, then run the attached sh script in a terminal.
Basicallythe script creates a deep partition chain: test1<-test2<-test3<-… <-test1000. 
2. In the other thermal, start a psql session, and repeatedly run “ALTER TABLE test1 ALTER COLUMN id SET DEFAULT 0;”,
thenyou should quickly see a deadlock error reported in the server log. 

My analysis
=========

Let’s first look at the server log:
```
2026-01-29 16:36:14.308 CST [50910] DETAIL: Process 50910 waits for AccessExclusiveLock on relation 38382 of database
16384;blocked by process 69148. 
Process 69148 waits for AccessShareLock on relation 38376 of database 16384; blocked by process 50910.
Process 50910: ALTER TABLE test1 ALTER COLUMN id SET DEFAULT 0;
Process 69148: alter table test334 attach partition test335 for values from (335) to (1000000000);
```

Here relation 38376 was table test29, and relation 38382 was table test30. In other words:

 * ALTER TABLE (let’s call it session A) was waiting for a lock on table30,
 * ATTACH PARTITION (let’s call it session B) was waiting for a lock on test29.

At the time:

  * session A held lock on test29 and requesting lock on test30;
  * session B held lock on test30 and requesting lock on test29;

This circular wait leads directly to the deadlock. I then analyzed the lock behavior of both sessions.

For Session A: ALTER COLUMN SET DEFAULT:

  * ALTER TABLE aways locks the target table, so test1 locked first
  * Since SET DEFAULT recurses to partitions, ATSimpleRecursion() is used to propagate the command. During this
process,find_all_inheritors() is called to locate all partitions, sort them by OID, and then lock them one by one. 
  * As a result, locks are acquired in the order test2, test3, … up to the last existing partition.

This explains why session A held a lock on test29 and was requesting a lock on test30.

From Session B: ATTACH PARTITION:

Session B was requesting an AccessShareLock. Based on this, I traced the lock request to generate_partition_qual(),
whichis a recursive function. It walks up the partition tree to collect all partition quals, meaning that ATTACH
PARTITIONlocks all ancestors in reverse OID order. 

That explains why session B held a lock on test30 and was requesting a lock on test29.

Proposed solution
==============

The root cause is inconsistent lock ordering.

A simple solution is to pre-lock all ancestors for ATTACH PARTITION in OID order. That way, when
generate_partition_qual()later recurses up the partition tree, all required locks have already been acquired. 

One important detail is that these ancestor locks must be taken before the ALTER TABLE infrastructure locks the target
table;otherwise, the lock order still cannot be guaranteed. With this in mind, I found that
RangeVarCallbackForAlterRelation()is the appropriate place to perform this pre-locking. 

Please see the attached v1 patch for the implementation.

Potential issue remaining
===================

Currently, find_all_inheritors() sorts all inheritors purely by OID. In most cases this works, because users typically
createa partitioned table first and then add partitions later, so the partitioned table’s OID is smaller than those of
itspartitions. 

However, this is not always true. For example, if a user creates a regular table A first, then creates a partitioned
tableP, and later attaches A to P, P’s OID will be larger than A’s. If such a case is reported in the future, I think
weshould enhance find_all_inheritors() to always lock the parent first, and only sort siblings at the same level by
OID.

This patch has not addressed that potential issue.


[1] https://postgr.es/m/CAEoWx2nJ71hy8R614HQr7vQhkBReO9AANPODPg0aSQs74eOdLQ@mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/201902041630.gpadougzab7v%40alvherre.pgsql

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/




Attachment

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery
Next
From: Chao Li
Date:
Subject: Re: Document NULL