Re: Strange deadlock with object/target of lock : transaction - Mailing list pgsql-general

From Achilleas Mantzios
Subject Re: Strange deadlock with object/target of lock : transaction
Date
Msg-id aee6efad-aded-48cb-91e3-b6b502910c6e@cloud.gatewaynet.com
Whole thread Raw
In response to Re: Strange deadlock with object/target of lock : transaction  (Achilleas Mantzios <a.mantzios@cloud.gatewaynet.com>)
List pgsql-general

On 8/25/25 17:11, Achilleas Mantzios wrote:


On 25/8/25 17:58, Adrian Klaver wrote:
On 8/25/25 07:40, Achilleas Mantzios wrote:
On 8/20/25 14:59, Achilleas Mantzios wrote:

On 8/14/25 16:01, Achilleas Mantzios wrote:

Hi Adrian

On 8/14/25 15:39, Adrian Klaver wrote:

On 8/14/25 00:07, Achilleas Mantzios wrote:
Hi All

We've been hit by a weird deadlock which it took me some days to isolate and replicate. It does not have to do with order of updates or any explicit TABLE-level locking, the objects/targets of the deadlock in question are transactions.

First off, I maybe wrong with the above conclusion, I noticed that

Hi I reproduced without the triggers, I understood the problem, I believe the system's behavior is the intended, I am sorry for the false alarm. The thing is that it takes >=3 transactions to happen . That was the tricky part, up to now in all cases of deadlocks we had two transactions involved, this one needed three or more.

For folks that run across this thread what was the issue?
Inconsistent order of updates. The two pieces of code , the update piece and the insert piece, used inconsistent order of updates. However this could not be manifested with one xaction of the update-type and one of the insert-type, there had to be more than one transactions of the update-type doing the same update (usually caused by users hitting the reload button after 1 or 2 seconds). I can easily prepare a test case, schema, data, commands for anyone interested.

Hi, trying to create a test case, I think I am up to something. I attach the test table and data.

How to reproduce. We have the "insert" code and the "update" code. The "update" code runs :

with tempqry as (select year,doc_no from test_deadlock where delivered_at::date = current_date 

and not sign_list order by bank_name,management_company_name,beneficiary_name,year,doc_no ) 

UPDATE test_deadlock pf set sign_list = true FROM tempqry 

WHERE pf.year = tempqry.year and pf.doc_no = tempqry.doc_no ;

The  insert code runs :

select COALESCE(max(id),0)+1  FROM test_deadlock;

INSERT INTO test_deadlock (id,year, doc_no, inserted_at, bank_name, management_company_name,beneficiary_name, currency, delivered_at, sign_list) VALUES (333024,2025, 395303, '2025-08-26 01:00:00+03', 'CRRRRRRRRRRR', '0006-D', 'AUTAUTAUTAUTAUTAUTAUTAUTAUTAUT', 'EUR', '2025-08-26 11:55:28.359485+03', false);

DO $$

DECLARE REC record; 

BEGIN   

      FOR REC IN SELECT year,doc_no FROM test_deadlock WHERE NOT sign_list ORDER BY    

             bank_name,management_company_name,beneficiary_name,year,doc_no LOOP 

         UPDATE test_deadlock SET currency=currency WHERE year=REC.year AND doc_no=REC.doc_no; 

      END LOOP; 

END$$
;

As you can notice, the order of the updates in both the "update" code and the "insert" code is consistent (same order by bank_name,management_company_name,beneficiary_name,year,doc_no) and one would expect there would no deadlock .

Now I open 3 sessions , 1, 2 and 3 :

Session 1, running the "update" code :

amantzio@[local]/dynacom=# begin; select txid_current(), pg_backend_pid() ;  with tempqry as (select year,doc_no from test_deadlock where delivered_at::date = current_date and not sign_lis
t order by bank_name,management_company_name,beneficiary_name,year,doc_no ) UPDATE test_deadlock pf set sign_list = true FROM tempqry WHERE pf.year = tempqry.year and pf.doc_no = tempqry.d
oc_no ;
BEGIN
txid_current | pg_backend_pid  
--------------+----------------
   117290416 |           2336
(1 row)

UPDATE 6
amantzio@[local]/dynacom=*#

Session 2, like session 1, also "update" :

amantzio@[local]/dynacom=# begin; select txid_current(), pg_backend_pid() ;  with tempqry as (select year,doc_no from test_deadlock where delivered_at::date = current_date and not sign_lis
t order by bank_name,management_company_name,beneficiary_name,year,doc_no ) UPDATE test_deadlock pf set sign_list = true FROM tempqry WHERE pf.year = tempqry.year and pf.doc_no = tempqry.d
oc_no ;
BEGIN
txid_current | pg_backend_pid  
--------------+----------------
   117290418 |           2056
(1 row)

(waits for session 1's xaction to finish , as expected )

Session 3, running the "insert" code :

begin ; select txid_current() , pg_backend_pid()  ; select COALESCE(max(id),0)+1  FROM test_deadlock; INSERT INTO test_deadlock (id,year, doc_no, inserted_at, ba
nk_name, management_company_name,beneficiary_name, currency, delivered_at, sign_list) VALUES (333024,2025, 395303, '2025-08-26 01:00:00+03', 'CRRRRRRRRRRR', '0006-D', 'AUTAUTAUTAUTAUTAUTAU
TAUTAUTAUT', 'EUR', '2025-08-26 11:55:28.359485+03', false); DO $$DECLARE REC record; BEGIN
FOR REC IN SELECT year,doc_no FROM test_deadlock WHERE NOT sign_list ORDER BY bank_name,management_company_name,beneficiary_name,year,doc_no LOOP
UPDATE test_deadlock SET currency=currency WHERE year=REC.year AND doc_no=REC.doc_no;
END LOOP;
END$$
;
BEGIN
txid_current | pg_backend_pid  
--------------+----------------
   117290419 |           2039
(1 row)

?column?  
----------
  333023
(1 row)

INSERT 0 1

(waits ....)

Now I return to session 1 and give :

amantzio@[local]/dynacom=*# rollback ;
ROLLBACK
amantzio@[local]/dynacom=#

Immediately after , session 3 executes the anonymous block, but session 2 :

ERROR:  deadlock detected
DETAIL:  Process 2056 waits for ShareLock on transaction 117290419; blocked by process 2039.
Process 2039 waits for ShareLock on transaction 117290418; blocked by process 2056.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (8,33) in relation "test_deadlock"
amantzio@[local]/dynacom=!#

If I update the "insert" code to use a CTE instead of the anonymous block in session 3, while Session 1 and 2 commands ("update" code) remain unchanged  ,

session 1 (re-run as above)

session 2 (re-run as above)

session 3 :

postgres@[local]/dynacom=# begin ; select txid_current() , pg_backend_pid()  ; select COALESCE(max(id),0)+1  FROM test_deadlock; INSERT INTO test_deadlock (id,year, doc_no, inserted_at, ba
nk_name, management_company_name,beneficiary_name, currency, delivered_at, sign_list) VALUES (333024,2025, 395303, '2025-08-26 01:00:00+03', 'CREDIT SUISSE AG', '0006-DYNACOM TANKERS MANAG
EMENT LTD', 'AUTUMN SHIPPING SERVICES LIMITED', 'EUR', '2025-08-26 11:55:28.359485+03', false); with tempqrybig as (select year,doc_no from test_deadlock where not sign_list order by bank_
name,management_company_name,beneficiary_name,year,doc_no) UPDATE test_deadlock pf set currency=currency FROM tempqrybig WHERE pf.year = tempqrybig.year and pf.doc_no = tempqrybig.doc_no ;
BEGIN
txid_current | pg_backend_pid  
--------------+----------------
   117290454 |           2039
(1 row)

?column?  
----------
  333023
(1 row)

INSERT 0 1

(waits)

then I go to Session 1, give rollback, session 2 completes, then I rollback session 2, session 3 completes and finally I rollback session 3, without any deadlock.

Another finding , if I keep the anonymous block , but add a primary key on (year, doc_no)

alter table test_deadlock ADD CONSTRAINT test_deadlock_pk PRIMARY KEY (year, doc_no);

and repeat the initial version of the 3 sessions, again the deadlock goes away.

All the above with 18beta3 .

So, I am not so sure this should happen, since there is no obvious inconsistent order of updates. You may reproduce or ask me to run again the tests while observing pg_locks or anything else.

Thank you.



Attachment

pgsql-general by date:

Previous
From: xx Z
Date:
Subject: Feature request: A method to configure client-side TLS ciphers for streaming replication
Next
From: xx Z
Date:
Subject: How to configure client-side TLS ciphers for streaming replication?