Undetected Deadlock - Mailing list pgsql-general

From Michael Harris
Subject Undetected Deadlock
Date
Msg-id CADofcAWbMrvgwSMqO4iG_iD3E2v8ZUrC-_crB41my=VMM02-CA@mail.gmail.com
Whole thread Raw
Responses Re: Undetected Deadlock  (Michael Harris <harmic@gmail.com>)
Re: Undetected Deadlock  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-general
Hello Experts

I'm hoping you will be able to help me with a tricky issue.

We've recently updated our application to PG 14.1, and in the test instance we
have started to see some alarming undetected deadlocks.

An example of what we have seen is:

 locktype | database |  relation  | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction |
pid   |      mode       | granted | fastpath |           waitstart
      |            relation

----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+-----------------+---------+----------+-------------------------------+--------------------------------
 relation |   529986 | 1842228045 |      |       |            |
       |         |       |          | 165/1941408        | 2130531 |
AccessShareLock | f       | f        | 2022-01-19 00:32:32.626152+01 |
st.ctr_table_efr_oa
(1 row)

 locktype | database |  relation  | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction |
pid   |        mode         | granted | fastpath | waitstart |
   relation

----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+---------------------+---------+----------+-----------+--------------------------------
 relation |   529986 | 1842228045 |      |       |            |
       |         |       |          | 75/2193719         | 2128603 |
AccessExclusiveLock | t       | f        |           |
st.ctr_table_efr_oa
(1 row)

 locktype | database |  relation  | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction |
pid   |        mode         | granted | fastpath |           waitstart
          | relation

----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+---------------------+---------+----------+-------------------------------+-----------
 relation |   529986 | 1842231489 |      |       |            |
       |         |       |          | 75/2193719         | 2128603 |
AccessExclusiveLock | f       | f        | 2022-01-19
00:32:32.924694+01 | st.tpd_oa
(1 row)

   locktype    | database |  relation  | page | tuple |  virtualxid  |
transactionid | classid |   objid   | objsubid | virtualtransaction |
 pid   |         mode          | granted | fastpath |
waitstart           | relation

---------------+----------+------------+------+-------+--------------+---------------+---------+-----------+----------+--------------------+---------+-----------------------+---------+----------+-------------------------------+-----------
 relation      |   529986 | 1842231489 |      |       |              |
              |         |           |          | 165/1941408        |
2130531 | AccessShareLock       | t       | f        |
              | st.tpd_oa

So:
  pid 2130531 waits for an AccessShareLock on relation 1842228045,
blocked by pid 2128603 which holds an AccessExclusiveLock
  pid 2128603 waits for an AccessExclusiveLock on relation 1842231489,
blocked by pid 2130531 which holds an AccessShareLock

The queries being executed by these backends are:

   pid   |          query_start          |         state_change
  | wait_event_type | wait_event | state  |
      query

---------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------------------------------------------------------------------
 2128603 | 2022-01-19 00:32:32.924413+01 | 2022-01-19
00:32:32.924413+01 | Lock            | relation   | active | DROP
TABLE st.tpd_oa_18929
 2130531 | 2022-01-19 00:32:32.625706+01 | 2022-01-19
00:32:32.625708+01 | Lock            | relation   | active | DELETE
FROM st.ctr_table_efr_oa_19010 WHERE ropid = 44788868
(2 rows)

Both of these processes had been waiting for nearly 2 hours when I
took the above printouts. My understanding of deadlock detection is
that when a backend is waiting for a lock, it runs a deadlock
detection function once a second so it should have been detected
straight away but somehow it is not.

Are there any deadlock conditions that postgresql is not able to detect?

Note that there were a lot of other processes also waiting on relation
1842231489 - could that be confusing the deadlock detection routine?

I am also confused about the locks which are being taken out by the
DELETE query. Process 2130531 is trying to delete rows from a
partition `st.ctr_table_efr_oa_19010`, and to do so it is trying to
get an AccessShareLock on the parent table `st.ctr_table_efr_oa`. I
don't really understand why that is, and in fact if I try to reproduce
the deadlock manually by executing the relevant SQL in psql, the
DELETE never tries to take out such a lock and no deadlock occurs.

What are the circumstances where deleting rows directly from a
partition will cause it to take a lock on the partitioned table?

Any suggestions for debugging this issue would be most welcome.

Thanks in advance

Cheers
Mike



pgsql-general by date:

Previous
From: "huangning290@yahoo.com"
Date:
Subject: GIN index
Next
From: Michael Harris
Date:
Subject: Re: Undetected Deadlock