Re: Undetected Deadlock - Mailing list pgsql-general

From Michael Harris
Subject Re: Undetected Deadlock
Date
Msg-id CADofcAXak8nwRYXjwvXu_0ypbZaae=YU_VgTQ2QLfWwGMAOWCw@mail.gmail.com
Whole thread Raw
In response to Undetected Deadlock  (Michael Harris <harmic@gmail.com>)
Responses Re: Undetected Deadlock  (Rob Sargent <robjsargent@gmail.com>)
List pgsql-general
My apologies,

After posting this and looking at how it appears I realised that line
wrapping makes the tables totally illegible.

Here they are again with all unnecessary columns removed and others shortened.

 locktype | database |  relation  |   pid   |        mode         |
granted |           waitstart           |     relation

----------+----------+------------+---------+---------------------+---------+-------------------------------+--------------------
 relation |   529986 | 1842228045 | 2130531 | AccessShareLock     | f
     | 2022-01-19 00:32:32.626152+01 | st.ctr_table_efr_oa
 relation |   529986 | 1842228045 | 2128603 | AccessExclusiveLock | t
     |                               | st.ctr_table_efr_oa
 relation |   529986 | 1842231489 | 2128603 | AccessExclusiveLock | f
     | 2022-01-19 00:32:32.924694+01 | st.tpd_oa
 relation |   529986 | 1842231489 | 2130531 | AccessShareLock     | t
     |                               | st.tpd_oa


   pid   |    query_start     |   state_change     | wait_event_type |
wait_event |                       query

---------+--------------------+--------------------+-----------------+------------+-------------------------------------------------------------
 2128603 | 00:32:32.924413+01 | 00:32:32.924413+01 | Lock            |
relation   | DROP TABLE st.tpd_oa_18929
 2130531 | 00:32:32.625706+01 | 00:32:32.625708+01 | Lock            |
relation   | DELETE FROM st.ctr_table_efr_oa_19010 WHERE ropid =
44788868

I hope this version is easier to read.

Cheers
Mike
On Tue, 25 Jan 2022 at 15:49, Michael Harris <harmic@gmail.com> wrote:
>
> Hello Experts
>
> I'm hoping you will be able to help me with a tricky issue.
>
> We've recently updated our application to PG 14.1, and in the test instance we
> have started to see some alarming undetected deadlocks.
>
> An example of what we have seen is:
>
>  locktype | database |  relation  | page | tuple | virtualxid |
> transactionid | classid | objid | objsubid | virtualtransaction |
> pid   |      mode       | granted | fastpath |           waitstart
>       |            relation
>
----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+-----------------+---------+----------+-------------------------------+--------------------------------
>  relation |   529986 | 1842228045 |      |       |            |
>        |         |       |          | 165/1941408        | 2130531 |
> AccessShareLock | f       | f        | 2022-01-19 00:32:32.626152+01 |
> st.ctr_table_efr_oa
> (1 row)
>
>  locktype | database |  relation  | page | tuple | virtualxid |
> transactionid | classid | objid | objsubid | virtualtransaction |
> pid   |        mode         | granted | fastpath | waitstart |
>    relation
>
----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+---------------------+---------+----------+-----------+--------------------------------
>  relation |   529986 | 1842228045 |      |       |            |
>        |         |       |          | 75/2193719         | 2128603 |
> AccessExclusiveLock | t       | f        |           |
> st.ctr_table_efr_oa
> (1 row)
>
>  locktype | database |  relation  | page | tuple | virtualxid |
> transactionid | classid | objid | objsubid | virtualtransaction |
> pid   |        mode         | granted | fastpath |           waitstart
>           | relation
>
----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+---------------------+---------+----------+-------------------------------+-----------
>  relation |   529986 | 1842231489 |      |       |            |
>        |         |       |          | 75/2193719         | 2128603 |
> AccessExclusiveLock | f       | f        | 2022-01-19
> 00:32:32.924694+01 | st.tpd_oa
> (1 row)
>
>    locktype    | database |  relation  | page | tuple |  virtualxid  |
> transactionid | classid |   objid   | objsubid | virtualtransaction |
>  pid   |         mode          | granted | fastpath |
> waitstart           | relation
>
---------------+----------+------------+------+-------+--------------+---------------+---------+-----------+----------+--------------------+---------+-----------------------+---------+----------+-------------------------------+-----------
>  relation      |   529986 | 1842231489 |      |       |              |
>               |         |           |          | 165/1941408        |
> 2130531 | AccessShareLock       | t       | f        |
>               | st.tpd_oa
>
> So:
>   pid 2130531 waits for an AccessShareLock on relation 1842228045,
> blocked by pid 2128603 which holds an AccessExclusiveLock
>   pid 2128603 waits for an AccessExclusiveLock on relation 1842231489,
> blocked by pid 2130531 which holds an AccessShareLock
>
> The queries being executed by these backends are:
>
>    pid   |          query_start          |         state_change
>   | wait_event_type | wait_event | state  |
>       query
>
---------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------------------------------------------------------------------
>  2128603 | 2022-01-19 00:32:32.924413+01 | 2022-01-19
> 00:32:32.924413+01 | Lock            | relation   | active | DROP
> TABLE st.tpd_oa_18929
>  2130531 | 2022-01-19 00:32:32.625706+01 | 2022-01-19
> 00:32:32.625708+01 | Lock            | relation   | active | DELETE
> FROM st.ctr_table_efr_oa_19010 WHERE ropid = 44788868
> (2 rows)
>
> Both of these processes had been waiting for nearly 2 hours when I
> took the above printouts. My understanding of deadlock detection is
> that when a backend is waiting for a lock, it runs a deadlock
> detection function once a second so it should have been detected
> straight away but somehow it is not.
>
> Are there any deadlock conditions that postgresql is not able to detect?
>
> Note that there were a lot of other processes also waiting on relation
> 1842231489 - could that be confusing the deadlock detection routine?
>
> I am also confused about the locks which are being taken out by the
> DELETE query. Process 2130531 is trying to delete rows from a
> partition `st.ctr_table_efr_oa_19010`, and to do so it is trying to
> get an AccessShareLock on the parent table `st.ctr_table_efr_oa`. I
> don't really understand why that is, and in fact if I try to reproduce
> the deadlock manually by executing the relevant SQL in psql, the
> DELETE never tries to take out such a lock and no deadlock occurs.
>
> What are the circumstances where deleting rows directly from a
> partition will cause it to take a lock on the partitioned table?
>
> Any suggestions for debugging this issue would be most welcome.
>
> Thanks in advance
>
> Cheers
> Mike



pgsql-general by date:

Previous
From: Michael Harris
Date:
Subject: Undetected Deadlock
Next
From: Rob Sargent
Date:
Subject: Re: Undetected Deadlock