Thread: Undetected Deadlock

Undetected Deadlock

From
Michael Harris
Date:
Hello Experts

I'm hoping you will be able to help me with a tricky issue.

We've recently updated our application to PG 14.1, and in the test instance we
have started to see some alarming undetected deadlocks.

An example of what we have seen is:

 locktype | database |  relation  | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |   pid   |        mode         | granted | fastpath |           waitstart           |      relation
----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+---------------------+---------+----------+-------------------------------+---------------------
 relation |   529986 | 1842228045 |      |       |            |               |         |       |          | 165/1941408        | 2130531 | AccessShareLock     | f       | f        | 2022-01-19 00:32:32.626152+01 | st.ctr_table_efr_oa
(1 row)

 locktype | database |  relation  | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |   pid   |        mode         | granted | fastpath |           waitstart           |      relation
----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+---------------------+---------+----------+-------------------------------+---------------------
 relation |   529986 | 1842228045 |      |       |            |               |         |       |          | 75/2193719         | 2128603 | AccessExclusiveLock | t       | f        |                               | st.ctr_table_efr_oa
(1 row)

 locktype | database |  relation  | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |   pid   |        mode         | granted | fastpath |           waitstart           |      relation
----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+---------------------+---------+----------+-------------------------------+---------------------
 relation |   529986 | 1842231489 |      |       |            |               |         |       |          | 75/2193719         | 2128603 | AccessExclusiveLock | f       | f        | 2022-01-19 00:32:32.924694+01 | st.tpd_oa
(1 row)

 locktype | database |  relation  | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |   pid   |        mode         | granted | fastpath |           waitstart           |      relation
----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+---------------------+---------+----------+-------------------------------+---------------------
 relation |   529986 | 1842231489 |      |       |            |               |         |       |          | 165/1941408        | 2130531 | AccessShareLock     | t       | f        |                               | st.tpd_oa
(1 row)

So:
  pid 2130531 waits for an AccessShareLock on relation 1842228045, blocked by pid 2128603 which holds an AccessExclusiveLock
  pid 2128603 waits for an AccessExclusiveLock on relation 1842231489, blocked by pid 2130531 which holds an AccessShareLock

The queries being executed by these backends are:

   pid   |          query_start          |         state_change          | wait_event_type | wait_event | state  |                            query
---------+-------------------------------+-------------------------------+-----------------+------------+--------+---------------------------------------------------------------
 2128603 | 2022-01-19 00:32:32.924413+01 | 2022-01-19 00:32:32.924413+01 | Lock            | relation   | active | DROP TABLE st.tpd_oa_18929
 2130531 | 2022-01-19 00:32:32.625706+01 | 2022-01-19 00:32:32.625708+01 | Lock            | relation   | active | DELETE FROM st.ctr_table_efr_oa_19010 WHERE ropid = 44788868
(2 rows)

Both of these processes had been waiting for nearly 2 hours when I
took the above printouts. My understanding of deadlock detection is
that once a backend has been waiting on a lock for deadlock_timeout
(1 second by default), it runs the deadlock detection routine, so this
should have been detected almost straight away - but somehow it was not.
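
To double-check the wait-for relationships, a query along these lines can be run (just a sketch using pg_blocking_pids(); for each waiting backend it lists the PIDs blocking it):

  SELECT a.pid,
         a.wait_event_type,
         a.wait_event,
         pg_blocking_pids(a.pid) AS blocked_by,
         a.query
  FROM pg_stat_activity a
  WHERE cardinality(pg_blocking_pids(a.pid)) > 0
  ORDER BY a.pid;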

Are there any deadlock conditions that postgresql is not able to detect?

Note that there were a lot of other processes also waiting on relation
1842231489 - could that be confusing the deadlock detection routine?
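
For completeness, the full queue on that relation can be listed with something like this (a sketch; 1842231489 is the parent table's OID from the printouts above):

  SELECT l.pid, l.mode, l.granted, a.wait_event_type, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a USING (pid)
  WHERE l.locktype = 'relation' AND l.relation = 1842231489
  ORDER BY l.granted DESC, l.pid;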

I am also confused about the locks which are being taken out by the
DELETE query. Process 2130531 is trying to delete rows from a
partition `st.ctr_table_efr_oa_19010`, and to do so it is trying to
get an AccessShareLock on the parent table `st.ctr_table_efr_oa`. I
don't really understand why that is, and in fact if I try to reproduce
the deadlock manually by executing the relevant SQL in psql, the
DELETE never tries to take out such a lock and no deadlock occurs.
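
For illustration, a simplified sketch of that kind of manual test (placeholder names, not our real schema):

  -- placeholder schema, not the real tables
  CREATE TABLE ctr_parent (ropid bigint, payload text) PARTITION BY RANGE (ropid);
  CREATE TABLE ctr_part_1 PARTITION OF ctr_parent FOR VALUES FROM (0) TO (1000000);

  BEGIN;
  DELETE FROM ctr_part_1 WHERE ropid = 42;   -- delete directly from the partition
  -- which relation locks does this backend now hold?
  SELECT relation::regclass, mode, granted
  FROM pg_locks
  WHERE pid = pg_backend_pid() AND locktype = 'relation';
  ROLLBACK;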

What are the circumstances where deleting rows directly from a
partition will cause it to take a lock on the partitioned table?

Any suggestions for debugging this issue would be most welcome.

Thanks in advance

Cheers
Mike



Re: Undetected Deadlock

From
Michael Harris
Date:
My apologies,

After posting this and looking at how it appears I realised that line
wrapping makes the tables totally illegible.

Here they are again with all unnecessary columns removed and others shortened.

 locktype | database |  relation  |   pid   |        mode         | granted |           waitstart           |      relation
----------+----------+------------+---------+---------------------+---------+-------------------------------+---------------------
 relation |   529986 | 1842228045 | 2130531 | AccessShareLock     | f       | 2022-01-19 00:32:32.626152+01 | st.ctr_table_efr_oa
 relation |   529986 | 1842228045 | 2128603 | AccessExclusiveLock | t       |                               | st.ctr_table_efr_oa
 relation |   529986 | 1842231489 | 2128603 | AccessExclusiveLock | f       | 2022-01-19 00:32:32.924694+01 | st.tpd_oa
 relation |   529986 | 1842231489 | 2130531 | AccessShareLock     | t       |                               | st.tpd_oa


   pid   |    query_start     |   state_change     | wait_event_type | wait_event |                            query
---------+--------------------+--------------------+-----------------+------------+---------------------------------------------------------------
 2128603 | 00:32:32.924413+01 | 00:32:32.924413+01 | Lock            | relation   | DROP TABLE st.tpd_oa_18929
 2130531 | 00:32:32.625706+01 | 00:32:32.625708+01 | Lock            | relation   | DELETE FROM st.ctr_table_efr_oa_19010 WHERE ropid = 44788868

I hope this version is easier to read.

Cheers
Mike



Re: Undetected Deadlock

From
Rob Sargent
Date:

> On Jan 24, 2022, at 10:02 PM, Michael Harris <harmic@gmail.com> wrote:

It’s definitely tidier, but in Mail.app on my Mac and in Thunderbird there is still a line wrap problem. I don’t think
that will prevent the experts from helping out. I’m not one of them.




Re: Undetected Deadlock

From
Alvaro Herrera
Date:
On 2022-Jan-25, Michael Harris wrote:

> We've recently updated our application to PG 14.1, and in the test instance we
> have started to see some alarming undetected deadlocks.

This is indeed suspicious / worrisome / curious.

What version were you using previously?

I know of no cases in which we fail to detect a deadlock.  Perhaps you
have indeed hit a bug.

> Note that there were a lot of other processes also waiting on relation
> 1842231489 - could that be confusing the deadlock detection routine?

It shouldn't.

> I am also confused about the locks which are being taken out by the
> DELETE query.

Maybe the lock is already taken before the DELETE is run; do you have
any triggers, rules, constraints, or anything?  If you have seen this
several times already, maybe a way to investigate deeper is an
exhaustive log capture of everything that these transactions do, from
the point they begin until they become blocked (log_statement=all).

Perhaps you need to involve other concurrent transactions in order to
cause the problem.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Tiene valor aquel que admite que es un cobarde" (Fernandel)



Re: Undetected Deadlock

From
Michael Harris
Date:
Hi Alvaro

Thanks for the feedback!

> What version were you using previously?

We were previously on 11.4. Another difference is that we were using
inheritance based partitioning before, whereas now we are using
declarative partitioning.

> Maybe the lock is already taken before the DELETE is run; do you have
> any triggers, rules, constraints, or anything?

There are no triggers, rules or constraints on the table involved in
the DELETE (either the partition or the table that the partition is
part of).

Even more confusingly - when I reproduce the SQL statements that
should be triggering the deadlock, it does not happen: the DELETE does
not attempt to take an AccessShareLock on the parent table, so it does
not deadlock.

Is there any state associated with a transaction or a database
connection that would alter the lock(s) that gets taken out for a
DELETE on a partition? Or can other concurrent transactions in other
processes cause more locks to be needed somehow?

> If you have seen this several times already, maybe a way to investigate deeper is an
> exhaustive log capture of everything that these transactions do

So far it has happened at least twice. There were a couple of other
incidents that may well also have been caused by this, but not enough
data was collected at the time to be sure.

A bit more detail: of the two processes that are deadlocked here, one
is ingesting new data while the other is removing old data by
dropping partitions. Even before we shifted to 14.1 and native
partitioning, we did get deadlocks between these two processes every
so often which we could not really prevent, so we adopted a retry
approach for when they do occur. However, we never had an undetected
deadlock in the database.

Since going to 14.1 & native partitioning, we are seeing much more
frequent deadlocks. It looks like the increase in frequency might be
related to the extra lock taken by the DELETE that I mentioned above.
However, most of the time the deadlock is detected and released by
Postgres and the impact is minimal. Of course it is another story if
it is not detected!

I have enabled `log_statement=all`, but the undetected deadlock hasn't
happened again since. I can easily reproduce the deadlock itself, but
not the undetected case.

Thanks again.

Cheers
Mike




Re: Undetected Deadlock

From
Michael Lewis
Date:
I must be missing something. You mentioned dropping a partition, so is there an actual need for the delete? Could you detach concurrently and then drop the table or delete rows if needed?

https://www.postgresql.org/docs/14/sql-altertable.html#SQL-ALTERTABLE-DETACH-PARTITION
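
Roughly (a sketch; the partition name is borrowed from your DROP statement above, adjust to your naming):

  -- note: DETACH ... CONCURRENTLY cannot be run inside a transaction block
  ALTER TABLE st.tpd_oa DETACH PARTITION st.tpd_oa_18929 CONCURRENTLY;
  DROP TABLE st.tpd_oa_18929;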

Re: Undetected Deadlock

From
Michael Harris
Date:
> I must be missing something. You mentioned dropping a partition, so is there an actual need for the delete? Could you detach concurrently and then drop the table or delete rows if needed?

The DELETE is part of a transaction performing data loading. Our
application allows data to be overwritten. Before using a COPY to
actually store the new data, we do a DELETE to make sure there are no
existing records. We tested a number of sequences and found that the
most efficient is to perform a DELETE first, even though in most cases
there is nothing to delete, and then do the COPY (rather than for
example checking if it exists first and then deleting it).
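
In rough outline the load looks something like this (a simplified sketch; the file path and values are placeholders):

  BEGIN;
  -- usually deletes nothing, but guarantees there are no existing records before loading
  DELETE FROM st.ctr_table_efr_oa_19010 WHERE ropid = 44788868;
  COPY st.ctr_table_efr_oa_19010 FROM '/path/to/batch.csv' WITH (FORMAT csv);
  COMMIT;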

The DROP is part of a separate transaction that is doing the data
expiry. Note that it is not dropping the same partition; old
partitions get dropped, new data is loaded into newer partitions.

Cheers
Mike



Re: Undetected Deadlock

From
Michael Lewis
Date:
There may be a bug, so perhaps still pursue reproducing the issue, but I would expect drop concurrently to resolve your issue with the two processes conflicting. Also, perhaps trying "insert, on conflict do update" could be more efficient than the COPY, but obviously there are too many unknowns and variables for either to be a definite winner.
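
Something like this, roughly (a sketch; it assumes a unique constraint on ropid, and the other column names are placeholders):

  INSERT INTO st.ctr_table_efr_oa_19010 (ropid, payload)
  VALUES (44788868, '...')
  ON CONFLICT (ropid) DO UPDATE SET payload = EXCLUDED.payload;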

Re: Undetected Deadlock

From
Michael Harris
Date:
> but I would expect drop concurrently to resolve your issue with the two processes conflicting

I guess you mean ALTER TABLE DETACH PARTITION ... CONCURRENTLY?
DROP TABLE does not seem to have a concurrent option.
Still, that does seem like a good option to try - thanks for drawing it
to my attention.

Cheers
Mike




Re: Undetected Deadlock

From
Michael Harris
Date:
Hi

The undetected deadlock occurred again today and I was able to collect
some more info.

The presentation was very similar to the case I reported previously:
- One backend trying to do a DROP TABLE on a partition of a partitioned table, waiting for an AccessExclusiveLock on that table
- Another backend trying to do a DELETE on records within a partition of another table, waiting on an AccessShareLock which is already held by the first backend, while already holding a lock on the table that the first backend is waiting for
- A load of other backends also trying to do a DELETE.

I was able to attach gdb to the backends and I discovered a few things.

First, taking a stack trace of the process doing the DELETE, it looks like this:

#0  0x00007fc82a6750bb in epoll_wait (epfd=13, events=0x18696f8, maxevents=maxevents@entry=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00000000007d0c32 in WaitEventSetWaitBlock (nevents=1, occurred_events=<optimized out>, cur_timeout=-1, set=0x1869698) at latch.c:1450
#2  WaitEventSetWait (set=0x1869698, timeout=<optimized out>, occurred_events=<optimized out>, nevents=1, wait_event_info=<optimized out>) at latch.c:1396
#3  0x00000000007d0f94 in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=33, timeout=timeout@entry=0, wait_event_info=50331648) at latch.c:473
#4  0x00000000007eaea7 in ProcSleep (locallock=locallock@entry=0x24571d0, lockMethodTable=lockMethodTable@entry=0xdb2360 <default_lockmethod>) at proc.c:1361
#5  0x00000000007df419 in WaitOnLock (locallock=locallock@entry=0x24571d0, owner=owner@entry=0x18b80f8) at lock.c:1858
#6  0x00000000007e052c in LockAcquireExtended (locktag=locktag@entry=0x7ffced1d0b80, lockmode=lockmode@entry=1, sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false, reportMemoryError=reportMemoryError@entry=true, locallockp=locallockp@entry=0x7ffced1d0b78) at lock.c:1100
#7  0x00000000007ddb23 in LockRelationOid (relid=1842227607, lockmode=1) at lmgr.c:117
#8  0x000000000050dba5 in relation_open (relationId=1842227607, lockmode=lockmode@entry=1) at relation.c:56
#9  0x00000000008fc838 in generate_partition_qual (rel=0x56614e8) at partcache.c:361
#10 generate_partition_qual (rel=0x56614e8) at partcache.c:336
#11 0x00000000007502bf in set_baserel_partition_constraint (relation=relation@entry=0x56614e8, rel=<optimized out>, rel=<optimized out>) at plancat.c:2402
~snip~

The lock is being taken out by the function generate_partition_qual
(in partcache.c) which has this comment:

    /*
     * Grab at least an AccessShareLock on the parent table.  Must do this
     * even if the partition has been partially detached, because transactions
     * concurrent with the detach might still be trying to use a partition
     * descriptor that includes it.
     */

This is happening during query planning rather than execution, where
it is trying to fetch the partition bound expression.
It explains why this lock is not always required (the result is cached
in the relcache, and anyway it looks to me like this code path is
called depending on factors such as the row estimates).
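
A way to check whether a particular session takes the parent lock at plan time is something like this (a sketch; EXPLAIN plans the statement without executing it):

  BEGIN;
  EXPLAIN DELETE FROM st.ctr_table_efr_oa_19010 WHERE ropid = 44788868;
  -- relation locks held by this backend after planning
  SELECT relation::regclass, mode, granted
  FROM pg_locks
  WHERE pid = pg_backend_pid() AND locktype = 'relation';
  ROLLBACK;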

The second thing was that the DROP process, and all the DELETE
processes, were all waiting inside epoll_wait, ultimately called from
ProcSleep as shown above. They are sitting in this section of code
(proc.c line 1361):

            (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,     <--- Processes sitting here
                             PG_WAIT_LOCK | locallock->tag.lock.locktag_type);
            ResetLatch(MyLatch);
            /* check for deadlocks first, as that's probably log-worthy */
            if (got_deadlock_timeout)                                             <-- set breakpoint here
            {
                CheckDeadLock();
                got_deadlock_timeout = false;
            }

I set a breakpoint on the next line after ResetLatch is called, and it
was never reached. I thought that could be due to GDB attaching to the
process and disrupting the alarm signal that should be sent into the
timeout handler, so I manually sent it a SIGALRM.
At that point the breakpoint was triggered - but the flag
got_deadlock_timeout was not set, so the deadlock check was not
executed.
This led me to believe that the deadlock check timeout wasn't set
properly at all.

To see if the deadlock would be detected if CheckDeadlock() got an
opportunity to run, I manually set got_deadlock_timeout, and sure
enough the deadlock was detected and released.

I concluded that the deadlock detection function is able to detect
these deadlocks, but for some reason the 1sec timeout is not being
correctly scheduled so it never runs.
When it is artificially triggered, it does release the deadlock.

After the release, a new deadlock formed between the DROP process and
another one of the waiting DELETE processes (not an entirely
unexpected outcome).
I will leave it in that state for as long as possible in case anyone
can think of any other info that should be gathered. Luckily it is a
test instance.

Sorry for the slightly long and detailed email - let me know if this
should be moved to another forum.

Cheers
Mike



Re: Undetected Deadlock

From
Michael Harris
Date:
Hi again

Some good news. After some more debugging & reflection, I realized
that the likely cause is one of our own libraries that gets loaded as
part of some custom functions we are using.

Some of these functions trigger fetching of remote resources, for
which a timeout is set using `alarm`. The function unfortunately does
not re-establish any pre-existing interval timers after it is done,
which leads to PostgreSQL missing its own expected alarm signal.

The reason that this was not affecting us on previous postgres
versions was this commit:


https://github.com/postgres/postgres/commit/09cf1d52267644cdbdb734294012cf1228745aaa#diff-b12a7ca3bf9c6a56745844c2670b0b28d2a4237741c395dda318c6cc3664ad4a

After this commit, once an alarm is missed, that backend never sets
one again, so no timeouts of any kind will work. Therefore, the
deadlock detector never ran. Prior to that commit, the next time
any timeout was set by the backend it would re-establish its timer.
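
If that analysis is right, the symptom is visible from plain SQL in an affected session, since nothing timeout-driven fires any more (a sketch):

  -- In a healthy session this is cancelled after about 1 second;
  -- in an affected session it runs the full 5 seconds.
  SET statement_timeout = '1s';
  SELECT pg_sleep(5);
  RESET statement_timeout;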

We will of course fix our own code to prevent this issue, but I am a
little concerned about the above commit, as it reduces the robustness
of Postgres in this situation. Perhaps I will raise it on the
pgsql-hackers list.

Cheers
Mike




Re: Undetected Deadlock

From
Simon Riggs
Date:
On Thu, 3 Feb 2022 at 06:25, Michael Harris <harmic@gmail.com> wrote:

Hmm, so you turned off Postgres' alarms so they stopped working, and
you're saying that is a robustness issue of Postgres?

Yes, something broke and it would be nice to avoid that, but the
responsibility for that lies in the user code that was called.
Postgres can't know what kernel calls are made during a custom
function.

Perhaps you could contribute a test case for this situation and a new
call to check/reset any missing alarms? Or alternatively, contribute
the function library that fetches remote resources, so that can become
an optional part of Postgres, in contrib.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Undetected Deadlock

From
Tom Lane
Date:
Simon Riggs <simon.riggs@enterprisedb.com> writes:
> Hmm, so you turned off Postgres' alarms so they stopped working, and
> you're saying that is a robustness issue of Postgres?

If Michael's analysis were accurate, I'd agree that there is a robustness
issue, but I don't think there is.  See timeout.c:220:

        /*
         * Get the time remaining till the nearest pending timeout.  If it is
         * negative, assume that we somehow missed an interrupt, and force
         * signal_pending off.  This gives us a chance to recover if the
         * kernel drops a timeout request for some reason.
         */
        nearest_timeout = active_timeouts[0]->fin_time;
        if (now > nearest_timeout)
        {
            signal_pending = false;
            /* force an interrupt as soon as possible */
            secs = 0;
            usecs = 1;
        }

Now admittedly we don't have a good way to test this stanza, but
it should result in re-establishing the timer interrupt the next
time any timeout.c API is invoked after a missed interrupt.
I don't see anything more that we could or should do.  We're
not going to issue setitimer() after every user-defined function
call.

            regards, tom lane



Re: Undetected Deadlock

From
Tom Lane
Date:
Michael Harris <harmic@gmail.com> writes:
>> If Michael's analysis were accurate, I'd agree that there is a robustness
>> issue, but I don't think there is.  See timeout.c:220:

> Actually that only sets a new timer after the nearest timeout has expired.

Mmm, yeah, you're right: as long as we keep on canceling timeout
requests before they are reached, this code won't notice a problem.
We need to check against signal_due_at, not only the next timeout
request, more or less as attached.  Do you want to try this and see
if it actually adds any robustness with your buggy code?

> I was thinking that to improve robustness, we could add a check for
> `now < signal_due_at` to schedule_alarm line 300:

>         if (signal_pending && now < signal_due_at && nearest_timeout >= signal_due_at)
>             return;

I don't think that merging this issue into that test is appropriate;
the logic is already complicated and hard to explain.  Also, it seems
to me that some slop is needed, since as mentioned nearby, the kernel's
opinion of when to fire the interrupt is likely to be a bit later
than ours.  I'm not wedded to the 10ms slop proposed below, but it
seems like it's probably in the right ballpark.

> One other thing that struck me when reading this code: the variable
> signal_due_at is not declared as volatile sig_atomic_t, even though it
> is read from / written to by the signal handler. Maybe that could
> cause problems?

Hm.  I'm not really convinced there's any issue, since signal_due_at
is only examined when signal_pending is true.  Still, it's probably
better for all these variables to be volatile, so done.

> One other question: I've fixed our custom function, so that it
> correctly restores any interval timers that were running, but of
> course if our function takes longer than the remaining time on the
> postgres timer the signal will be delivered late. Beyond the fact that
> a deadlock or statement timeout will take longer than expected, are
> there any other negative consequences? Are any of the timeouts
> time-critical such that being delayed by a few seconds would cause a
> problem?

They're certainly not supposed to be that time-critical; any code that
is expecting such a thing is going to have lots of problems already.
PG is not a hard-real-time system ;-)

            regards, tom lane

diff --git a/src/backend/utils/misc/timeout.c b/src/backend/utils/misc/timeout.c
index ac57073033..f38c85df74 100644
--- a/src/backend/utils/misc/timeout.c
+++ b/src/backend/utils/misc/timeout.c
@@ -71,11 +71,12 @@ static volatile sig_atomic_t alarm_enabled = false;

 /*
  * State recording if and when we next expect the interrupt to fire.
+ * (signal_due_at is valid only when signal_pending is true.)
  * Note that the signal handler will unconditionally reset signal_pending to
  * false, so that can change asynchronously even when alarm_enabled is false.
  */
 static volatile sig_atomic_t signal_pending = false;
-static TimestampTz signal_due_at = 0;    /* valid only when signal_pending */
+static volatile TimestampTz signal_due_at = 0;


 /*****************************************************************************
@@ -217,10 +218,21 @@ schedule_alarm(TimestampTz now)

         MemSet(&timeval, 0, sizeof(struct itimerval));

+        /*
+         * If we think there's a signal pending, but current time is more than
+         * 10ms past when the signal was due, then assume that the timeout
+         * request got lost somehow and re-enable it.  (10ms corresponds to
+         * the worst-case timeout granularity on modern systems.)  It won't
+         * hurt us if the interrupt does manage to fire between now and when
+         * we reach the setitimer() call.
+         */
+        if (signal_pending && now > signal_due_at + 10 * 1000)
+            signal_pending = false;
+
         /*
          * Get the time remaining till the nearest pending timeout.  If it is
          * negative, assume that we somehow missed an interrupt, and force
-         * signal_pending off.  This gives us a chance to recover if the
+         * signal_pending off.  This gives us another chance to recover if the
          * kernel drops a timeout request for some reason.
          */
         nearest_timeout = active_timeouts[0]->fin_time;

Re: Undetected Deadlock

From
Tom Lane
Date:
Michael Harris <harmic@gmail.com> writes:
> On Mon, 7 Feb 2022 at 09:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Do you want to try this and see if it actually adds any robustness with your buggy code?

> Sorry for the delayed response, & thanks for the patch.

> I wasn't able to test with our actual application because it could
> take days for it to actually trigger the problem, so I tested it with
> a simulation, which you can find here:

> https://github.com/harmic/pg_almloss

> With that simulation I could attach gdb to the backend and see that
> signal_pending & signal_due_at were being reset in the expected way,
> even when a missed interrupt was triggered.

> I'm convinced your patch improves robustness under the scenario we saw.

Great, thanks for testing!

            regards, tom lane