Thread: Tracking down a deadlock

Tracking down a deadlock

From
Bill Moseley
Date:
I need a bit of help understanding what might be causing a deadlock.

To duplicate the problem I'm running a test script that forks two
child processes.  Each child runs the same transaction and thus the
order of execution is exactly the same.  (i.e. not like the typical
deadlock where the order of updates might be reversed between two
sessions.)

The transaction inserts a new document into a document management
system.  The transaction does a number of selects and inserts.  At the
end of the transaction they both try and update the same row in the
"account" table.


It does not happen every time I run my test script -- but if I run it
enough I get a deadlock.  If I fork more child process I can make it
happen more often.  So, it does seem like a timing issue.


No explicit LOCK or SELECT FOR UPDATE is used in the transaction.
I'm running in the default "read committed" isolation level.
The initial problem was reported on PostgreSQL 8.3.5, but
I'm now testing on PostgreSQL 8.2.9.



I've set my deadlock_timeout high so I can review the locks.
I see these entires:


select * from pg_locks where not granted;
   locktype    | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid
|    mode      | granted  

---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+---------
 transactionid |          |          |      |       |      18410123 |         |       |          |    18410135 | 13420
|ShareLock     | f 
 tuple         |  2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410123 | 13419
|ExclusiveLock | f 
(2 rows)

select * from pg_locks where locktype='tuple';
 locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |
mode     | granted  

----------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+---------
 tuple    |  2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410135 | 13420 |
ExclusiveLock| t 
 tuple    |  2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410123 | 13419 |
ExclusiveLock| f 
(2 rows)


And pg_stat_activity shows two of the exact same queries in "waiting"
state.  The "current_query" is just:

    UPDATE account set foo = 123 where id = $1

and $1 is indeed the same for both.


If I comment out that update to the "account" table from the
transaction I never get a deadlock.



Maybe I'm missing something, but that by itself doesn't seem like a
deadlock situation.

The "account" table does have a number of constraints, and one looks
like:

    CHECK( ( foo + bar ) <= 0 );

Could those be responsible?  For a test I dropped all the constraints
(except foreign keys) and I'm still getting a deadlock.

In general, do the constraints need to be deferrable and then defer
constraints at the start of the transaction?

What else can I do to debug?


Thanks,


--
Bill Moseley
moseley@hank.org
Sent from my iMutt


Re: Tracking down a deadlock

From
Bill Moseley
Date:
Not getting any nibbles, so allow me to try a short question:

If I have a deadlock situation (that will be reported as such by
Postgresql once the deadlock_timeout passes), does pg_stat_activity
show the queries that are blocking each other?

I'm wondering if I'm misinterpreting what I'm seeing below.




On Thu, Apr 30, 2009 at 10:30:26AM -0700, Bill Moseley wrote:
>
> I need a bit of help understanding what might be causing a deadlock.
>
> To duplicate the problem I'm running a test script that forks two
> child processes.  Each child runs the same transaction and thus the
> order of execution is exactly the same.  (i.e. not like the typical
> deadlock where the order of updates might be reversed between two
> sessions.)
>
> The transaction inserts a new document into a document management
> system.  The transaction does a number of selects and inserts.  At the
> end of the transaction they both try and update the same row in the
> "account" table.
>
>
> It does not happen every time I run my test script -- but if I run it
> enough I get a deadlock.  If I fork more child process I can make it
> happen more often.  So, it does seem like a timing issue.
>
>
> No explicit LOCK or SELECT FOR UPDATE is used in the transaction.
> I'm running in the default "read committed" isolation level.
> The initial problem was reported on PostgreSQL 8.3.5, but
> I'm now testing on PostgreSQL 8.2.9.
>
>
>
> I've set my deadlock_timeout high so I can review the locks.
> I see these entires:
>
>
> select * from pg_locks where not granted;
>    locktype    | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid
|     mode      | granted  
>
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+---------
>  transactionid |          |          |      |       |      18410123 |         |       |          |    18410135 |
13420| ShareLock     | f 
>  tuple         |  2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410123 |
13419| ExclusiveLock | f 
> (2 rows)
>
> select * from pg_locks where locktype='tuple';
>  locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |
 mode      | granted  
>
----------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+---------
>  tuple    |  2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410135 | 13420 |
ExclusiveLock| t 
>  tuple    |  2474484 |  2474485 |   30 |    11 |               |         |       |          |    18410123 | 13419 |
ExclusiveLock| f 
> (2 rows)
>
>
> And pg_stat_activity shows two of the exact same queries in "waiting"
> state.  The "current_query" is just:
>
>     UPDATE account set foo = 123 where id = $1
>
> and $1 is indeed the same for both.
>
>
> If I comment out that update to the "account" table from the
> transaction I never get a deadlock.
>
>
>
> Maybe I'm missing something, but that by itself doesn't seem like a
> deadlock situation.
>
> The "account" table does have a number of constraints, and one looks
> like:
>
>     CHECK( ( foo + bar ) <= 0 );
>
> Could those be responsible?  For a test I dropped all the constraints
> (except foreign keys) and I'm still getting a deadlock.
>
> In general, do the constraints need to be deferrable and then defer
> constraints at the start of the transaction?
>
> What else can I do to debug?
>

--
Bill Moseley.
moseley@hank.org
Sent from my iMutt

Re: Tracking down a deadlock

From
Tom Lane
Date:
Bill Moseley <moseley@hank.org> writes:
> Not getting any nibbles, so allow me to try a short question:
> If I have a deadlock situation (that will be reported as such by
> Postgresql once the deadlock_timeout passes), does pg_stat_activity
> show the queries that are blocking each other?

In 8.2 or later it should do so; in prior versions there could be some
lag involved.  Another thing to keep in mind is that backends like to
cache copies of the pg_stat_activity view --- if you are watching it
to see what is happening, your view is only current as of the start
of your current transaction.  Or you can do pgstat_clear_snapshot()
to force collection of new info.

            regards, tom lane

Re: Tracking down a deadlock

From
Bill Moseley
Date:
On Sat, May 02, 2009 at 11:48:21AM -0400, Tom Lane wrote:
> Bill Moseley <moseley@hank.org> writes:
> > Not getting any nibbles, so allow me to try a short question:
> > If I have a deadlock situation (that will be reported as such by
> > Postgresql once the deadlock_timeout passes), does pg_stat_activity
> > show the queries that are blocking each other?
>
> In 8.2 or later it should do so; in prior versions there could be some
> lag involved.  Another thing to keep in mind is that backends like to
> cache copies of the pg_stat_activity view --- if you are watching it
> to see what is happening, your view is only current as of the start
> of your current transaction.  Or you can do pgstat_clear_snapshot()
> to force collection of new info.

Thanks Tom,

I'm not clear how to run pgstat_clear_snapshot(), but I looked again and before
I ran my test script and pg_stat_activity doesn't list any queries waiting.
So, I don't believe it's showing stale data.

Then when I run the test script (which runs the same transaction in two
processes at the same time) and get a deadlock the same query is shown twice
both with "waiting" set true:

        UPDATE account set foo = 123 where id = $1

And if I remove that update from the transaction I no longer have the deadlock.
So, it seems like that is the problem update.

Is postgresql telling me that it's deadlocked on two transactions trying to run
that same update?

There are no other updates to that account table in the transaction, so I'm
confused how that is causing a deadlock.

Is there something else I can do to understand what exactly is the reason for
the deadlock?

Thanks,

--
Bill Moseley.
moseley@hank.org
Sent from my iMutt

Re: Tracking down a deadlock

From
Tom Lane
Date:
Bill Moseley <moseley@hank.org> writes:
> Then when I run the test script (which runs the same transaction in two
> processes at the same time) and get a deadlock the same query is shown twice
> both with "waiting" set true:

>         UPDATE account set foo = 123 where id = $1

> And if I remove that update from the transaction I no longer have the deadlock.
> So, it seems like that is the problem update.

> Is postgresql telling me that it's deadlocked on two transactions trying to run
> that same update?

Looks like it.

> There are no other updates to that account table in the transaction, so I'm
> confused how that is causing a deadlock.

Is there more than one row with the target id?  Does the account table
have any foreign-key references to or from it?  It's sometimes possible
to get a deadlock associated with trying to lock FK-referenced rows that
several updated rows have in common.

            regards, tom lane

Re: Tracking down a deadlock

From
Bill Moseley
Date:
On Sun, May 03, 2009 at 09:08:29PM -0400, Tom Lane wrote:
> > There are no other updates to that account table in the transaction, so I'm
> > confused how that is causing a deadlock.
>
> Is there more than one row with the target id?

No.  It's a single SERIAL primary key.


> Does the account table have any foreign-key references to or from it?

Many.  I had dropped all constraints, except foreign keys, from the account
table to see if that would help.  (Didn't).  One CHECK constraint involved
checking the sum of two columns in the account table which seemed like a
potential place for a deadlock.  But I didn't think the foreign keys would be a
problem.

About 8 tables reference the account table, and the account table has about 5
columns that reference other tables.


> It's sometimes possible to get a deadlock associated with trying to lock
> FK-referenced rows that several updated rows have in common.

The transaction has a number of selects and inserts not related to the
account table.  There is an insert into a log table that references the account
table, though, that happens right before the update to the account table.
I can't picture the deadlock, but I'm only familiar with the obvious examples.

What's the approach for dealing with this kind of deadlock (assuming the
FK-related as you suggest)?  I assume at this point I need to try explicit
locking earlier in the transaction.  Any suggestions which lock to use and on
what?  SHARE ROW EXCLUSIVE on the account table before issuing the update?




--
Bill Moseley.
moseley@hank.org
Sent from my iMutt

Re: Tracking down a deadlock

From
Bill Moran
Date:
In response to Bill Moseley <moseley@hank.org>:

> On Sun, May 03, 2009 at 09:08:29PM -0400, Tom Lane wrote:
> > > There are no other updates to that account table in the transaction, so I'm
> > > confused how that is causing a deadlock.
> >
> > Is there more than one row with the target id?
>
> No.  It's a single SERIAL primary key.

I've never gone to the trouble to isolate this to a reproducible test
case, but I've seen situations where a single UPDATE statement appears
to deadlock when run simultaneously.

For example:
UPDATE session_table SET text_field = '==some huge value=='
 WHERE non_unique_column = 5;

Assuming this table sees frequent updates and that text_field is
normally very large, if there are a sufficient number of rows where
non_unique_column is 5, running this query in two independent sessions
has a high chance of deadlocking.

My theory is that since there is no ordering to the results returned
by the WHERE clause, the two queries may acquire row locks in
different orders.  The result being that they may actually deadlock
at the row level.  I don't know 100% if this is possible, but I do
know that working off that assumption, we reorganized our data
structure so that we could use a unique column in the WHERE clause,
and we have not seen the problem since.

In any event, I hope that information is helpful to you.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: Tracking down a deadlock

From
Bill Moran
Date:
In response to Bill Moran <wmoran@potentialtech.com>:

> In response to Bill Moseley <moseley@hank.org>:
>
> > On Sun, May 03, 2009 at 09:08:29PM -0400, Tom Lane wrote:
> > > > There are no other updates to that account table in the transaction, so I'm
> > > > confused how that is causing a deadlock.
> > >
> > > Is there more than one row with the target id?
> >
> > No.  It's a single SERIAL primary key.
>
> I've never gone to the trouble to isolate this to a reproducible test
...
> In any event, I hope that information is helpful to you.

And on second read ... that has absolutely nothing to do with your
scenario ...

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: Tracking down a deadlock

From
Bill Moseley
Date:
I seemed to have resolved this issue by a:

    LOCK TABLE account IN EXCLUSIVE MODE;

at the very start of the transaction to prevent that transaction
from running in parallel.

The transaction has quite a few selects and inserts and a few updates.
I was hoping to not lock at the start of the transaction which will
effectively serialize that code.

The update that was causing the deadlock was just about the last
command in the transaction.  Removing that update and the deadlocks
go away.

I had hoped that a LOCK near the end of the transaction (before that UPDATE
that deadlocks) would work.  Oddly, it didn't and the deadlock was reported
then on the LOCK itself, plus my deadlock_timeout (60 seconds) didn't seem to
apply in that case.   A mystery.

Thanks for the help,


--
Bill Moseley.
moseley@hank.org
Sent from my iMutt

Re: Tracking down a deadlock

From
Alvaro Herrera
Date:
Bill Moseley wrote:
> I seemed to have resolved this issue by a:
>
>     LOCK TABLE account IN EXCLUSIVE MODE;
>
> at the very start of the transaction to prevent that transaction
> from running in parallel.
>
> The transaction has quite a few selects and inserts and a few updates.
> I was hoping to not lock at the start of the transaction which will
> effectively serialize that code.
>
> The update that was causing the deadlock was just about the last
> command in the transaction.  Removing that update and the deadlocks
> go away.

I think you should be able to do a SELECT FOR UPDATE with the same WHERE
as the UPDATE as the first thing in your transaction.  That way it is
much less likely to deadlock with itself.  (This assumes that the set of
tuples to update doesn't change, which holds true everytime if your
transaction has isolation level serializable).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.