Thread: Tracking down a deadlock
I need a bit of help understanding what might be causing a deadlock. To duplicate the problem I'm running a test script that forks two child processes. Each child runs the same transaction and thus the order of execution is exactly the same. (i.e. not like the typical deadlock where the order of updates might be reversed between two sessions.) The transaction inserts a new document into a document management system. The transaction does a number of selects and inserts. At the end of the transaction they both try and update the same row in the "account" table. It does not happen every time I run my test script -- but if I run it enough I get a deadlock. If I fork more child process I can make it happen more often. So, it does seem like a timing issue. No explicit LOCK or SELECT FOR UPDATE is used in the transaction. I'm running in the default "read committed" isolation level. The initial problem was reported on PostgreSQL 8.3.5, but I'm now testing on PostgreSQL 8.2.9. I've set my deadlock_timeout high so I can review the locks. I see these entires: select * from pg_locks where not granted; locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid | mode | granted ---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+--------- transactionid | | | | | 18410123 | | | | 18410135 | 13420 |ShareLock | f tuple | 2474484 | 2474485 | 30 | 11 | | | | | 18410123 | 13419 |ExclusiveLock | f (2 rows) select * from pg_locks where locktype='tuple'; locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid | mode | granted ----------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+--------- tuple | 2474484 | 2474485 | 30 | 11 | | | | | 18410135 | 13420 | ExclusiveLock| t tuple | 2474484 | 2474485 | 30 | 11 | | | | | 18410123 | 13419 | ExclusiveLock| f (2 rows) And pg_stat_activity shows two of the exact same queries in "waiting" state. The "current_query" is just: UPDATE account set foo = 123 where id = $1 and $1 is indeed the same for both. If I comment out that update to the "account" table from the transaction I never get a deadlock. Maybe I'm missing something, but that by itself doesn't seem like a deadlock situation. The "account" table does have a number of constraints, and one looks like: CHECK( ( foo + bar ) <= 0 ); Could those be responsible? For a test I dropped all the constraints (except foreign keys) and I'm still getting a deadlock. In general, do the constraints need to be deferrable and then defer constraints at the start of the transaction? What else can I do to debug? Thanks, -- Bill Moseley moseley@hank.org Sent from my iMutt
Not getting any nibbles, so allow me to try a short question: If I have a deadlock situation (that will be reported as such by Postgresql once the deadlock_timeout passes), does pg_stat_activity show the queries that are blocking each other? I'm wondering if I'm misinterpreting what I'm seeing below. On Thu, Apr 30, 2009 at 10:30:26AM -0700, Bill Moseley wrote: > > I need a bit of help understanding what might be causing a deadlock. > > To duplicate the problem I'm running a test script that forks two > child processes. Each child runs the same transaction and thus the > order of execution is exactly the same. (i.e. not like the typical > deadlock where the order of updates might be reversed between two > sessions.) > > The transaction inserts a new document into a document management > system. The transaction does a number of selects and inserts. At the > end of the transaction they both try and update the same row in the > "account" table. > > > It does not happen every time I run my test script -- but if I run it > enough I get a deadlock. If I fork more child process I can make it > happen more often. So, it does seem like a timing issue. > > > No explicit LOCK or SELECT FOR UPDATE is used in the transaction. > I'm running in the default "read committed" isolation level. > The initial problem was reported on PostgreSQL 8.3.5, but > I'm now testing on PostgreSQL 8.2.9. > > > > I've set my deadlock_timeout high so I can review the locks. > I see these entires: > > > select * from pg_locks where not granted; > locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid | mode | granted > ---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+--------- > transactionid | | | | | 18410123 | | | | 18410135 | 13420| ShareLock | f > tuple | 2474484 | 2474485 | 30 | 11 | | | | | 18410123 | 13419| ExclusiveLock | f > (2 rows) > > select * from pg_locks where locktype='tuple'; > locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid | mode | granted > ----------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------+--------- > tuple | 2474484 | 2474485 | 30 | 11 | | | | | 18410135 | 13420 | ExclusiveLock| t > tuple | 2474484 | 2474485 | 30 | 11 | | | | | 18410123 | 13419 | ExclusiveLock| f > (2 rows) > > > And pg_stat_activity shows two of the exact same queries in "waiting" > state. The "current_query" is just: > > UPDATE account set foo = 123 where id = $1 > > and $1 is indeed the same for both. > > > If I comment out that update to the "account" table from the > transaction I never get a deadlock. > > > > Maybe I'm missing something, but that by itself doesn't seem like a > deadlock situation. > > The "account" table does have a number of constraints, and one looks > like: > > CHECK( ( foo + bar ) <= 0 ); > > Could those be responsible? For a test I dropped all the constraints > (except foreign keys) and I'm still getting a deadlock. > > In general, do the constraints need to be deferrable and then defer > constraints at the start of the transaction? > > What else can I do to debug? > -- Bill Moseley. moseley@hank.org Sent from my iMutt
Bill Moseley <moseley@hank.org> writes: > Not getting any nibbles, so allow me to try a short question: > If I have a deadlock situation (that will be reported as such by > Postgresql once the deadlock_timeout passes), does pg_stat_activity > show the queries that are blocking each other? In 8.2 or later it should do so; in prior versions there could be some lag involved. Another thing to keep in mind is that backends like to cache copies of the pg_stat_activity view --- if you are watching it to see what is happening, your view is only current as of the start of your current transaction. Or you can do pgstat_clear_snapshot() to force collection of new info. regards, tom lane
On Sat, May 02, 2009 at 11:48:21AM -0400, Tom Lane wrote: > Bill Moseley <moseley@hank.org> writes: > > Not getting any nibbles, so allow me to try a short question: > > If I have a deadlock situation (that will be reported as such by > > Postgresql once the deadlock_timeout passes), does pg_stat_activity > > show the queries that are blocking each other? > > In 8.2 or later it should do so; in prior versions there could be some > lag involved. Another thing to keep in mind is that backends like to > cache copies of the pg_stat_activity view --- if you are watching it > to see what is happening, your view is only current as of the start > of your current transaction. Or you can do pgstat_clear_snapshot() > to force collection of new info. Thanks Tom, I'm not clear how to run pgstat_clear_snapshot(), but I looked again and before I ran my test script and pg_stat_activity doesn't list any queries waiting. So, I don't believe it's showing stale data. Then when I run the test script (which runs the same transaction in two processes at the same time) and get a deadlock the same query is shown twice both with "waiting" set true: UPDATE account set foo = 123 where id = $1 And if I remove that update from the transaction I no longer have the deadlock. So, it seems like that is the problem update. Is postgresql telling me that it's deadlocked on two transactions trying to run that same update? There are no other updates to that account table in the transaction, so I'm confused how that is causing a deadlock. Is there something else I can do to understand what exactly is the reason for the deadlock? Thanks, -- Bill Moseley. moseley@hank.org Sent from my iMutt
Bill Moseley <moseley@hank.org> writes: > Then when I run the test script (which runs the same transaction in two > processes at the same time) and get a deadlock the same query is shown twice > both with "waiting" set true: > UPDATE account set foo = 123 where id = $1 > And if I remove that update from the transaction I no longer have the deadlock. > So, it seems like that is the problem update. > Is postgresql telling me that it's deadlocked on two transactions trying to run > that same update? Looks like it. > There are no other updates to that account table in the transaction, so I'm > confused how that is causing a deadlock. Is there more than one row with the target id? Does the account table have any foreign-key references to or from it? It's sometimes possible to get a deadlock associated with trying to lock FK-referenced rows that several updated rows have in common. regards, tom lane
On Sun, May 03, 2009 at 09:08:29PM -0400, Tom Lane wrote: > > There are no other updates to that account table in the transaction, so I'm > > confused how that is causing a deadlock. > > Is there more than one row with the target id? No. It's a single SERIAL primary key. > Does the account table have any foreign-key references to or from it? Many. I had dropped all constraints, except foreign keys, from the account table to see if that would help. (Didn't). One CHECK constraint involved checking the sum of two columns in the account table which seemed like a potential place for a deadlock. But I didn't think the foreign keys would be a problem. About 8 tables reference the account table, and the account table has about 5 columns that reference other tables. > It's sometimes possible to get a deadlock associated with trying to lock > FK-referenced rows that several updated rows have in common. The transaction has a number of selects and inserts not related to the account table. There is an insert into a log table that references the account table, though, that happens right before the update to the account table. I can't picture the deadlock, but I'm only familiar with the obvious examples. What's the approach for dealing with this kind of deadlock (assuming the FK-related as you suggest)? I assume at this point I need to try explicit locking earlier in the transaction. Any suggestions which lock to use and on what? SHARE ROW EXCLUSIVE on the account table before issuing the update? -- Bill Moseley. moseley@hank.org Sent from my iMutt
In response to Bill Moseley <moseley@hank.org>: > On Sun, May 03, 2009 at 09:08:29PM -0400, Tom Lane wrote: > > > There are no other updates to that account table in the transaction, so I'm > > > confused how that is causing a deadlock. > > > > Is there more than one row with the target id? > > No. It's a single SERIAL primary key. I've never gone to the trouble to isolate this to a reproducible test case, but I've seen situations where a single UPDATE statement appears to deadlock when run simultaneously. For example: UPDATE session_table SET text_field = '==some huge value==' WHERE non_unique_column = 5; Assuming this table sees frequent updates and that text_field is normally very large, if there are a sufficient number of rows where non_unique_column is 5, running this query in two independent sessions has a high chance of deadlocking. My theory is that since there is no ordering to the results returned by the WHERE clause, the two queries may acquire row locks in different orders. The result being that they may actually deadlock at the row level. I don't know 100% if this is possible, but I do know that working off that assumption, we reorganized our data structure so that we could use a unique column in the WHERE clause, and we have not seen the problem since. In any event, I hope that information is helpful to you. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
In response to Bill Moran <wmoran@potentialtech.com>: > In response to Bill Moseley <moseley@hank.org>: > > > On Sun, May 03, 2009 at 09:08:29PM -0400, Tom Lane wrote: > > > > There are no other updates to that account table in the transaction, so I'm > > > > confused how that is causing a deadlock. > > > > > > Is there more than one row with the target id? > > > > No. It's a single SERIAL primary key. > > I've never gone to the trouble to isolate this to a reproducible test ... > In any event, I hope that information is helpful to you. And on second read ... that has absolutely nothing to do with your scenario ... -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
I seemed to have resolved this issue by a: LOCK TABLE account IN EXCLUSIVE MODE; at the very start of the transaction to prevent that transaction from running in parallel. The transaction has quite a few selects and inserts and a few updates. I was hoping to not lock at the start of the transaction which will effectively serialize that code. The update that was causing the deadlock was just about the last command in the transaction. Removing that update and the deadlocks go away. I had hoped that a LOCK near the end of the transaction (before that UPDATE that deadlocks) would work. Oddly, it didn't and the deadlock was reported then on the LOCK itself, plus my deadlock_timeout (60 seconds) didn't seem to apply in that case. A mystery. Thanks for the help, -- Bill Moseley. moseley@hank.org Sent from my iMutt
Bill Moseley wrote: > I seemed to have resolved this issue by a: > > LOCK TABLE account IN EXCLUSIVE MODE; > > at the very start of the transaction to prevent that transaction > from running in parallel. > > The transaction has quite a few selects and inserts and a few updates. > I was hoping to not lock at the start of the transaction which will > effectively serialize that code. > > The update that was causing the deadlock was just about the last > command in the transaction. Removing that update and the deadlocks > go away. I think you should be able to do a SELECT FOR UPDATE with the same WHERE as the UPDATE as the first thing in your transaction. That way it is much less likely to deadlock with itself. (This assumes that the set of tuples to update doesn't change, which holds true everytime if your transaction has isolation level serializable). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.