Thread: deadlock
Hi,

I've come across a strange deadlock that I need your help with. There are two copies of the same Perl daemon running on a 2-CPU box. The program is pretty simple (because I wrote it :)) so I can trace its pathway fairly well: in it, there's a single "LOCK table foo" occurring part way through a transaction that sometimes ends up as this:

DETAIL:  Process 91376 waits for AccessExclusiveLock on relation 16488 of database 16386; blocked by process 92387.
Process 92387 waits for AccessExclusiveLock on relation 16488 of database 16386; blocked by process 91376.

After the exclusive lock, there is also exactly one SELECT, and then one UPDATE query involving table foo, among others, doing their usual implicit locking on it. I've read in the manuals that it's okay to stack locks this way as long as the more restrictive locks precede less restrictive ones. Mind you, there may be many requests per second, and some of them can and will happen at the same wall clock time due to the 2 CPUs at work. Can locking break under these circumstances? I'd rather opt for an educated solution to this than having to check and restart the query.

PostgreSQL 8.3.1
FreeBSD 7.0
p5-DBI-1.60.1
p5-DBD-Pg-1.49

Thanks for any tips.
On Thu, Apr 3, 2008 at 10:29 AM, rihad <rihad@mail.ru> wrote:
> Hi,
>
> I've come across a strange deadlock that I need your help with. There are
> two copies of the same Perl daemon running on a 2 cpu box. The program is
> pretty simple (because I wrote it :)) so I can trace its pathway fairly
> well: in it, there's a single "LOCK table foo" occurring part way through a
> transaction that sometimes ends up as this:

For what reason is the table being locked?
rihad wrote:
> Hi,
>
> I've come across a strange deadlock that I need your help with. There
> are two copies of the same Perl daemon running on a 2 cpu box. The
> program is pretty simple (because I wrote it :)) so I can trace its
> pathway fairly well: in it, there's a single "LOCK table foo" occurring
> part way through a transaction that sometimes ends up as this:
>
> DETAIL: Process 91376 waits for AccessExclusiveLock on relation 16488
> of database 16386; blocked by process 92387.
> Process 92387 waits for AccessExclusiveLock on relation 16488 of
> database 16386; blocked by process 91376.

If there are only two processes, and each is waiting for an ACCESS EXCLUSIVE lock on the same relation and being blocked by the other one, then presumably both have weaker locks that conflict with ACCESS EXCLUSIVE on that relation.

Process 1 can't proceed with the ACCESS EXCLUSIVE lock because process 2 has a lesser lock on the table.

Process 2 can't proceed with the ACCESS EXCLUSIVE lock because process 1 has a lesser lock on the table.

Deadlock.

I don't see any other way the situation could arise, but I'm *very* far from an expert. Note that many statements take out fairly weak locks on a table. See:

http://www.postgresql.org/docs/8.3/static/explicit-locking.html

In particular, even a basic SELECT takes out an ACCESS SHARE lock, which conflicts with ACCESS EXCLUSIVE. If you are going to lock the table with ACCESS EXCLUSIVE you need to either:

- Take out the ACCESS EXCLUSIVE lock before doing anything else with the table;
- Rewrite to avoid the need for the ACCESS EXCLUSIVE lock (say, by using appropriate SELECT ... FOR UPDATE/SHARE row-level locking); or
- Be prepared to retry transactions when deadlocks arise.

I'd prefer to avoid the exclusive lock entirely if possible, and failing that I'd want to take it out before doing anything else.

> After the exclusive lock, there is also exactly one SELECT

But what about BEFORE the LOCK statement? That's what matters.
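The wait-for cycle described above can be sketched in a few lines. This is a toy model, not PostgreSQL's actual lock manager: the conflict set below is only the slice of the real conflict matrix needed here, and the session names are taken from the log excerpt. Both daemons first take ACCESS SHARE (via a plain SELECT), then each requests ACCESS EXCLUSIVE (LOCK TABLE) and is blocked by the other's share lock:

```python
# Toy model of the two-process deadlock: each session already holds
# ACCESS SHARE on the table, then requests ACCESS EXCLUSIVE.
# Mode names follow the PostgreSQL manual; only the relevant
# conflict pairs are modelled, not the full 8x8 matrix.

CONFLICTS = {
    ("ACCESS SHARE", "ACCESS EXCLUSIVE"),
    ("ACCESS EXCLUSIVE", "ACCESS SHARE"),
    ("ACCESS EXCLUSIVE", "ACCESS EXCLUSIVE"),
}

def blockers(requester, mode, held):
    """Sessions whose held locks conflict with `requester` asking for `mode`."""
    return {s for s, m in held
            if s != requester and (m, mode) in CONFLICTS}

# Both daemons ran a SELECT first: each holds ACCESS SHARE.
held = {("pid_91376", "ACCESS SHARE"), ("pid_92387", "ACCESS SHARE")}

# Now each issues LOCK TABLE foo (ACCESS EXCLUSIVE by default):
waits_for = {
    "pid_91376": blockers("pid_91376", "ACCESS EXCLUSIVE", held),
    "pid_92387": blockers("pid_92387", "ACCESS EXCLUSIVE", held),
}

# Each waits on the other -- a cycle in the wait-for graph, i.e. deadlock,
# which is exactly what the server's DETAIL message reports.
assert waits_for["pid_91376"] == {"pid_92387"}
assert waits_for["pid_92387"] == {"pid_91376"}
```

Taking the ACCESS EXCLUSIVE lock first removes the cycle: whichever session gets it proceeds alone, and the other simply waits rather than deadlocking.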
> I've read in the manuals that it's okay to stack
> locks this way as long as the more restrictive locks precede less
> restrictive ones.

Yep, and since ACCESS EXCLUSIVE is the most restrictive lock you need to take it out before doing ANYTHING else if you can't prove through other means (say, knowledge about locking on other tables/records) that a deadlock won't arise.

> Mind you, there may be many requests per second, and
> some of them can and will happen at the same wall clock time due to 2
> cpus at work. Can locking break under these circumstances?

I'd be VERY surprised.

--
Craig Ringer
> rihad wrote:
>> Hi,
>>
>> I've come across a strange deadlock that I need your help with. There
>> are two copies of the same Perl daemon running on a 2 cpu box. The
>> program is pretty simple (because I wrote it :)) so I can trace its
>> pathway fairly well: in it, there's a single "LOCK table foo" occurring
>> part way through a transaction that sometimes ends up as this:
>>
>> DETAIL: Process 91376 waits for AccessExclusiveLock on relation 16488
>> of database 16386; blocked by process 92387.
>> Process 92387 waits for AccessExclusiveLock on relation 16488 of
>> database 16386; blocked by process 91376.
>
> If there are only two processes, and each is waiting for an ACCESS
> EXCLUSIVE lock on the same relation and being blocked by the other one,
> then presumably both have weaker locks that conflict with ACCESS
> EXCLUSIVE on that relation.
>
> Process 1 can't proceed with the ACCESS EXCLUSIVE lock because process 2
> has a lesser lock on the table.
>
> Process 2 can't proceed with the ACCESS EXCLUSIVE lock because process 1
> has a lesser lock on the table.
>
> Deadlock.
>
> I don't see any other way the situation could arise, but I'm *very* far
> from an expert.

Indeed, there is one SELECT and, conditionally, one UPDATE on the table before the exclusive LOCK. I've re-read the manual, particularly this line: "One should also ensure that the first lock acquired on an object in a transaction is the highest mode that will be needed for that object." Since the SELECT and UPDATE come before the LOCK on bw_pool, the bug is obvious. Sadly I can't use any other kind of locking, as I need exclusive access to the OS's firewall after getting bw_id. I had moved the LOCK deeper into the transaction to better mimic fine-grained locking, so one solution is to move it back to the beginning. Thank you!
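The fix the poster settles on, taking the strongest lock before touching the table at all, can be sketched with threads. This is an illustrative simulation, not the actual Perl daemon: a `threading.Lock` stands in for "LOCK TABLE bw_pool", a dict stands in for the table, and the function names are invented for the sketch:

```python
# Sketch of the corrected ordering: take the exclusive lock FIRST,
# then do the SELECT MIN(...) and the UPDATE under it. Two "daemons"
# (threads) then serialize cleanly instead of deadlocking on a lock
# upgrade. All names here are hypothetical stand-ins.
import threading

pool = {1: None, 2: None}        # bw_id -> user_id (None = free slot)
table_lock = threading.Lock()    # plays the role of LOCK TABLE bw_pool
claimed = []                     # (user_id, bw_id) results

def daemon(user_id):
    with table_lock:             # strongest lock acquired before any read
        free = min(b for b, u in pool.items() if u is None)  # the SELECT MIN(bw_id)
        pool[free] = user_id                                 # the UPDATE ... SET user_id
        claimed.append((user_id, free))
        # firewall manipulation would also happen here, still under the lock

threads = [threading.Thread(target=daemon, args=(u,)) for u in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each daemon got a distinct slot; no deadlock is possible because
# neither thread ever holds a weaker lock while waiting for a stronger one.
assert sorted(b for _, b in claimed) == [1, 2]
```

The cost, as noted later in the thread, is that the whole transaction becomes single-threaded, including the parts that would not otherwise need serializing.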
> rihad wrote:
>> Given this type of query:
>>
>> UPDATE bw_pool
>>    SET user_id=?
>>  WHERE bw_id=
>>        (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>> RETURNING bw_id
>
> Can you use a SERIALIZABLE transaction and avoid the explicit lock?

Not really. Since the LOCK on bw_pool guards the firewall manipulation that follows (of which there's one), I'm not really prepared to restart transactions due to deadlocks. It's easier for me to prevent deadlocks altogether by carefully ordering queries according to the level of lock restrictiveness, albeit at the price that the whole transaction becomes single-threaded, even the parts that don't need to be. I had indeed been trying to exclusively lock as little code as possible (fine-grained locking), but neglected the importance of the lock-ordering rule.
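The retry pattern being declined here is worth spelling out, because it shows exactly why external side effects rule it out. Under SERIALIZABLE, a transaction that fails with a serialization error (SQLSTATE 40001) must simply be re-run from the top. A minimal sketch of such a retry loop, with the failure simulated rather than coming from a real database:

```python
# Generic retry-on-serialization-failure loop. The exception class and
# the flaky transaction body are simulated stand-ins; in real code the
# failure would be a database error with SQLSTATE 40001.

class SerializationFailure(Exception):
    """Stands in for PostgreSQL's 'could not serialize access' error."""

def run_with_retries(txn_body, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()          # the whole transaction, re-run from scratch
        except SerializationFailure:
            if attempt == max_attempts:
                raise                  # give up after too many conflicts

attempts = []
def flaky_txn():
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure()   # first two tries conflict with a peer
    return "bw_id=7"

result = run_with_retries(flaky_txn)
assert result == "bw_id=7"
assert len(attempts) == 3   # the body ran THREE times
```

The last assertion is the poster's objection in miniature: the body runs multiple times, so anything inside it that touches a shared external resource (like the firewall) would be applied more than once. Retry loops only work when the transaction body is free of non-transactional side effects.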
> Scott Marlowe wrote:
>> Sure, but you have to trap that all the time. The solution using a
>> cycling sequence keeps you from ever seeing that (unless you managed
>> to check out all 9,999 other values while still getting the current
>> one). No locking needed, dozens of updaters running concurrently and
>> no need to track update errors.
>
> Yep, that does sound like it'd be nicer, at least if locks are
> becoming free at a reasonable rate (ie you don't have to step through
> most of the table to find a free lock). I was working on the probably
> mistaken assumption that the OP wanted the "next" / "first" available
> slot, not any free slot.
>
> If there are very few free locks at any given time I have the feeling
> the sequence approach could spend a lot of time just scanning through
> the table looking for free entries. Then again, using an aggregate
> subquery is far from free either, and it's a whole lot nicer to just
> repeat one statement until it succeeds rather than retrying the whole
> transaction if it conflicts with another (which will happen often if
> there's really high demand for locks).
>
> In fact, both transactions trying to grab the lowest free lock is
> practically a recipe for serialization failures, making it even less
> attractive. With only two concurrent connections it'd work OK if one
> used min() and the other used max() ... but add another couple and
> you're in trouble.
>
> The serial based approach sounds a fair bit better.

Serial access to the firewall is what I'm now emulating with "LOCK bw_pool" as the first statement in the transaction. AFAIK Postgres' serializable transactions give you decent parallelism at the price of expecting you to retry them due to serialization errors, and thus they can't be relied upon for actions on a shared external resource (like manipulating the firewall) after having LOCKed some table. It's a pity there's no way to release the lock as soon as possible; it is only released implicitly at the end of the transaction.
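The cycling-sequence idea Scott describes can be modelled in a few lines. This is a single-threaded toy, not the real thing: in PostgreSQL the counter would be a `CYCLE` sequence and the claim step would be an atomic `UPDATE ... WHERE bw_id = ? AND user_id IS NULL` whose row count tells you whether you won the slot; here a dict and `itertools.cycle` stand in for both:

```python
# Toy model of the cycling-sequence slot allocator: instead of every
# session fighting over MIN(bw_id), each session steps a cycling
# counter and tries to claim that slot, skipping busy ones. No table
# lock; the per-row conditional update is the only gate.
import itertools

POOL_SIZE = 10
pool = {i: None for i in range(1, POOL_SIZE + 1)}   # bw_id -> user_id
seq = itertools.cycle(range(1, POOL_SIZE + 1))      # plays the role of a CYCLE sequence

def claim_slot(user_id):
    for _ in range(POOL_SIZE):          # scan at most one full cycle
        bw_id = next(seq)
        if pool[bw_id] is None:         # UPDATE ... WHERE user_id IS NULL (1 row = success)
            pool[bw_id] = user_id
            return bw_id
    return None                         # pool exhausted (the "all 9,999 busy" case)

a = claim_slot("alice")
b = claim_slot("bob")
assert a is not None and b is not None and a != b
```

As Craig notes, when nearly all slots are busy the scan degrades to walking most of the pool; but because each claim is a single statement rather than a multi-statement transaction, a lost race costs one retry of one statement, not a whole-transaction rollback.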