Thread: choosing the right locking mode
Given this type of query:

UPDATE bw_pool
SET user_id=?
WHERE bw_id=
  (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
RETURNING bw_id

The idea is to "single-threadedly" get at the next available empty
slot, no matter how many such queries run in parallel. So far I've been
semi-successfully using LOCK TABLE bw_pool before the UPDATE, but it
deadlocks sometimes. Maybe I could use some less restrictive locking
mode and prevent possible collisions at the same time? Thanks.
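[For reference, the pattern described above amounts to something like the following sketch. The literal 42 stands in for the `?` parameter; note that LOCK TABLE with no explicit mode takes ACCESS EXCLUSIVE, the most restrictive mode.]

```sql
BEGIN;
-- Serialize all slot allocation on the whole table.  This blocks every
-- concurrent reader and writer until COMMIT, and can deadlock if another
-- transaction touches bw_pool first and then requests this lock.
LOCK TABLE bw_pool IN ACCESS EXCLUSIVE MODE;

UPDATE bw_pool
   SET user_id = 42   -- hypothetical user id standing in for ?
 WHERE bw_id = (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
RETURNING bw_id;

COMMIT;
```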
On Thu, Apr 3, 2008 at 10:44 AM, rihad <rihad@mail.ru> wrote:
> Given this type query:
>
> UPDATE bw_pool
> SET user_id=?
> WHERE bw_id=
>   (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
> RETURNING bw_id
>
> The idea is to "single-threadedly" get at the next available empty
> slot, no matter how many such queries run in parallel. So far I've
> been semi-successfully using LOCK TABLE bw_pool before the UPDATE,
> but it deadlocks sometimes. Maybe I could use some less restrictive
> locking mode and prevent possible collisions at the same time?

So, is there some reason a sequence won't work here?  If you've got a
requirement for a no-gap id field, there are other, less locky-ish
ways to do it.  Locking the table doesn't scale, and that's likely the
problem you're seeing.
Scott Marlowe wrote:
> On Thu, Apr 3, 2008 at 10:44 AM, rihad <rihad@mail.ru> wrote:
>> Given this type query:
>>
>> UPDATE bw_pool
>> SET user_id=?
>> WHERE bw_id=
>>   (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>> RETURNING bw_id
>>
>> The idea is to "single-threadedly" get at the next available empty
>> slot, no matter how many such queries run in parallel. So far I've
>> been semi-successfully using LOCK TABLE bw_pool before the UPDATE,
>> but it deadlocks sometimes. Maybe I could use some less restrictive
>> locking mode and prevent possible collisions at the same time?
>
> So, is there some reason a sequence won't work here?

bw_pool is pre-filled with 10 thousand rows of increasing bw_id, each
of which is either set (user_id IS NOT NULL) or empty (user_id IS
NULL). The state of each can change at any time.

> If you've got a
> requirement for a no-gap id field, there are other, less locky-ish
> ways to do it. Locking the table doesn't scale, and that's likely
> what problem you're seeing.

There's a shared resource backed by bw_pool that I absolutely need
single-threaded access to, despite multiple CPUs, hence an
all-exclusive lock (or?..)
On Thu, Apr 3, 2008 at 11:32 AM, rihad <rihad@mail.ru> wrote:
> Scott Marlowe wrote:
>> On Thu, Apr 3, 2008 at 10:44 AM, rihad <rihad@mail.ru> wrote:
>>> Given this type query:
>>>
>>> UPDATE bw_pool
>>> SET user_id=?
>>> WHERE bw_id=
>>>   (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>>> RETURNING bw_id
>>>
>>> The idea is to "single-threadedly" get at the next available empty
>>> slot, no matter how many such queries run in parallel. So far I've
>>> been semi-successfully using LOCK TABLE bw_pool before the UPDATE,
>>> but it deadlocks sometimes. Maybe I could use some less restrictive
>>> locking mode and prevent possible collisions at the same time?
>>
>> So, is there some reason a sequence won't work here?
>
> bw_pool is pre-filled with 10 thousand rows of increasing bw_id, each
> of which is either set (user_id IS NOT NULL) or empty (user_id IS
> NULL). The state of each can change at any time.

So, then ANY id would do, would it not, as long as it was null when
you picked it?

>> If you've got a
>> requirement for a no-gap id field, there are other, less locky-ish
>> ways to do it. Locking the table doesn't scale, and that's likely
>> what problem you're seeing.
>
> There's a shared resource backed by bw_pool that I absolutely need
> single-threaded access to, despite multiple CPUs, hence an
> all-exclusive lock (or?..)

Well, my most basic question was whether that shared resource is a
design flaw in the way it's set up.  I'm still not convinced it isn't,
but I know how you can get stuck with things like this too.  Building
a solution that works around this limitation may be as much work as
fixing whatever basic design flaw underlies it.  If it is a design
flaw.
On Thu, Apr 3, 2008 at 11:42 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Apr 3, 2008 at 11:32 AM, rihad <rihad@mail.ru> wrote:
>> Scott Marlowe wrote:
>>> On Thu, Apr 3, 2008 at 10:44 AM, rihad <rihad@mail.ru> wrote:
>>>> Given this type query:
>>>>
>>>> UPDATE bw_pool
>>>> SET user_id=?
>>>> WHERE bw_id=
>>>>   (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>>>> RETURNING bw_id
>>>>
>>>> The idea is to "single-threadedly" get at the next available empty
>>>> slot, no matter how many such queries run in parallel. So far I've
>>>> been semi-successfully using LOCK TABLE bw_pool before the UPDATE,
>>>> but it deadlocks sometimes. Maybe I could use some less restrictive
>>>> locking mode and prevent possible collisions at the same time?
>>>
>>> So, is there some reason a sequence won't work here?
>>
>> bw_pool is pre-filled with 10 thousand rows of increasing bw_id, each
>> of which is either set (user_id IS NOT NULL) or empty (user_id IS
>> NULL). The state of each can change at any time.
>
> So, then ANY id would do, would it not, as long as it was null when
> you picked it?

If this is the case, you could use a sequence: keep selecting ids from
it until you hit a row that is null, and use that one. Since all
access for this would go through the sequence, no one would hit the
same row at the same time; they'd be one ahead of or behind you. Set
it to cycle and you've got a self-maintaining system.
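[A sketch of the cycling-sequence idea described above. The sequence name is assumed, as is the id range 1..10000 matching the thread's 10-thousand-row pool.]

```sql
-- One cycling sequence covering the pre-filled ids 1..10000.
CREATE SEQUENCE bw_pool_seq MINVALUE 1 MAXVALUE 10000 CYCLE;

-- Each claimer repeats this statement until a row comes back.
-- nextval() hands every session a different candidate id, so two
-- sessions never fight over the same row; an occupied slot simply
-- matches zero rows, and you move on to the next id.
UPDATE bw_pool
   SET user_id = 42                      -- hypothetical user id
 WHERE bw_id = nextval('bw_pool_seq')
   AND user_id IS NULL
RETURNING bw_id;
```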
rihad wrote:
> Given this type query:
>
> UPDATE bw_pool
> SET user_id=?
> WHERE bw_id=
>   (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
> RETURNING bw_id

Can you use a SERIALIZABLE transaction and avoid the explicit lock?
If I'm not mistaken, using the SERIALIZABLE isolation level should
ensure that the following cannot occur:

Session 1                        Session 2
---------                        ---------
UPDATE begins                    UPDATE begins
Subquery finds free row id 1     Subquery finds free row id 1
Update completes                 Update completes, overwriting
                                 changes from the other update

You'd have to be prepared to retry failed updates, but I doubt that's
a big deal in this situation.

See:
http://www.postgresql.org/docs/8.2/interactive/transaction-iso.html

--
Craig Ringer
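[The suggestion above, sketched out. Retrying on serialization failure is the client's job; 42 again stands in for the `?` parameter.]

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;

UPDATE bw_pool
   SET user_id = 42   -- hypothetical user id
 WHERE bw_id = (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
RETURNING bw_id;

COMMIT;
-- If the server reports "ERROR: could not serialize access due to
-- concurrent update", roll back and rerun the whole transaction.
```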
On Thu, Apr 03, 2008 at 09:44:55PM +0500, rihad wrote:
> Given this type query:
>
> UPDATE bw_pool
> SET user_id=?
> WHERE bw_id=
>   (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
> RETURNING bw_id
>
> The idea is to "single-threadedly" get at the next available empty
> slot, no matter how many such queries run in parallel.

Do you "unblock" the pool slot by updating user_id to NULL in some
later transaction? If so, how about using INSERTs to lock and DELETEs
to unlock? You could have a table of locks:

CREATE TABLE bw_locks (
  bw_id   INTEGER PRIMARY KEY REFERENCES bw_pool (bw_id),
  user_id INTEGER NOT NULL REFERENCES users
);

and have a function to perform the actual slot acquisition:

CREATE FUNCTION nextslot (INTEGER) RETURNS INTEGER
  LANGUAGE plpgsql AS $$
DECLARE
  id INTEGER;
BEGIN
  LOOP
    BEGIN
      INSERT INTO bw_locks (bw_id, user_id)
        SELECT MIN(p.bw_id), $1
        FROM bw_pool p
        LEFT JOIN bw_locks l USING (bw_id)
        WHERE l.bw_id IS NULL
      RETURNING bw_id INTO id;
      RETURN id;
    EXCEPTION
      WHEN unique_violation THEN
        RAISE NOTICE 'nextslot() spinning';
      WHEN not_null_violation THEN
        RAISE EXCEPTION 'no free slots---panic!';
    END;
  END LOOP;
END;
$$;

This will keep trying to find the smallest free id, looping when
somebody else grabs it at the same time (the unique_violation on the
primary key), and raising an error when the pool is exhausted (MIN()
then yields NULL, so the INSERT hits the NOT NULL constraint). I've
not tested this code, nor written anything much like it before, so
test liberally.

> So far I've been
> semi-successfully using LOCK TABLE bw_pool before the UPDATE, but it
> deadlocks sometimes. Maybe I could use some less restrictive locking
> mode and prevent possible collisions at the same time?

This problem is always going to be awkward with a relational database,
though. The problem you want to solve is the opposite of its model.

Sam
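[Usage of the insert-to-lock scheme above might then look like this sketch; 42 is a hypothetical user id, and releasing a slot becomes a DELETE instead of "UPDATE ... SET user_id = NULL".]

```sql
-- Acquire the lowest free slot for user 42 (hypothetical id);
-- returns the claimed bw_id, or errors out if the pool is exhausted.
SELECT nextslot(42);

-- Later, release whichever slot nextslot() returned (say, 7) by
-- deleting its lock row:
DELETE FROM bw_locks WHERE bw_id = 7;
```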
On Thu, Apr 3, 2008 at 11:45 AM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> rihad wrote:
>> Given this type query:
>>
>> UPDATE bw_pool
>> SET user_id=?
>> WHERE bw_id=
>>   (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>> RETURNING bw_id
>
> Can you use a SERIALIZABLE transaction and avoid the explicit lock?

I'm pretty sure serializable won't fix this.
Scott Marlowe wrote:
> On Thu, Apr 3, 2008 at 11:45 AM, Craig Ringer
> <craig@postnewspapers.com.au> wrote:
>> rihad wrote:
>>> Given this type query:
>>>
>>> UPDATE bw_pool
>>> SET user_id=?
>>> WHERE bw_id=
>>>   (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>>> RETURNING bw_id
>>
>> Can you use a SERIALIZABLE transaction and avoid the explicit lock?
>
> I'm pretty sure serializable won't fix this.

I'm far from sure myself, but if it won't, I'd be very interested in
knowing how it can go wrong. A quick test suggested that it did the
job, and according to:

http://www.postgresql.org/docs/8.2/interactive/transaction-iso.html

it should work. Given the language:

----------
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they
will only find target rows that were committed as of the transaction
start time. However, such a target row may have already been updated
(or deleted or locked) by another concurrent transaction by the time
it is found. In this case, the serializable transaction will wait for
the first updating transaction to commit or roll back (if it is still
in progress). If the first updater rolls back, then its effects are
negated and the serializable transaction can proceed with updating the
originally found row. But if the first updater commits (and actually
updated or deleted the row, not just locked it) then the serializable
transaction will be rolled back with the message "ERROR: could not
serialize access due to concurrent update" because a serializable
transaction cannot modify or lock rows changed by other transactions
after the serializable transaction began.
----------

Say two updates are begun at the same time. Both run their subqueries
and both pick the same free id. One then acquires a ROW EXCLUSIVE lock
on the record being updated, and the other blocks trying to acquire
that lock.

The update that successfully grabbed the lock makes its changes and
the transaction commits successfully, releasing the lock. The second
update is now free to continue, but because the row it was attempting
to modify has just been changed under it, it'll abort with a
serialization error.

It seems safe to me.

--
Craig Ringer
Scott Marlowe wrote:
> Sure, but you have to trap that all the time. The solution using a
> cycling sequence keeps you from ever seeing that (unless you managed
> to check out all 9,999 other values while still getting the current
> one). No locking needed, dozens of updaters running concurrently and
> no need to track update errors.

Yep, that does sound like it'd be nicer, at least if locks are
becoming free at a reasonable rate (i.e. you don't have to step
through most of the table to find a free lock). I was working on the
probably mistaken assumption that the OP wanted the "next" / "first"
available slot, not any free slot.

If there are very few free locks at any given time, I have the feeling
the sequence approach could spend a lot of time just scanning through
the table looking for free entries. Then again, using an aggregate
subquery is far from free either, and it's a whole lot nicer to just
repeat one statement until it succeeds rather than retrying the whole
transaction if it conflicts with another (which will happen often if
there's really high demand for locks).

In fact, both transactions trying to grab the lowest free lock is
practically a recipe for serialization failures, making it even less
attractive. With only two concurrent connections it'd work OK if one
used min() and the other used max() ... but add another couple and
you're in trouble.

The serial based approach sounds a fair bit better.

--
Craig Ringer
On Thu, Apr 3, 2008 at 12:58 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> Scott Marlowe wrote:
>> Sure, but you have to trap that all the time. The solution using a
>> cycling sequence keeps you from ever seeing that (unless you managed
>> to check out all 9,999 other values while still getting the current
>> one). No locking needed, dozens of updaters running concurrently and
>> no need to track update errors.
>
> Yep, that does sound like it'd be nicer, at least if locks are
> becoming free at a reasonable rate (i.e. you don't have to step
> through most of the table to find a free lock). I was working on the
> probably mistaken assumption that the OP wanted the "next" / "first"
> available slot, not any free slot.
>
> If there are very few free locks at any given time, I have the
> feeling the sequence approach could spend a lot of time just scanning
> through the table looking for free entries. Then again, using an
> aggregate subquery is far from free either, and it's a whole lot
> nicer to just repeat one statement until it succeeds rather than
> retrying the whole transaction if it conflicts with another (which
> will happen often if there's really high demand for locks).
>
> In fact, both transactions trying to grab the lowest free lock is
> practically a recipe for serialization failures, making it even less
> attractive. With only two concurrent connections it'd work OK if one
> used min() and the other used max() ... but add another couple and
> you're in trouble.
>
> The serial based approach sounds a fair bit better.

Add prepared select statements and you'd get pretty fast performance.
I'd hope it's a situation where most blocks ARE free. Note that I
based my whole argument on the assumption that if a lower number could
become free, then gaps weren't a problem. If gaps somehow are a
problem, then there's a whole other set of approaches for providing
the illusion of gap-free ids without all the cost.
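[The prepared-statement variant of the sequence approach might look like this sketch; the statement name and the cycling sequence 'bw_pool_seq' are assumed.]

```sql
-- Planned once per connection, executed once per candidate id.
PREPARE claim_slot (integer) AS
  UPDATE bw_pool
     SET user_id = $1
   WHERE bw_id = nextval('bw_pool_seq')   -- assumed cycling sequence
     AND user_id IS NULL
  RETURNING bw_id;

-- Repeat until a row comes back:
EXECUTE claim_slot(42);   -- hypothetical user id
```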
Scott Marlowe wrote:
>> The serial based approach sounds a fair bit better.

Er, I meant "sequence".

> Add prepared select statements and you'd get pretty fast
> performance.

Yep, and if DB round trips are a problem it can always be wrapped up
in a stored procedure. I'd be tempted to do that anyway just to
simplify the client's job, guarantee query plan caching, etc.

--
Craig Ringer
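[Wrapping the sequence scan in a function, as suggested above, might look like this untested sketch; the function name, the cycling sequence 'bw_pool_seq', and the 10,000-slot bound are assumptions based on the thread.]

```sql
CREATE FUNCTION claim_slot(p_user integer) RETURNS integer
  LANGUAGE plpgsql AS $$
DECLARE
  v_id integer;
BEGIN
  -- Try at most one full cycle of the 10,000-slot pool.
  FOR i IN 1..10000 LOOP
    UPDATE bw_pool
       SET user_id = p_user
     WHERE bw_id = nextval('bw_pool_seq')  -- assumed cycling sequence
       AND user_id IS NULL
    RETURNING bw_id INTO v_id;
    IF v_id IS NOT NULL THEN
      RETURN v_id;   -- claimed a free slot: one round trip, no table lock
    END IF;
  END LOOP;
  RAISE EXCEPTION 'no free slots';
END;
$$;
```

The client then just calls SELECT claim_slot(42), and the looping, plan caching, and error on exhaustion all live server-side.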