Thread: choosing the right locking mode

From: rihad
Given this type of query:

         UPDATE bw_pool
         SET user_id=?
         WHERE bw_id=
                 (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
         RETURNING bw_id

The idea is to "single-threadedly" get at the next available empty slot,
no matter how many such queries run in parallel. So far I've been
semi-successfully using LOCK TABLE bw_pool before the UPDATE, but it
deadlocks sometimes. Maybe I could use some less restrictive locking
mode and prevent possible collisions at the same time?

Thanks.

Re: choosing the right locking mode

From: "Scott Marlowe"
On Thu, Apr 3, 2008 at 10:44 AM, rihad <rihad@mail.ru> wrote:
> Given this type query:
>
>         UPDATE bw_pool
>         SET user_id=?
>         WHERE bw_id=
>                 (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>         RETURNING bw_id
>
>  The idea is to "single-threadedly" get at the next available empty slot, no
> matter how many such queries run in parallel. So far I've been
> semi-successfully using LOCK TABLE bw_pool before the UPDATE, but it
> deadlocks sometimes. Maybe I could use some less restrictive locking mode
> and prevent possible collisions at the same time?

So, is there some reason a sequence won't work here?  If you've got a
requirement for a no-gap id field, there are other, less locky-ish
ways to do it.  Locking the table doesn't scale, and that's likely
the problem you're seeing.

Re: choosing the right locking mode

From: rihad
Scott Marlowe wrote:
> On Thu, Apr 3, 2008 at 10:44 AM, rihad <rihad@mail.ru> wrote:
>> Given this type query:
>>
>>         UPDATE bw_pool
>>         SET user_id=?
>>         WHERE bw_id=
>>                 (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>>         RETURNING bw_id
>>
>>  The idea is to "single-threadedly" get at the next available empty slot, no
>> matter how many such queries run in parallel. So far I've been
>> semi-successfully using LOCK TABLE bw_pool before the UPDATE, but it
>> deadlocks sometimes. Maybe I could use some less restrictive locking mode
>> and prevent possible collisions at the same time?
>
> So, is there some reason a sequence won't work here?

bw_pool is pre-filled with 10 thousand rows of increasing bw_id, each of
which is either set (user_id IS NOT NULL) or empty (user_id IS NULL).
The state of each can change any time.

> If you've got a
> requirement for a no-gap id field, there are other, less locky-ish
> ways to do it.  Locking the table doesn't scale, and that's likely
> what problem you're seeing.
>
There's a shared resource backed by bw_pool that I absolutely need
single-threaded access to, despite multiple cpus, hence an all-exclusive
lock (or?..)

Re: choosing the right locking mode

From: "Scott Marlowe"
On Thu, Apr 3, 2008 at 11:32 AM, rihad <rihad@mail.ru> wrote:
> Scott Marlowe wrote:
>
> > On Thu, Apr 3, 2008 at 10:44 AM, rihad <rihad@mail.ru> wrote:
> >
> > > Given this type query:
> > >
> > >        UPDATE bw_pool
> > >        SET user_id=?
> > >        WHERE bw_id=
> > >                (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
> > >        RETURNING bw_id
> > >
> > >  The idea is to "single-threadedly" get at the next available empty
> > > slot, no matter how many such queries run in parallel. So far I've been
> > > semi-successfully using LOCK TABLE bw_pool before the UPDATE, but it
> > > deadlocks sometimes. Maybe I could use some less restrictive locking
> > > mode and prevent possible collisions at the same time?
> > >
> >
> > So, is there some reason a sequence won't work here?
> >
>
>  bw_pool is pre-filled with 10 thousand rows of increasing bw_id, each of
> which is either set (user_id IS NOT NULL) or empty (user_id IS NULL). The
> state of each can change any time.

So, then ANY id would do, would it not, as long as it was null when
you picked it?

> > If you've got a
> > requirement for a no-gap id field, there are other, less locky-ish
> > ways to do it.  Locking the table doesn't scale, and that's likely
> > what problem you're seeing.
> >
> >
>  There's a shared resource backed by bw_pool that I absolutely need
> single-threaded access to, despite multiple cpus, hence an all-exclusive
> lock (or?..)

Well, my most basic question was whether that shared resource is a
design flaw in the way it's set up.  I'm still not convinced it isn't,
but I know how you can get stuck with things like this too.  Building a
solution that works around this limitation may be as much work as
fixing whatever basic design flaw underlies it.  If it is a design
flaw.


Re: choosing the right locking mode

From: "Scott Marlowe"
On Thu, Apr 3, 2008 at 11:42 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Apr 3, 2008 at 11:32 AM, rihad <rihad@mail.ru> wrote:
>  > Scott Marlowe wrote:
>  >
>  > > On Thu, Apr 3, 2008 at 10:44 AM, rihad <rihad@mail.ru> wrote:
>  > >
>  > > > Given this type query:
>  > > >
>  > > >        UPDATE bw_pool
>  > > >        SET user_id=?
>  > > >        WHERE bw_id=
>  > > >                (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>  > > >        RETURNING bw_id
>  > > >
>  > > >  The idea is to "single-threadedly" get at the next available empty
>  > > > slot, no matter how many such queries run in parallel. So far I've been
>  > > > semi-successfully using LOCK TABLE bw_pool before the UPDATE, but it
>  > > > deadlocks sometimes. Maybe I could use some less restrictive locking
>  > > > mode and prevent possible collisions at the same time?
>  > > >
>  > >
>  > > So, is there some reason a sequence won't work here?
>  > >
>  >
>  >  bw_pool is pre-filled with 10 thousand rows of increasing bw_id, each of
>  > which is either set (user_id IS NOT NULL) or empty (user_id IS NULL). The
>  > state of each can change any time.
>
>  So, then ANY id would do, would it not, as long as it was null when
>  you picked it?

If this is the case, you could use a sequence and just select ids from
it until you hit a row that was null, and use that.  Since all access
would go through the sequence, no one would hit the same row at the
same time; they'd be one ahead of or behind you.  Set the sequence to
cycle and you've got a self-maintaining system.
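A sketch of this cycling-sequence approach (the sequence name and the
1..10000 range are assumptions for illustration; the subselect around
nextval() keeps the volatile function from being re-evaluated for every
row scanned):

```sql
-- One probe sequence shared by all claimers; wraps around at the end.
CREATE SEQUENCE bw_pool_probe MINVALUE 1 MAXVALUE 10000 CYCLE;

-- Each attempt probes one candidate id.  Distinct sessions draw
-- distinct values, so they never fight over the same row.
UPDATE bw_pool
   SET user_id = 42   -- placeholder user id
 WHERE bw_id = (SELECT nextval('bw_pool_probe'))
   AND user_id IS NULL
RETURNING bw_id;
-- Zero rows updated means that slot was taken; run the UPDATE again
-- until a bw_id comes back.
```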

Re: choosing the right locking mode

From: Craig Ringer
rihad wrote:
> Given this type query:
>
>         UPDATE bw_pool
>         SET user_id=?
>         WHERE bw_id=
>                 (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>         RETURNING bw_id

Can you use a SERIALIZABLE transaction and avoid the explicit lock?

If I'm not mistaken, using the SERIALIZABLE isolation level should
ensure that the following cannot occur:


  Session 1                        Session 2
  ---------                        ---------
  UPDATE begins
                                   UPDATE begins
  Subquery finds free row id 1
                                   Subquery finds free row id 1
                                   Update completes
  Update completes, overwriting
  changes from the other update.

You'd have to be prepared to retry failed updates, but I doubt that's a
big deal in this situation.
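
A sketch of this retry pattern (the user id literal is a placeholder):

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;

UPDATE bw_pool
   SET user_id = 42   -- placeholder user id
 WHERE bw_id =
       (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
RETURNING bw_id;

COMMIT;

-- If the UPDATE fails with "could not serialize access due to
-- concurrent update" (SQLSTATE 40001), ROLLBACK and retry from BEGIN.
```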

See:

http://www.postgresql.org/docs/8.2/interactive/transaction-iso.html

--
Craig Ringer

Re: choosing the right locking mode

From: Sam Mason
On Thu, Apr 03, 2008 at 09:44:55PM +0500, rihad wrote:
> Given this type query:
>
>         UPDATE bw_pool
>         SET user_id=?
>         WHERE bw_id=
>                 (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>         RETURNING bw_id
>
> The idea is to "single-threadedly" get at the next available empty slot,
> no matter how many such queries run in parallel.

Do you "unblock" the pool slot by updating user_id to NULL in some later
transaction?  If so, how about using INSERTs to lock and DELETEs to
unlock?  You could have a table of locks:

  CREATE TABLE bw_locks (
    bw_id INTEGER PRIMARY KEY REFERENCES bw_pool (bw_id),
    user_id INTEGER NOT NULL REFERENCES users
  );

and have a function to perform the actual slot acquisition:

  CREATE FUNCTION nextslot (INTEGER) RETURNS INTEGER LANGUAGE plpgsql AS $$
  DECLARE
    id INTEGER;
  BEGIN
    LOOP
      BEGIN
        INSERT INTO bw_locks (bw_id, user_id)
          SELECT MIN(p.bw_id), $1
          FROM bw_pool p LEFT JOIN bw_locks l USING (bw_id)
          WHERE l.bw_id IS NULL
          HAVING MIN(p.bw_id) IS NOT NULL  -- insert nothing when no slot is free
          RETURNING bw_id INTO id;
        IF FOUND THEN
          RETURN id;
        END IF;
        RAISE EXCEPTION 'no free slots---panic!';
      EXCEPTION
        WHEN unique_violation THEN RAISE NOTICE 'nextslot() spinning';
      END;
    END LOOP;
  END; $$;

This will keep trying to find the smallest id, looping when somebody
else uses it at the same time.  I've not tested this code, nor written
anything much like it before so test liberally.

> So far I've been
> semi-successfully using LOCK TABLE bw_pool before the UPDATE, but it
> deadlocks sometimes. Maybe I could use some less restrictive locking
> mode and prevent possible collisions at the same time?

This problem is always going to be awkward with a relational database,
though.  The problem you want to solve is the opposite of the
relational model.


  Sam

Re: choosing the right locking mode

From: "Scott Marlowe"
On Thu, Apr 3, 2008 at 11:45 AM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> rihad wrote:
>  > Given this type query:
>  >
>  >         UPDATE bw_pool
>  >         SET user_id=?
>  >         WHERE bw_id=
>  >                 (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>  >         RETURNING bw_id
>
>  Can you use a SERIALIZABLE transaction and avoid the explicit lock?

I'm pretty sure serializable won't fix this.

Re: choosing the right locking mode

From: Craig Ringer
Scott Marlowe wrote:
> On Thu, Apr 3, 2008 at 11:45 AM, Craig Ringer
> <craig@postnewspapers.com.au> wrote:
>> rihad wrote:
>>  > Given this type query:
>>  >
>>  >         UPDATE bw_pool
>>  >         SET user_id=?
>>  >         WHERE bw_id=
>>  >                 (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>>  >         RETURNING bw_id
>>
>>  Can you use a SERIALIZABLE transaction and avoid the explicit lock?
>
> I'm pretty sure serializable won't fix this.

I'm far from sure myself, but if it won't I'd be very interested in
knowing how it can go wrong. A quick test suggested that it did the job,
and according to:

http://www.postgresql.org/docs/8.2/interactive/transaction-iso.html

it should work.

Given the language:

----------
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave
the same as SELECT in terms of searching for target rows: they will only
find target rows that were committed as of the transaction start time.
However, such a target row may have already been updated (or deleted or
locked) by another concurrent transaction by the time it is found. In
this case, the serializable transaction will wait for the first updating
transaction to commit or roll back (if it is still in progress). If the
first updater rolls back, then its effects are negated and the
serializable transaction can proceed with updating the originally found
row. But if the first updater commits (and actually updated or deleted
the row, not just locked it) then the serializable transaction will be
rolled back with the message "ERROR:  could not serialize access due to
concurrent update" because a serializable transaction cannot modify or
lock rows changed by other transactions after the serializable
transaction began.
---------

Say two updates are begun at the same time. Both run their subqueries
and both pick the same free id. One then acquires a ROW EXCLUSIVE lock
on the record being updated and the other blocks trying to acquire that
lock. The update that successfully grabbed the lock makes its changes
and the transaction commits successfully, releasing the lock. The second
update is now free to continue, but because the row it was attempting to
modify has just been changed under it, it'll abort with a serialization
error.

It seems safe to me.

--
Craig Ringer


Re: choosing the right locking mode

From: Craig Ringer
Scott Marlowe wrote:
> Sure, but you have to trap that all the time.  The solution using a
> cycling sequence keeps you from ever seeing that (unless you managed
> to check out all 9,999 other values while still getting the current
> one).  No locking needed, dozens of updaters running concurrently and
> no need to track update errors.
>
Yep, that does sound like it'd be nicer, at least if locks are becoming
free at a reasonable rate (i.e. you don't have to step through most of
the table to find a free lock). I was working on the probably mistaken
assumption that the OP wanted the "next" / "first" available slot, not
any free slot.

If there are very few free locks at any given time I have the feeling
the sequence approach could spend a lot of time just scanning through
the table looking for free entries. Then again, using an aggregate
subquery is far from free either, and it's a whole lot nicer to just
repeat one statement until it succeeds rather than retrying the whole
transaction if it conflicts with another (which will happen often if
there's really high demand for locks).

In fact, both transactions trying to grab the lowest free lock is
practically a recipe for serialization failures, making it even less
attractive. With only two concurrent connections it'd work OK if one
used min() and the other used max() ... but add another couple and
you're in trouble.

The serial based approach sounds a fair bit better.

--
Craig Ringer

Re: choosing the right locking mode

From: "Scott Marlowe"
On Thu, Apr 3, 2008 at 12:58 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> Scott Marlowe wrote:
>
> > Sure, but you have to trap that all the time.  The solution using a
> > cycling sequence keeps you from ever seeing that (unless you managed
> > to check out all 9,999 other values while still getting the current
> > one).  No locking needed, dozens of updaters running concurrently and
> > no need to track update errors.
> >
> >
>  Yep, that does sound like it'd be nicer, at least if locks are becoming
> free at a reasonable rate (ie you don't have to step through most of the
> table to find a free lock). I was working on the probably mistaken
> assumption that the OP wanted the "next" / "first" available slot, not any
> free slot.
>
>  If there are very few free locks at any given time I have the feeling the
> sequence approach could spend a lot of time just scanning through the table
> looking for free entries. Then again, using an aggregate subquery is far
> from free either, and it's a whole lot nicer to just repeat one statement
> until it succeeds rather than retrying the whole transaction if it conflicts
> with another (which will happen often if there's really high demand for
> locks).
>
>  In fact, both transactions trying to grab the lowest free lock is
> practically a recipe for serialization failures, making it even less
> attractive. With only two concurrent connections it'd work OK if one used
> min() and the other used max() ... but add another couple and you're in
> trouble.
>
>  The serial based approach sounds a fair bit better.

Add prepared select statements and you'd get pretty fast
performance.  I'd hope it's a situation where most slots ARE free.
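
For instance, the probe could be prepared once per connection (again
assuming a cycling sequence, here called bw_pool_probe purely for
illustration):

```sql
PREPARE claim (integer) AS
  UPDATE bw_pool
     SET user_id = $1
   WHERE bw_id = (SELECT nextval('bw_pool_probe'))
     AND user_id IS NULL
  RETURNING bw_id;

-- Re-run until a row comes back:
EXECUTE claim(42);
```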

Note that my whole argument rests on the assumption that if a number
down lower could become free, then gaps aren't a problem.  If gaps
somehow are a problem, then there's a whole other set of approaches
for providing the illusion of gap-free ids without all the cost.

Re: choosing the right locking mode

From: Craig Ringer
Scott Marlowe wrote:
>>
>>  The serial based approach sounds a fair bit better.
>>
Er, I meant "sequence".
>
> Add prepared select statements and you'd get pretty fast
> performance.
>
Yep, and if DB round trips are a problem it can always be wrapped up in
a stored procedure. I'd be tempted to do that anyway just to simplify
the client's job, guarantee query plan caching, etc.
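
Such a wrapper might look like this (function and sequence names, and
the full-cycle guard, are illustrative assumptions, not from the
thread):

```sql
CREATE FUNCTION claim_slot(p_user INTEGER) RETURNS INTEGER
LANGUAGE plpgsql AS $$
DECLARE
  probe INTEGER;
  slot  INTEGER;
  tries INTEGER := 0;
BEGIN
  LOOP
    -- Draw one candidate id; concurrent callers get distinct values.
    probe := nextval('bw_pool_probe');
    UPDATE bw_pool
       SET user_id = p_user
     WHERE bw_id = probe
       AND user_id IS NULL
    RETURNING bw_id INTO slot;
    IF FOUND THEN
      RETURN slot;
    END IF;
    tries := tries + 1;
    IF tries >= 10000 THEN  -- a full cycle found nothing free
      RAISE EXCEPTION 'no free slots in bw_pool';
    END IF;
  END LOOP;
END; $$;
```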

--
Craig Ringer