Thread: Select for update with offset interferes with concurrent transactions

Select for update with offset interferes with concurrent transactions

From
"Yngve Nysaeter Pettersen"
Date:
Hello all,

I am in the process of migrating a system from Postgresql 8.3 to 9.0, and
have run into a problem with the task queue systems I am using.

The task queue controls the allocation of tasks between about 1000
processes working in parallel, and is essentially a table of

    record_id (unique)
    project_id
    task_description_id
    state (idle, started, finished)

Each project currently has about 2 million entries. My plan is to
increase that significantly over the next few months.

To avoid having the processes trample each other's queries (the first
attempt was to select the first matching entries of the table, which
caused one process to block all other transactions), one of the steps I
took was to select a set of the project's idle rows at a random offset
into the table, mark them FOR UPDATE, and then update each record's state
to started.

   SELECT record_id FROM queue WHERE project_id = my_project
       AND state = idle LIMIT n OFFSET i FOR UPDATE

At present "n" is 100-150, "i" is a random value in the range 0-10000.

There is, intentionally, no ordering specified, since that would just slow
down the query, and is not necessary.

For reference, the above query is sent through Django's cursor.execute()
call in a manual transaction block.
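
For clarity, the claiming step boils down to something like the following
(a simplified sketch; the real statements go through Django, and the
literal values for "n" and "i" are filled in per process):

    BEGIN;
    -- grab a block of idle rows at a random offset and lock them
    SELECT record_id FROM queue
        WHERE project_id = my_project AND state = idle
        LIMIT 100 OFFSET 4321
        FOR UPDATE;
    -- mark the selected rows as started, then release the locks
    UPDATE queue SET state = started
        WHERE record_id IN (<the record_ids returned above>);
    COMMIT;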



What I've discovered when using Postgres 9.0 is that the processes are now
blocking every other query into this table, apparently reducing the task
processing speed by at least a factor of 10, and increasing the load on
the server by a similar factor, compared to when Postgres 8.3 was used.
The problem is apparent just after starting, with only 50-100 processes
active (startup is staggered).

Reducing "n" (and looping), or increasing the "i" range did not work.


The reason seems to be this new part of
http://www.postgresql.org/docs/9.0/static/sql-select.html (towards the end
of the FOR UPDATE section):

    If a LIMIT is used, locking stops once enough rows have been returned
    to satisfy the limit (but note that rows skipped over by OFFSET will
    get locked). Similarly, if FOR UPDATE or FOR SHARE is used in a
    cursor's query, only rows actually fetched or stepped past by the
    cursor will be locked.

I can't find similar text in the 8.3 or 8.4 documentation.

AFAICT, and assuming I have not misunderstood this part of the
documentation, this means that if one of my processing nodes selects a
block of 100 entries at offset 8000 in the resulting table, then every
other node will be blocked while that block is being processed: not just
the nodes that would have selected rows in the range 0 to 7999, but also
those at offsets >= 8100, because they cannot gain access to the locked
rows either.

Also, using FOR SHARE does not seem to solve the problem.

IMO, as a database non-expert, locking rows that were not returned as a
result of the query is a bug. As an example, if a query selects the X last
items in the matching rows, that is equivalent to locking the table, or
the relevant part of it, even if the requester has no intention of
modifying those other rows.


Is there any way to avoid this problem? Or do I have to add a random
batch_id field to the queue table in order to separate the processes'
queries so that they do not block each other (as frequently)?

Is it possible to disable the behavior causing this (that is, by reverting
the patch that introduced the problem, or via a configuration switch)?


--
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer             Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************

Re: Select for update with offset interferes with concurrent transactions

From
Andy Colson
Date:
On 2/1/2011 6:32 AM, Yngve Nysaeter Pettersen wrote:
> Hello all,
>
> I am in the process of migrating a system from Postgresql 8.3 to 9.0,
> and have run into a problem with the task queue systems I am using.
>
> The task queue controls the allocation of tasks between about 1000
> processes working in parallel, and is essentially a table of
>
> record_id (unique)
> project_id
> task_description_id
> state (idle, started, finished)
>
> Each project currently have about 2 million entries. My plan is to
> increase that significantly the next few months.
>
> To avoid having the processes trample each other's queries (the first
> attempt was to select the first matching entries of the table, which
> caused one to block all other transactions), one of the steps I took was
> to select a set of idle rows at a random offset into the table from the
> project, mark them for update, then update each record's state as started.
>
> SELECT record_id FROM queue WHERE project_id = my_project AND state =
> idle LIMIT n OFFSET i FOR UPDATE
>
> At present "n" is 100-150, "i" is a random value in the range 0-10000.
>
> There is, intentionally, no ordering specified, since that would just
> slow down the query, and is not necessary.
>
> For reference, the above query is sent through Django's cursor.execute()
> call in a manual transaction block.
>
>
>
> What I've discovered when using Postgres 9.0 is that the processes are
> now blocking every other query into this table, apparently reducing the
> task processing speed by at least a factor of 10, and increasing the
> load on the server by a similar factor, compared to when Postgres 8.3
> was used. The problem is apparent just after starting, with only 50-100
> processes active (startup is staggered).
>
> Reducing "n" (and looping), or increasing the "i" range did not work.
>
>
> The reason seems to be this new part of
> http://www.postgresql.org/docs/9.0/static/sql-select.html (towards the
> end of the FOR UPDATE section):
>
> If a LIMIT is used, locking stops once enough rows have been returned to
> satisfy the limit
> (but note that rows skipped over by OFFSET will get locked). Similarly,
> if FOR UPDATE or
> FOR SHARE is used in a cursor's query, only rows actually fetched or
> stepped past by the
> cursor will be locked.
>
> I can't find similar text in the 8.3 or 8.4 documentation.
>
> AFAICT, and assuming I have not misunderstood this part of the
> documentation this means that if one of my processing nodes selects a
> block of 100 entries at offset 8000 in the resulting table, then every
> other node will be blocked while the block is being processed, not just
> the nodes that would have selected the rows in the range 0 to 7999, but
> also >=8100, because they cannot gain access to the rows.
>
> Also, using FOR SHARE does not seem to solve the problem.
>
> IMO, as a database non-expert, locking rows that were not returned as a
> result of the query is a bug. As an example, if a query selects the X
> last items in the matching rows, that is equivalent to locking the
> table, or the relevant part of it, even if the requester have no
> intention to modify those other rows.
>
>
> Is there any way to avoid this problem? Or do I have to add a random
> batch_id field to the queue table in order to separate the processes'
> queries so that they do not block each other (as frequently)?
>
> Is it possible to disable the source code causing this (that is,
> reverting the patch that introduced the problem, or changing a
> configuration switch)?
>
>

So, if I understand correctly, you:

q = SELECT record_id FROM queue
    WHERE project_id = my_project AND state = idle
    LIMIT n OFFSET i FOR UPDATE
while not q.eof
    update queue set state = started where record_id = x;
    process record_id
    update queue set state = finished where record_id = x;
    q.next;


Might I suggest an alternative:

q = update queue set state = started
    WHERE project_id = my_project AND state = idle
    LIMIT n OFFSET i
    RETURNING project_id;
idlist = @q;
commit;

foreach x in idlist
    process record_id
    begin
    update queue set state = finished where record_id = x;
    commit;



Forgive the part-Perl, part-Python pseudocode.  Oh, and I've never done
this, no idea if it actually works. :-)
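
In real SQL an UPDATE cannot take LIMIT/OFFSET directly, so the idea above
would need a subquery, and RETURNING should probably hand back record_id,
since that is what the loop works on.  Roughly (equally untested):

    update queue set state = started
        where record_id in (
            select record_id from queue
            where project_id = my_project and state = idle
            limit n offset i)
        returning record_id;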

-Andy

Re: Select for update with offset interferes with concurrent transactions

From
"Yngve Nysaeter Pettersen"
Date:
Hi,

Thanks for the quick answer, Andy.

On Tue, 01 Feb 2011 16:19:17 +0100, Andy Colson <andy@squeakycode.net>
wrote:

<snip>
> So, if I understand correctly, you:
>
> q = SELECT record_id FROM queue
>     WHERE project_id = my_project AND state = idle
>     LIMIT n OFFSET i FOR UPDATE
> while not q.eof
>     update queue set state = started where record_id = x;
>     process record_id
>     update queue set state = finsihed where record_id = x;
>     q.next;

Almost, the update to "started" is done for all selected elements first,
releasing the lock, then the items are processed one at a time, marking
each "finished" as it completes. (Each processing step can take minutes,
so keeping a lock the whole time is not an option.)

> Might I suggest and alternative:
>
> q = update queue set state = started
>     WHERE project_id = my_project AND state = idle
>     LIMIT n OFFSET i
>     RETURNING project_id;
> idlist = @q;
> commit;
>
> foreach x in idlist
>     process record_id
>     begin
>     update queue set state = finsihed where record_id = x;
>     commit;
>
> Forgive the part perl part python sudocode.  Oh, and I've never done
> this, no idea if it actually works. :-)

Thanks for that suggestion, I'll take a look at it.

While I hadn't caught on to the "RETURNING" part, I had been wondering if
using a single step UPDATE might be a solution. One concern I have is how
concurrent updates will affect the returned list (or if they will just be
skipped, as SELECT would in normal transaction mode, if I understood
correctly), or whether it might return with an error code (I know that the
normal update return value is the number of updated items, just not sure
if that applies for "RETURNING").

Although, I will note that this process (if it works) will, sort of, make
FOR UPDATE redundant. Or, if it doesn't, the current lock-policy might
cause issues for concurrent updates for the use-cases where FOR UPDATE is
relevant.

--
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer             Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************

Re: Select for update with offset interferes with concurrent transactions

From
Andy Colson
Date:
On 2/1/2011 10:59 AM, Yngve Nysaeter Pettersen wrote:
> Hi,
>
> Thanks for the quick answer, Andy.
>
> On Tue, 01 Feb 2011 16:19:17 +0100, Andy Colson <andy@squeakycode.net>
> wrote:
>
> <snip>
>> So, if I understand correctly, you:
>>
>> q = SELECT record_id FROM queue
>> WHERE project_id = my_project AND state = idle
>> LIMIT n OFFSET i FOR UPDATE
>> while not q.eof
>> update queue set state = started where record_id = x;
>> process record_id
>> update queue set state = finsihed where record_id = x;
>> q.next;
>
> Almost, the update to "started" is done for all selected elements first,
> releasing the lock, then the items are processed one at a time, marking
> each "finished" as they complete. (each processing step can take
> minutes, so keeping a lock the whole time is not an option)
>
>> Might I suggest and alternative:
>>
>> q = update queue set state = started
>> WHERE project_id = my_project AND state = idle
>> LIMIT n OFFSET i
>> RETURNING project_id;
>> idlist = @q;
>> commit;
>>
>> foreach x in idlist
>> process record_id
>> begin
>> update queue set state = finsihed where record_id = x;
>> commit;
>>
>> Forgive the part perl part python sudocode. Oh, and I've never done
>> this, no idea if it actually works. :-)
>
> Thanks for that suggestion, I'll take a look at it.
>
> While I hadn't caught on to the "RETURNING" part, I had been wondering
> if using a single step UPDATE might be a solution. One concern I have is
> how concurrent updates will affect the returned list (or if they will
> just be skipped, as SELECT would in normal transaction mode, if I
> understood correctly), or whether it might return with an error code (I
> know that the normal update return value is the number of updated items,
> just not sure if that applies for "RETURNING").
>
> Although, I will note that this process (if it works) will, sort of,
> make FOR UPDATE redundant. Or, if it doesn't, the current lock-policy
> might cause issues for concurrent updates for the use-cases where FOR
> UPDATE is relevant.
>

Yeah, I'd wondered the same thing.  It could be that two updates hitting
the same row will deadlock, or maybe not, I'm not sure.  But I think it's
the same as with the select: if you happen to have two limits that hit the
same range, you're in trouble.

I think the random limit thing is a race condition in itself.  Whenever
you have multiple processes hitting the same rows you're going to run into
problems.  Have you thought of using a sequence instead of a random
limit?  Each process could get the next 100 record_ids via a sequence,
then there would be much less chance of deadlock.

-Andy

Re: Select for update with offset interferes with concurrent transactions

From
Tom Lane
Date:
"Yngve Nysaeter Pettersen" <yngve@opera.com> writes:
> To avoid having the processes trample each other's queries (the first
> attempt was to select the first matching entries of the table, which
> caused one to block all other transactions), one of the steps I took was
> to select a set of idle rows at a random offset into the table from the
> project, mark them for update, then update each record's state as started.

>    SELECT record_id FROM queue WHERE project_id = my_project AND state =
> idle LIMIT n OFFSET i FOR UPDATE

> At present "n" is 100-150, "i" is a random value in the range 0-10000.

> There is, intentionally, no ordering specified, since that would just slow
> down the query, and is not necessary.

This seems like a pretty bad design.  There are recognized ways to solve
this problem with more predictability and much less chance of different
processes blocking each other.  In particular, this query seems to be
based on some untenable assumptions about the physical row order being
stable.

> What I've discovered when using Postgres 9.0 is that the processes are now
> blocking every other query into this table,

In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which
means that rows skipped over by OFFSET still get locked, which means
that different sessions executing this query are now practically certain
to block each other, rather than just likely to block each other.
This was an intentional change to improve the predictability of FOR
UPDATE's interactions with LIMIT/OFFSET, and indeed it's improved the
predictability of the behavior for you, just not in the direction you'd
like :-(

            regards, tom lane

Re: Select for update with offset interferes with concurrent transactions

From
"Yngve Nysaeter Pettersen"
Date:
On Tue, 01 Feb 2011 18:18:17 +0100, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Yngve Nysaeter Pettersen" <yngve@opera.com> writes:
>> To avoid having the processes trample each other's queries (the first
>> attempt was to select the first matching entries of the table, which
>> caused one to block all other transactions), one of the steps I took was
>> to select a set of idle rows at a random offset into the table from the
>> project, mark them for update, then update each record's state as
>> started.
>
>>    SELECT record_id FROM queue WHERE project_id = my_project AND state =
>> idle LIMIT n OFFSET i FOR UPDATE
>
>> At present "n" is 100-150, "i" is a random value in the range 0-10000.
>
>> There is, intentionally, no ordering specified, since that would just
>> slow
>> down the query, and is not necessary.
>
> This seems like a pretty bad design.

Well, I don't claim to be a database expert ;).

While there might be better ways, the current one has worked OK in the
year since it was implemented.

> There are recognized ways to solve
> this problem with more predictability and much less chance of different

I'd appreciate it if you could provide a couple of pointers.

> processes blocking each other.  In particular, this query seems be based
> on some untenable assumptions about the physical row order being stable.

No, it does not assume that the row order is stable; I don't really care
about the order of the elements, since the actual order of task execution
depends much more significantly on other variables, and the actual order
isn't important at all (although further design changes might impose some
limited category grouping on the queue, that would still not make the
ordering important within the group).

>> What I've discovered when using Postgres 9.0 is that the processes are
>> now
>> blocking every other query into this table,
>
> In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which
> means that rows skipped over by OFFSET still get locked, which means
> that different sessions executing this query are now practically certain
> to block each other, rather than just likely to block each other.
> This was an intentional change to improve the predictability of FOR
> UPDATE's interactions with LIMIT/OFFSET, and indeed it's improved the
> predictability of the behavior for you, just not in the direction you'd
> like :-(

That might be, but is it necessary to continue locking (which is what it
sounds like to me) the elements that are not used in the final response
past the completion of the query?

What happens now, if I understand it correctly, is that if a "select foo
from bar order by whatever limit 1 offset tablelen-1 for update" is
performed, the effective operation is also "LOCK bar", not just a row lock
on item tablelen-1 in that table. Was that the intention?  (And yes, I am
aware that ordering might be used to reverse that sequence so offset 0 can
be used, but wouldn't that just as much block the query for offset 1?)


--
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer             Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************

Re: Select for update with offset interferes with concurrent transactions

From
"Yngve Nysaeter Pettersen"
Date:
On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <andy@squeakycode.net>
wrote:

> On 2/1/2011 10:59 AM, Yngve Nysaeter Pettersen wrote:
>> Hi,
>>
>> Thanks for the quick answer, Andy.
>>
>> On Tue, 01 Feb 2011 16:19:17 +0100, Andy Colson <andy@squeakycode.net>
>> wrote:
>>
>> <snip>
>>> So, if I understand correctly, you:
>>>
>>> q = SELECT record_id FROM queue
>>> WHERE project_id = my_project AND state = idle
>>> LIMIT n OFFSET i FOR UPDATE
>>> while not q.eof
>>> update queue set state = started where record_id = x;
>>> process record_id
>>> update queue set state = finsihed where record_id = x;
>>> q.next;
>>
>> Almost, the update to "started" is done for all selected elements first,
>> releasing the lock, then the items are processed one at a time, marking
>> each "finished" as they complete. (each processing step can take
>> minutes, so keeping a lock the whole time is not an option)
>>
>>> Might I suggest and alternative:
>>>
>>> q = update queue set state = started
>>> WHERE project_id = my_project AND state = idle
>>> LIMIT n OFFSET i
>>> RETURNING project_id;
>>> idlist = @q;
>>> commit;
>>>
>>> foreach x in idlist
>>> process record_id
>>> begin
>>> update queue set state = finsihed where record_id = x;
>>> commit;
>>>
>>> Forgive the part perl part python sudocode. Oh, and I've never done
>>> this, no idea if it actually works. :-)
>>
>> Thanks for that suggestion, I'll take a look at it.
>>
>> While I hadn't caught on to the "RETURNING" part, I had been wondering
>> if using a single step UPDATE might be a solution. One concern I have is
>> how concurrent updates will affect the returned list (or if they will
>> just be skipped, as SELECT would in normal transaction mode, if I
>> understood correctly), or whether it might return with an error code (I
>> know that the normal update return value is the number of updated items,
>> just not sure if that applies for "RETURNING").
>>
>> Although, I will note that this process (if it works) will, sort of,
>> make FOR UPDATE redundant. Or, if it doesn't, the current lock-policy
>> might cause issues for concurrent updates for the use-cases where FOR
>> UPDATE is relevant.
>>
>
> Yeah, I'd wondered the same thing.  It could be two updates hitting the
> same row will deadlock, or maybe not, I'm not sure.  But I think its the
> same as with the select, if you happen to have two limits that hit the
> same range, you're in trouble.
>
> I think the random limit thing is a race condition itself.  Whenever you
> have multiple processes hitting the same rows you're going to run into
> problems.  Have you thought of using a sequence instead of a random
> limit?  Each process could get the next 100 record_id'd via  a sequence,
> then there would be much less chance of deadlock.

How would that work, in case you would like to provide an example?

I am not really familiar with sequences, as I have only seen them used for
the "id" field in Django-generated tables.

In case it is relevant, the processes do not (currently, at least) have
a unique ID, though they have a local sequence number for the machine they
are running on.


--
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer             Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************

Re: Select for update with offset interferes with concurrent transactions

From
Andy Colson
Date:
On 2/1/2011 12:10 PM, Yngve Nysaeter Pettersen wrote:
> On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <andy@squeakycode.net>
>> I think the random limit thing is a race condition itself. Whenever
>> you have multiple processes hitting the same rows you're going to run
>> into problems. Have you thought of using a sequence instead of a
>> random limit? Each process could get the next 100 record_id'd via a
>> sequence, then there would be much less chance of deadlock.
>
> How would that work, in case you would like to provide an example?
>
> I am not really familiar with sequences, as I have only seen them used
> for the "id" field in Django generated tables.
>
> In case it is relevant, the processes does not (currently, at least)
> have a unique ID; though they have a local sequence number for the
> machine they are running on.
>
>

I have a really simple q table I use.

     create table q (id integer not null, msg integer, primary key(id));
     create sequence q_add;
     create sequence q_read;

I insert via q_add:

andy=# insert into q(id, msg) values(nextval('q_add'), 20);
INSERT 0 1
andy=# insert into q(id, msg) values(nextval('q_add'), 4);
INSERT 0 1
andy=# select * from q;
  id | msg
----+-----
   1 |  20
   2 |   4
(2 rows)


Then I run multiple batch proc's which get their next job like:

andy=# select msg from q where id = (select nextval('q_read'));
  msg
-----
   20
(1 row)

andy=# select msg from q where id = (select nextval('q_read'));
  msg
-----
    4
(1 row)


It works for me because I can empty the q table, reset the q_add and
q_read sequences and start over clean.  Not sure if it would work for
your setup.
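
For completeness, the reset amounts to roughly this (a sketch, adjust to
taste):

    truncate q;
    -- make both sequences hand out 1 again on the next nextval()
    select setval('q_add', 1, false);
    select setval('q_read', 1, false);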


-Andy




Re: Select for update with offset interferes with concurrent transactions

From
"Yngve Nysaeter Pettersen"
Date:
Thanks Andy,

On Tue, 01 Feb 2011 19:29:08 +0100, Andy Colson <andy@squeakycode.net>
wrote:

> On 2/1/2011 12:10 PM, Yngve Nysaeter Pettersen wrote:
>> On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <andy@squeakycode.net>
>>> I think the random limit thing is a race condition itself. Whenever
>>> you have multiple processes hitting the same rows you're going to run
>>> into problems. Have you thought of using a sequence instead of a
>>> random limit? Each process could get the next 100 record_id'd via a
>>> sequence, then there would be much less chance of deadlock.
>>
>> How would that work, in case you would like to provide an example?
>>
>> I am not really familiar with sequences, as I have only seen them used
>> for the "id" field in Django generated tables.
>>
>> In case it is relevant, the processes does not (currently, at least)
>> have a unique ID; though they have a local sequence number for the
>> machine they are running on.
>>
>>
>
> I have a really simple q table I use.
>
>      create table q (id integer not null, msg integer, primary key(id));
>      create sequence q_add;
>      create sequence q_read;
>
> I insert via q_add:
>
> andy=# insert into q(id, msg) values(nextval('q_add'), 20);
> INSERT 0 1
> andy=# insert into q(id, msg) values(nextval('q_add'), 4);
> INSERT 0 1
> andy=# select * from q;
>   id | msg
> ----+-----
>    1 |  20
>    2 |   4
> (2 rows)
>
>
> Then I run multiple batch proc's which get their next job like:
>
> andy=# select msg from q where id = (select nextval('q_read'));
>   msg
> -----
>    20
> (1 row)
>
> andy=# select msg from q where id = (select nextval('q_read'));
>   msg
> -----
>     4
> (1 row)
>
>
> It works for me because I can empty the q table, reset the q_add and
> q_read sequences and start over clean.  Not sure if it would work for
> your setup.

I see how that would work (it is essentially how Django assigns row ids).

My current setup can have multiple runs configured at a time (I have had
several dozen queued, in one case), with varying priorities on each run,
and they might, at least theoretically, be set up in parallel (even the
individual runs are set up in parallel). That means the ids would not be
sequential (a sequence is used for the id field in each row of the table),
unless they could somehow be allocated per run/project; multiple sequence
objects, one for each run, might be an option, but I don't like that
possibility. And as I mentioned elsewhere in the thread, I might make the
queuing a bit more complex, which would make such a scheme even more
complicated.

So, AFAICT I am afraid it would not work in the general case for my
project :( .
However, it might be useful in somebody else's project :) .

--
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer             Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************

Re: Select for update with offset interferes with concurrent transactions

From
Andy Colson
Date:
On 2/1/2011 12:51 PM, Yngve Nysaeter Pettersen wrote:
>
> Thanks Andy,
>
> On Tue, 01 Feb 2011 19:29:08 +0100, Andy Colson <andy@squeakycode.net>
> wrote:
>
>> On 2/1/2011 12:10 PM, Yngve Nysaeter Pettersen wrote:
>>> On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <andy@squeakycode.net>
>>>> I think the random limit thing is a race condition itself. Whenever
>>>> you have multiple processes hitting the same rows you're going to run
>>>> into problems. Have you thought of using a sequence instead of a
>>>> random limit? Each process could get the next 100 record_id'd via a
>>>> sequence, then there would be much less chance of deadlock.
>>>
>>> How would that work, in case you would like to provide an example?
>>>
>>> I am not really familiar with sequences, as I have only seen them used
>>> for the "id" field in Django generated tables.
>>>
>>> In case it is relevant, the processes does not (currently, at least)
>>> have a unique ID; though they have a local sequence number for the
>>> machine they are running on.
>>>
>>>
>>
>> I have a really simple q table I use.
>>
>> create table q (id integer not null, msg integer, primary key(id));
>> create sequence q_add;
>> create sequence q_read;
>>
>> I insert via q_add:
>>
>> andy=# insert into q(id, msg) values(nextval('q_add'), 20);
>> INSERT 0 1
>> andy=# insert into q(id, msg) values(nextval('q_add'), 4);
>> INSERT 0 1
>> andy=# select * from q;
>> id | msg
>> ----+-----
>> 1 | 20
>> 2 | 4
>> (2 rows)
>>
>>
>> Then I run multiple batch proc's which get their next job like:
>>
>> andy=# select msg from q where id = (select nextval('q_read'));
>> msg
>> -----
>> 20
>> (1 row)
>>
>> andy=# select msg from q where id = (select nextval('q_read'));
>> msg
>> -----
>> 4
>> (1 row)
>>
>>
>> It works for me because I can empty the q table, reset the q_add and
>> q_read sequences and start over clean. Not sure if it would work for
>> your setup.
>
> I see how that would work (it is essentially how Django assigns row ids).
>
> My current setup can have multiple runs configured at a time (and have had
> several dozen queued, in one case), with varying priorities on each run,
> and they might, at least theoretically, be configured in parallel (even
> the individual runs are set up in parallel), meaning the ids would not be
> sequential (a sequence is used for the id field in each row of the table),
> unless they could somehow be allocated for each individual run/project
> (multiple sequence objects, one for each run might be an option, but I
> don't like that possibility). And as I mentioned elsewhere in the thread I
> might make the queuing a bit more complex, which might make this system
> even more complicated.
>
> So, AFAICT I am afraid it would not work in the general case for my
> project :( .
> However, it might be useful in somebody else's project :) .
>

No, I didn't think it would work for you; yours looks much more
complicated than mine.  Just out of curiosity, have you looked at PgQ?

-Andy

Re: Select for update with offset interferes with concurrent transactions

From
Radosław Smogura
Date:
Hmm...

May I ask how this looks in detail? If, e.g., I do "select * from myeshop
offset 100 limit 20 for update" and I have 1000 rows, which rows will be
locked?

a) rows 0 to 120, or
b) all rows?

Kind regards,
Radek

On Tuesday 01 February 2011 18:18:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which
> means that rows skipped over by OFFSET still get locked, which means
> that different sessions executing this query are now practically certain
> to block each other, rather than just likely to block each other.
> This was an intentional change to improve the predictability of FOR
> UPDATE's interactions with LIMIT/OFFSET, and indeed it's improved the
> predictability of the behavior for you, just not in the direction you'd
> like :-(
>
>             regards, tom lane

Re: Select for update with offset interferes with concurrent transactions

From
"Yngve Nysaeter Pettersen"
Date:
On Tue, 01 Feb 2011 20:04:31 +0100, Andy Colson <andy@squeakycode.net>
wrote:

> On 2/1/2011 12:51 PM, Yngve Nysaeter Pettersen wrote:
>
>> So, AFAICT I am afraid it would not work in the general case for my
>> project :( .
>> However, it might be useful in somebody else's project :) .
>>
>
> No, I didn't think it would work for you, yours looks much more
> complicated than main.  Just out of curiosity, have you looked at PgQ?

I did look around for some queuing systems a year ago; I am not sure if
that one crossed my path, but I didn't find any that I thought would work
for me. That might just be because I had only just started with database
programming (which was also the reason I chose a framework like Django for
most of it; the FOR UPDATE SQL is one of fewer than 10 places where I use
raw SQL in my system, because Django could not provide the functionality),
so I may simply not have realized that it could help me.

Regarding PgQ, based on a quick skimming I am not sure how it would fit my
case. This may be because the tutorial leaves (IMO) a bit too much up in
the air about how the surrounding system is organized, at least for a
relative beginner like myself, and it also does not show what a similar
alternative system would look like. A small complete example showing all
the tables involved, the client(s), the server(s), and the operations
performed might have helped.



--
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer             Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************

Re: Select for update with offset interferes with concurrent transactions

From
"David Johnston"
Date:
If random sampling is desirable, would the following construct limit
locking to only the sampled rows?

SELECT id
FROM tasktable
WHERE id IN (SELECT random_id_sample())
FOR UPDATE

The "random_id_sample" would supply a configurable group of IDs off of
tasktable, which the FOR UPDATE would then lock.

I guess the issue remains that "random_id_sample()" would still end up
blocking if any of the rows it wants to return are already locked.
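
For illustration, random_id_sample() could be written along these lines (a
hypothetical sketch, assuming tasktable has an integer id and a state
column like Yngve's queue; ORDER BY random() is just the simplest sampling
strategy and is not cheap on a large table):

    CREATE FUNCTION random_id_sample() RETURNS SETOF integer AS $$
        -- return the ids of up to 100 randomly chosen idle tasks
        SELECT id FROM tasktable
        WHERE state = 'idle'
        ORDER BY random()
        LIMIT 100;
    $$ LANGUAGE sql;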

I too am using this basic protocol of maintaining state info within the
database and sending every query against it.  As I ponder this more it
really seems as if moving some of this logic into the application layer
would possibly make more sense in Yngve's situation (or at least something
to consider).  Continue to use the database as a persistence mechanism but
code the "dispatching" of tasks in the application layer and then as each
task is dispatched you simply do an "UPDATE table SET state = 'dispatch'
WHERE id = 'ID'" and a similar UPDATE when the task is returned completed.
This somewhat presumes you still only ever hand off one task at a time.  If
you are indeed handing off tasks in batches then it would make sense to have
a "batch" table and operate at the batch level instead of individual tasks -
assigning tasks to a given batch via some standard mechanism.
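
A minimal sketch of that batch idea (hypothetical names, just to make the
shape concrete):

    -- one row per hand-off; tasks point at the batch they belong to
    create table batch (batch_id serial primary key, state text);
    alter table tasktable add column batch_id integer references batch;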

Either way, if you truly want parallel processing then you need to create
parallel paths that can operate without clobbering each other, and thus
each path needs to have its own pool of tasks: as soon as you have a
shared resource, the only true way to make sure it is allocated only once
is to serialize access to it.  An alternative method would be to allow
multiple dispatches but have a "write-once" method that is called and sets
an immutable handler_id; when the processing begins, only the handler with
the matching id would be allowed to perform the actual processing.

I say the above with certainty, but at the moment I am using, and am
fairly happy with, my limited serialization - especially since I have
specific sub-properties that I can use to limit how many records are
locked, and also because the locking time is very short (I cap around 20
or so active tasks to dispatch, and only infrequently at that) - so my
experience and insight into high-demand situations is limited.

Dave


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, February 01, 2011 12:18 PM
To: Yngve Nysaeter Pettersen
Cc: pgsql-general@postgresql.org
Subject: Re: Select for update with offset interferes with concurrent
transactions

"Yngve Nysaeter Pettersen" <yngve@opera.com> writes:
> To avoid having the processes trample each other's queries (the first
> attempt was to select the first matching entries of the table, which
> caused one to block all other transactions), one of the steps I took
> was to select a set of idle rows at a random offset into the table
> from the project, mark them for update, then update each record's state as
started.

>    SELECT record_id FROM queue WHERE project_id = my_project AND state
> = idle LIMIT n OFFSET i FOR UPDATE

> At present "n" is 100-150, "i" is a random value in the range 0-10000.

> There is, intentionally, no ordering specified, since that would just
> slow down the query, and is not necessary.

This seems like a pretty bad design.  There are recognized ways to solve
this problem with more predictability and much less chance of different
processes blocking each other.  In particular, this query seems be based on
some untenable assumptions about the physical row order being stable.

> What I've discovered when using Postgres 9.0 is that the processes are
> now blocking every other query into this table,

In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which
means that rows skipped over by OFFSET still get locked, which means that
different sessions executing this query are now practically certain to block
each other, rather than just likely to block each other.
This was an intentional change to improve the predictability of FOR UPDATE's
interactions with LIMIT/OFFSET, and indeed it's improved the predictability
of the behavior for you, just not in the direction you'd like :-(

            regards, tom lane


Re: Select for update with offset interferes with concurrent transactions

From
"Yngve N. Pettersen (Developer Opera Software ASA)"
Date:
Hello David,


On Wed, 02 Feb 2011 01:36:15 +0100, David Johnston <polobo@yahoo.com>
wrote:

> If random sampling is desirable would the following construct limit
> locking
> only to the sampled rows?
>
> SELECT id
> FROM tasktable
> WHERE id IN (SELECT random_id_sample())
> FOR UPDATE
>
> The "random_id_sample" would supply a configurable group of IDs off of
> tasktable which the FOR UPDATE would then lock
>
> I guess the issue remains that "random_id_sample()" would still end up
> blocking if any of the rows it wants to return are already locked.

My immediate guess is that this would work, and I might explore it once I
get my new full-scale test DB up and running.

> I too am using this basic protocol of maintaining state info within the
> database and sending every query against it.  As I ponder this more it
> really seems as if moving some of this logic into the application layer
> would possibly make more sense in Yngve's situation (or at least
> something
> to consider).  Continue to use the database as a persistence mechanism
> but
> code the "dispatching" of tasks in the application layer and then as each
> task is dispatched you simply do an "UPDATE table SET state = 'dispatch'
> WHERE id = 'ID'" and a similar UPDATE when the task is returned
> completed.
> This somewhat presumes you still only ever hand off one task at a time.
> If
> you are indeed handing off tasks in batches then it would make sense to
> have
> a "batch" table and operate at the batch level instead of individual
> tasks -
> assigning tasks to a given batch via some standard mechanism.

If I read you correctly that is what my system does (dispatch = started,
marked by the node that is to do the task).

The reason I am allocating tasks in batches is that there are so many
processes involved that if they pick one at a time they would block each
other. With the block allocation they only need to fetch the tasks once,
meaning that there are not as many requests to the queue at a time, on
average.

> Either way if you truly want true parallel processing then you need to
> create the parallel paths that can operate without clobbering each other
> and
> thus each path needs to have its own pool of tasks since as soon as you
> have

That is what the offset part of the query was supposed to achieve.

At the moment I have worked around the problem by breaking the task list
into 2000 subgroups, and each process picks one at random. That limits the
number of processes that get in each other's way, and the measured speed
is now 4-5 times what I saw on Monday, and back in the old range of
performance. However, it is a hack I had hoped to avoid (and I might get
rid of it with the above suggestion).
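
(Concretely, the hack is roughly the batch_id idea from my first mail: the
queue rows get a column filled with a random value in 0..1999 when the
tasks are created, and each process does something like

    SELECT record_id FROM queue
        WHERE project_id = my_project AND state = idle
            AND batch_id = <random value in 0..1999>
        LIMIT n
        FOR UPDATE;

so only the processes that happen to pick the same batch_id can block each
other.)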

> a shared resource the only true way to make sure it is only allocated
> once
> is to serialize access to it.  An alternative method would be to allow
> multiple dispatches but have a "write-once" method that is called and
> sets
> an immutable handler_id and then when the processing begins only the
> handler
> with the matching id would be able allow to perform the actual
> processing.

This requires the handlers to have a unique ID, which my system has not
needed so far.

> I say the above with certainty but at the moment I am using and fairly
> happy
> with my limited serialization - especially since I have specific
> sub-properties that I can use to limit how many records are locked AND
> also
> because the locking time is very short (I cap around 20 or so active
> tasks
> to dispatch - and only infrequently at that) so my experience and
> insight to
> high-demand situations is limited.
>
> Dave
>
>
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, February 01, 2011 12:18 PM
> To: Yngve Nysaeter Pettersen
> Cc: pgsql-general@postgresql.org
> Subject: Re: Select for update with offset interferes with concurrent
> transactions
>
> "Yngve Nysaeter Pettersen" <yngve@opera.com> writes:
>> To avoid having the processes trample each other's queries (the first
>> attempt was to select the first matching entries of the table, which
>> caused one to block all other transactions), one of the steps I took
>> was to select a set of idle rows at a random offset into the table
>> from the project, mark them for update, then update each record's state
>> as
> started.
>
>>    SELECT record_id FROM queue WHERE project_id = my_project AND state
>> = idle LIMIT n OFFSET i FOR UPDATE
>
>> At present "n" is 100-150, "i" is a random value in the range 0-10000.
>
>> There is, intentionally, no ordering specified, since that would just
>> slow down the query, and is not necessary.
>
> This seems like a pretty bad design.  There are recognized ways to solve
> this problem with more predictability and much less chance of different
> processes blocking each other.  In particular, this query seems be based
> on
> some untenable assumptions about the physical row order being stable.
>
>> What I've discovered when using Postgres 9.0 is that the processes are
>> now blocking every other query into this table,
>
> In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which
> means that rows skipped over by OFFSET still get locked, which means that
> different sessions executing this query are now practically certain to
> block
> each other, rather than just likely to block each other.
> This was an intentional change to improve the predictability of FOR
> UPDATE's
> interactions with LIMIT/OFFSET, and indeed it's improved the
> predictability
> of the behavior for you, just not in the direction you'd like :-(
>
>             regards, tom lane
>
>


--
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer             Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************

Re: Select for update with offset interferes with concurrent transactions

From
"Yngve N. Pettersen (Developer Opera Software ASA)"
Date:
Hello all,

Just a quick update of how it went.

I ended up using code similar to a combination of Andy Colson's and David
Johnston's suggestions below, and performance is back at what it was
before. Thanks for the suggestions.
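
In rough outline, the claiming step is now a single statement along these
lines (a simplified sketch, not the exact production query):

    UPDATE queue SET state = started
        WHERE record_id IN (
            -- some random sample of idle rows for this project, e.g. via
            -- a helper in the spirit of David's random_id_sample()
            SELECT record_id FROM queue
            WHERE project_id = my_project AND state = idle
            ORDER BY random()
            LIMIT n)
          AND state = idle
        RETURNING record_id;

The repeated "state = idle" test is there because, under READ COMMITTED,
rows that another worker claims first are re-checked after the lock wait
and skipped by this UPDATE.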

BTW: AFAICT I never got a response from Tom Lane about whether the
intention of the new FOR UPDATE locking policy really was to effectively
lock the entire table for every other client issuing the exact same SELECT
... FOR UPDATE query with a different, non-overlapping OFFSET/LIMIT. IMO,
continuing to lock unselected rows after the selection has completed is a
significant performance regression.

Also, an off-topic BTW: I have noticed that autovacuum of a table seems to
block ANALYZE of the same table, because the autovacuum does not release
its lock on the table.

On Tue, 01 Feb 2011 16:19:17 +0100, Andy Colson <andy@squeakycode.net>
wrote:

> On 2/1/2011 6:32 AM, Yngve Nysaeter Pettersen wrote:
>> Hello all,
>>
>> I am in the process of migrating a system from Postgresql 8.3 to 9.0,
>> and have run into a problem with the task queue systems I am using.
>>
>> The task queue controls the allocation of tasks between about 1000
>> processes working in parallel, and is essentially a table of
>
<snip>
>
> So, if I understand correctly, you:
>
> q = SELECT record_id FROM queue
>     WHERE project_id = my_project AND state = idle
>     LIMIT n OFFSET i FOR UPDATE
> while not q.eof
>     update queue set state = started where record_id = x;
>     process record_id
>     update queue set state = finsihed where record_id = x;
>     q.next;
>
>
> Might I suggest and alternative:
>
> q = update queue set state = started
>     WHERE project_id = my_project AND state = idle
>     LIMIT n OFFSET i
>     RETURNING project_id;
> idlist = @q;
> commit;
>
> foreach x in idlist
>     process record_id
>     begin
>     update queue set state = finsihed where record_id = x;
>     commit;
>

On Wed, 02 Feb 2011 01:36:15 +0100, David Johnston <polobo@yahoo.com>
wrote:

> If random sampling is desirable would the following construct limit
> locking
> only to the sampled rows?
>
> SELECT id
> FROM tasktable
> WHERE id IN (SELECT random_id_sample())
> FOR UPDATE
>


--
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer             Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************

Re: Select for update with offset interferes with concurrent transactions

From
Andy Colson
Date:
On 3/14/2011 10:13 AM, Yngve N. Pettersen (Developer Opera Software ASA)
wrote:
> Hello all,
>
> Just a quick update of how it went.
>
> I ended up using code similar to a combination of Andy Colson's and David
> Johnston's suggestions below, and performance is back at what is was
> before. Thanks for the suggestions
>

Sweet, thanks for the update... I always like to hear how things turn out.

-Andy

Re: Select for update with offset interferes with concurrent transactions

From
Merlin Moncure
Date:
On Tue, Feb 1, 2011 at 11:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Yngve Nysaeter Pettersen" <yngve@opera.com> writes:
>> To avoid having the processes trample each other's queries (the first
>> attempt was to select the first matching entries of the table, which
>> caused one to block all other transactions), one of the steps I took was
>> to select a set of idle rows at a random offset into the table from the
>> project, mark them for update, then update each record's state as started.
>
>>    SELECT record_id FROM queue WHERE project_id = my_project AND state =
>> idle LIMIT n OFFSET i FOR UPDATE
>
>> At present "n" is 100-150, "i" is a random value in the range 0-10000.
>
>> There is, intentionally, no ordering specified, since that would just slow
>> down the query, and is not necessary.
>
> This seems like a pretty bad design.  There are recognized ways to solve
> this problem with more predictability and much less chance of different
> processes blocking each other.  In particular, this query seems be based
> on some untenable assumptions about the physical row order being stable.
>
>> What I've discovered when using Postgres 9.0 is that the processes are now
>> blocking every other query into this table,
>
> In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which
> means that rows skipped over by OFFSET still get locked, which means
> that different sessions executing this query are now practically certain
> to block each other, rather than just likely to block each other.
> This was an intentional change to improve the predictability of FOR
> UPDATE's interactions with LIMIT/OFFSET, and indeed it's improved the
> predictability of the behavior for you, just not in the direction you'd
> like :-(

You can get something approximating the old behavior with a CTE:

    with q as (select id from foo where <something> limit x offset y)
    select * from foo join q using(id) order by id for update;

Not that this is a good idea -- it isn't -- but if you must do it that
way, the above might work.  CTEs are always something to consider when
dealing with order-of-execution problems, which seem to be burning just
about everyone these days.

merlin