Thread: Select for update with offset interferes with concurrent transactions
Select for update with offset interferes with concurrent transactions
From: "Yngve Nysaeter Pettersen"
Hello all,

I am in the process of migrating a system from Postgresql 8.3 to 9.0, and have run into a problem with the task queue system I am using.

The task queue controls the allocation of tasks between about 1000 processes working in parallel, and is essentially a table of

  record_id (unique)
  project_id
  task_description_id
  state (idle, started, finished)

Each project currently has about 2 million entries. My plan is to increase that significantly over the next few months.

To avoid having the processes trample each other's queries (the first attempt was to select the first matching entries of the table, which caused one process to block all other transactions), one of the steps I took was to select a set of idle rows at a random offset into the table for the project, mark them for update, then update each record's state to started.

  SELECT record_id FROM queue WHERE project_id = my_project AND state = idle
  LIMIT n OFFSET i FOR UPDATE

At present "n" is 100-150, and "i" is a random value in the range 0-10000.

There is, intentionally, no ordering specified, since that would just slow down the query, and is not necessary.

For reference, the above query is sent through Django's cursor.execute() call in a manual transaction block.

What I've discovered when using Postgres 9.0 is that the processes are now blocking every other query into this table, apparently reducing the task processing speed by at least a factor of 10, and increasing the load on the server by a similar factor, compared to when Postgres 8.3 was used. The problem is apparent just after starting, with only 50-100 processes active (startup is staggered).

Reducing "n" (and looping), or increasing the "i" range, did not help.

The reason seems to be this new part of http://www.postgresql.org/docs/9.0/static/sql-select.html (towards the end of the FOR UPDATE section):

  If a LIMIT is used, locking stops once enough rows have been returned to
  satisfy the limit (but note that rows skipped over by OFFSET will get
  locked). Similarly, if FOR UPDATE or FOR SHARE is used in a cursor's query,
  only rows actually fetched or stepped past by the cursor will be locked.

I can't find similar text in the 8.3 or 8.4 documentation.

AFAICT, and assuming I have not misunderstood this part of the documentation, this means that if one of my processing nodes selects a block of 100 entries at offset 8000 in the resulting table, then every other node will be blocked while the block is being processed: not just the nodes that would have selected rows in the range 0 to 7999, but also those at offsets >= 8100, because they cannot gain access to the locked rows.

Also, using FOR SHARE does not seem to solve the problem.

IMO, as a database non-expert, locking rows that were not returned as a result of the query is a bug. As an example, a query that selects the last X items in the matching rows is then equivalent to locking the table, or the relevant part of it, even though the requester has no intention to modify those other rows.

Is there any way to avoid this problem? Or do I have to add a random batch_id field to the queue table in order to separate the processes' queries so that they do not block each other (as frequently)?

Is it possible to disable the source code causing this (that is, revert the patch that introduced the problem, or change a configuration switch)?

-- 
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************
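A minimal sketch of the setup described above, for readers following along. The column types, the index, and the concrete values standing in for my_project, n and i are assumptions; the post only names the columns and gives the ranges:

  CREATE TABLE queue (
      record_id           serial PRIMARY KEY,              -- unique
      project_id          integer NOT NULL,
      task_description_id integer NOT NULL,
      state               text    NOT NULL DEFAULT 'idle'  -- idle | started | finished
  );
  CREATE INDEX queue_project_state_idx ON queue (project_id, state);

  -- what each worker runs; n is 100-150, i is a random value in 0-10000
  SELECT record_id
    FROM queue
   WHERE project_id = 42 AND state = 'idle'
   LIMIT 100 OFFSET 5000
     FOR UPDATE;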
On 2/1/2011 6:32 AM, Yngve Nysaeter Pettersen wrote: > Hello all, > > I am in the process of migrating a system from Postgresql 8.3 to 9.0, > and have run into a problem with the task queue systems I am using. > > The task queue controls the allocation of tasks between about 1000 > processes working in parallel, and is essentially a table of > > record_id (unique) > project_id > task_description_id > state (idle, started, finished) > > Each project currently have about 2 million entries. My plan is to > increase that significantly the next few months. > > To avoid having the processes trample each other's queries (the first > attempt was to select the first matching entries of the table, which > caused one to block all other transactions), one of the steps I took was > to select a set of idle rows at a random offset into the table from the > project, mark them for update, then update each record's state as started. > > SELECT record_id FROM queue WHERE project_id = my_project AND state = > idle LIMIT n OFFSET i FOR UPDATE > > At present "n" is 100-150, "i" is a random value in the range 0-10000. > > There is, intentionally, no ordering specified, since that would just > slow down the query, and is not necessary. > > For reference, the above query is sent through Django's cursor.execute() > call in a manual transaction block. > > > > What I've discovered when using Postgres 9.0 is that the processes are > now blocking every other query into this table, apparently reducing the > task processing speed by at least a factor of 10, and increasing the > load on the server by a similar factor, compared to when Postgres 8.3 > was used. The problem is apparent just after starting, with only 50-100 > processes active (startup is staggered). > > Reducing "n" (and looping), or increasing the "i" range did not work. > > > The reason seems to be this new part of > http://www.postgresql.org/docs/9.0/static/sql-select.html (towards the > end of the FOR UPDATE section): > > If a LIMIT is used, locking stops once enough rows have been returned to > satisfy the limit > (but note that rows skipped over by OFFSET will get locked). Similarly, > if FOR UPDATE or > FOR SHARE is used in a cursor's query, only rows actually fetched or > stepped past by the > cursor will be locked. > > I can't find similar text in the 8.3 or 8.4 documentation. > > AFAICT, and assuming I have not misunderstood this part of the > documentation this means that if one of my processing nodes selects a > block of 100 entries at offset 8000 in the resulting table, then every > other node will be blocked while the block is being processed, not just > the nodes that would have selected the rows in the range 0 to 7999, but > also >=8100, because they cannot gain access to the rows. > > Also, using FOR SHARE does not seem to solve the problem. > > IMO, as a database non-expert, locking rows that were not returned as a > result of the query is a bug. As an example, if a query selects the X > last items in the matching rows, that is equivalent to locking the > table, or the relevant part of it, even if the requester have no > intention to modify those other rows. > > > Is there any way to avoid this problem? Or do I have to add a random > batch_id field to the queue table in order to separate the processes' > queries so that they do not block each other (as frequently)? > > Is it possible to disable the source code causing this (that is, > reverting the patch that introduced the problem, or changing a > configuration switch)? 
So, if I understand correctly, you:

  q = SELECT record_id FROM queue
      WHERE project_id = my_project AND state = idle
      LIMIT n OFFSET i FOR UPDATE
  while not q.eof
     update queue set state = started where record_id = x;
     process record_id
     update queue set state = finished where record_id = x;
     q.next;

Might I suggest an alternative:

  q = update queue set state = started
      WHERE project_id = my_project AND state = idle
      LIMIT n OFFSET i
      RETURNING project_id;
  idlist = @q;
  commit;

  foreach x in idlist
     process record_id
     begin
     update queue set state = finished where record_id = x;
     commit;

Forgive the part-perl, part-python pseudocode. Oh, and I've never done this, no idea if it actually works. :-)

-Andy
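A sketch of how Andy's suggestion could be written as real SQL: PostgreSQL has no LIMIT/OFFSET on UPDATE, so a subselect stands in for it here, and it returns record_id rather than project_id, since that is what the processing loop needs. Table and column names follow the thread; the literal values are placeholders:

  -- claim a batch, then release the row locks right away by committing
  BEGIN;
  UPDATE queue
     SET state = 'started'
   WHERE state = 'idle'             -- re-checked against the newest row version,
     AND record_id IN (             -- so rows grabbed by a concurrent worker are skipped
         SELECT record_id
           FROM queue
          WHERE project_id = 42 AND state = 'idle'
          LIMIT 100 OFFSET 5000)
  RETURNING record_id;
  COMMIT;

  -- later, as each task finishes:
  UPDATE queue SET state = 'finished' WHERE record_id = 1234;

Workers whose batches overlap still wait for each other's UPDATE to commit, but only for the duration of that statement rather than the whole processing run, and the later worker simply gets fewer rows back.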
Re: Select for update with offset interferes with concurrent transactions
From: "Yngve Nysaeter Pettersen"
Hi, Thanks for the quick answer, Andy. On Tue, 01 Feb 2011 16:19:17 +0100, Andy Colson <andy@squeakycode.net> wrote: <snip> > So, if I understand correctly, you: > > q = SELECT record_id FROM queue > WHERE project_id = my_project AND state = idle > LIMIT n OFFSET i FOR UPDATE > while not q.eof > update queue set state = started where record_id = x; > process record_id > update queue set state = finsihed where record_id = x; > q.next; Almost, the update to "started" is done for all selected elements first, releasing the lock, then the items are processed one at a time, marking each "finished" as they complete. (each processing step can take minutes, so keeping a lock the whole time is not an option) > Might I suggest and alternative: > > q = update queue set state = started > WHERE project_id = my_project AND state = idle > LIMIT n OFFSET i > RETURNING project_id; > idlist = @q; > commit; > > foreach x in idlist > process record_id > begin > update queue set state = finsihed where record_id = x; > commit; > > Forgive the part perl part python sudocode. Oh, and I've never done > this, no idea if it actually works. :-) Thanks for that suggestion, I'll take a look at it. While I hadn't caught on to the "RETURNING" part, I had been wondering if using a single step UPDATE might be a solution. One concern I have is how concurrent updates will affect the returned list (or if they will just be skipped, as SELECT would in normal transaction mode, if I understood correctly), or whether it might return with an error code (I know that the normal update return value is the number of updated items, just not sure if that applies for "RETURNING"). Although, I will note that this process (if it works) will, sort of, make FOR UPDATE redundant. Or, if it doesn't, the current lock-policy might cause issues for concurrent updates for the use-cases where FOR UPDATE is relevant. -- Sincerely, Yngve N. Pettersen ******************************************************************** Senior Developer Email: yngve@opera.com Opera Software ASA http://www.opera.com/ Phone: +47 23 69 32 60 Fax: +47 23 69 24 01 ********************************************************************
On 2/1/2011 10:59 AM, Yngve Nysaeter Pettersen wrote: > Hi, > > Thanks for the quick answer, Andy. > > On Tue, 01 Feb 2011 16:19:17 +0100, Andy Colson <andy@squeakycode.net> > wrote: > > <snip> >> So, if I understand correctly, you: >> >> q = SELECT record_id FROM queue >> WHERE project_id = my_project AND state = idle >> LIMIT n OFFSET i FOR UPDATE >> while not q.eof >> update queue set state = started where record_id = x; >> process record_id >> update queue set state = finsihed where record_id = x; >> q.next; > > Almost, the update to "started" is done for all selected elements first, > releasing the lock, then the items are processed one at a time, marking > each "finished" as they complete. (each processing step can take > minutes, so keeping a lock the whole time is not an option) > >> Might I suggest and alternative: >> >> q = update queue set state = started >> WHERE project_id = my_project AND state = idle >> LIMIT n OFFSET i >> RETURNING project_id; >> idlist = @q; >> commit; >> >> foreach x in idlist >> process record_id >> begin >> update queue set state = finsihed where record_id = x; >> commit; >> >> Forgive the part perl part python sudocode. Oh, and I've never done >> this, no idea if it actually works. :-) > > Thanks for that suggestion, I'll take a look at it. > > While I hadn't caught on to the "RETURNING" part, I had been wondering > if using a single step UPDATE might be a solution. One concern I have is > how concurrent updates will affect the returned list (or if they will > just be skipped, as SELECT would in normal transaction mode, if I > understood correctly), or whether it might return with an error code (I > know that the normal update return value is the number of updated items, > just not sure if that applies for "RETURNING"). > > Although, I will note that this process (if it works) will, sort of, > make FOR UPDATE redundant. Or, if it doesn't, the current lock-policy > might cause issues for concurrent updates for the use-cases where FOR > UPDATE is relevant. > Yeah, I'd wondered the same thing. It could be two updates hitting the same row will deadlock, or maybe not, I'm not sure. But I think its the same as with the select, if you happen to have two limits that hit the same range, you're in trouble. I think the random limit thing is a race condition itself. Whenever you have multiple processes hitting the same rows you're going to run into problems. Have you thought of using a sequence instead of a random limit? Each process could get the next 100 record_id'd via a sequence, then there would be much less chance of deadlock. -Andy
"Yngve Nysaeter Pettersen" <yngve@opera.com> writes: > To avoid having the processes trample each other's queries (the first > attempt was to select the first matching entries of the table, which > caused one to block all other transactions), one of the steps I took was > to select a set of idle rows at a random offset into the table from the > project, mark them for update, then update each record's state as started. > SELECT record_id FROM queue WHERE project_id = my_project AND state = > idle LIMIT n OFFSET i FOR UPDATE > At present "n" is 100-150, "i" is a random value in the range 0-10000. > There is, intentionally, no ordering specified, since that would just slow > down the query, and is not necessary. This seems like a pretty bad design. There are recognized ways to solve this problem with more predictability and much less chance of different processes blocking each other. In particular, this query seems be based on some untenable assumptions about the physical row order being stable. > What I've discovered when using Postgres 9.0 is that the processes are now > blocking every other query into this table, In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which means that rows skipped over by OFFSET still get locked, which means that different sessions executing this query are now practically certain to block each other, rather than just likely to block each other. This was an intentional change to improve the predictability of FOR UPDATE's interactions with LIMIT/OFFSET, and indeed it's improved the predictability of the behavior for you, just not in the direction you'd like :-( regards, tom lane
Re: Select for update with offset interferes with concurrent transactions
From: "Yngve Nysaeter Pettersen"
On Tue, 01 Feb 2011 18:18:17 +0100, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Yngve Nysaeter Pettersen" <yngve@opera.com> writes:
>> To avoid having the processes trample each other's queries (the first
>> attempt was to select the first matching entries of the table, which
>> caused one to block all other transactions), one of the steps I took was
>> to select a set of idle rows at a random offset into the table from the
>> project, mark them for update, then update each record's state as started.
>
>> SELECT record_id FROM queue WHERE project_id = my_project AND state =
>> idle LIMIT n OFFSET i FOR UPDATE
>
>> At present "n" is 100-150, "i" is a random value in the range 0-10000.
>
>> There is, intentionally, no ordering specified, since that would just slow
>> down the query, and is not necessary.
>
> This seems like a pretty bad design.

Well, I don't claim to be a database expert ;). While there might be better ways, the current one has worked OK in the year since it was implemented.

> There are recognized ways to solve
> this problem with more predictability and much less chance of different

I'd appreciate it if you could provide a couple of pointers.

> processes blocking each other. In particular, this query seems be based
> on some untenable assumptions about the physical row order being stable.

No, it does not assume that the row order is stable; I don't really care about the order of the elements, since the actual order of task execution depends much more significantly on other variables, and the actual order isn't important at all (although further design changes might impose some limited category grouping on the queue, that would still not make the ordering important within the group).

>> What I've discovered when using Postgres 9.0 is that the processes are now
>> blocking every other query into this table,
>
> In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which
> means that rows skipped over by OFFSET still get locked, which means
> that different sessions executing this query are now practically certain
> to block each other, rather than just likely to block each other.
> This was an intentional change to improve the predictability of FOR
> UPDATE's interactions with LIMIT/OFFSET, and indeed it's improved the
> predictability of the behavior for you, just not in the direction you'd
> like :-(

That might be, but is it necessary to continue locking (which is what it sounds like to me) the elements that are not used in the final result after the query has completed?

What happens now, if I understand it correctly, is that if a "select foo from bar limit 1 order by whatever offset tablelen-1 for update" is performed, the effective operation is also "LOCK bar", not just a row lock on item tablelen-1 in that table. Was that the intention?

(And yes, I am aware that ordering might be used to reverse that sequence so offset 0 can be used, but wouldn't that just as much block the query for offset 1?)

-- 
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************
Re: Select for update with offset interferes with concurrent transactions
From: "Yngve Nysaeter Pettersen"
On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <andy@squeakycode.net> wrote: > On 2/1/2011 10:59 AM, Yngve Nysaeter Pettersen wrote: >> Hi, >> >> Thanks for the quick answer, Andy. >> >> On Tue, 01 Feb 2011 16:19:17 +0100, Andy Colson <andy@squeakycode.net> >> wrote: >> >> <snip> >>> So, if I understand correctly, you: >>> >>> q = SELECT record_id FROM queue >>> WHERE project_id = my_project AND state = idle >>> LIMIT n OFFSET i FOR UPDATE >>> while not q.eof >>> update queue set state = started where record_id = x; >>> process record_id >>> update queue set state = finsihed where record_id = x; >>> q.next; >> >> Almost, the update to "started" is done for all selected elements first, >> releasing the lock, then the items are processed one at a time, marking >> each "finished" as they complete. (each processing step can take >> minutes, so keeping a lock the whole time is not an option) >> >>> Might I suggest and alternative: >>> >>> q = update queue set state = started >>> WHERE project_id = my_project AND state = idle >>> LIMIT n OFFSET i >>> RETURNING project_id; >>> idlist = @q; >>> commit; >>> >>> foreach x in idlist >>> process record_id >>> begin >>> update queue set state = finsihed where record_id = x; >>> commit; >>> >>> Forgive the part perl part python sudocode. Oh, and I've never done >>> this, no idea if it actually works. :-) >> >> Thanks for that suggestion, I'll take a look at it. >> >> While I hadn't caught on to the "RETURNING" part, I had been wondering >> if using a single step UPDATE might be a solution. One concern I have is >> how concurrent updates will affect the returned list (or if they will >> just be skipped, as SELECT would in normal transaction mode, if I >> understood correctly), or whether it might return with an error code (I >> know that the normal update return value is the number of updated items, >> just not sure if that applies for "RETURNING"). >> >> Although, I will note that this process (if it works) will, sort of, >> make FOR UPDATE redundant. Or, if it doesn't, the current lock-policy >> might cause issues for concurrent updates for the use-cases where FOR >> UPDATE is relevant. >> > > Yeah, I'd wondered the same thing. It could be two updates hitting the > same row will deadlock, or maybe not, I'm not sure. But I think its the > same as with the select, if you happen to have two limits that hit the > same range, you're in trouble. > > I think the random limit thing is a race condition itself. Whenever you > have multiple processes hitting the same rows you're going to run into > problems. Have you thought of using a sequence instead of a random > limit? Each process could get the next 100 record_id'd via a sequence, > then there would be much less chance of deadlock. How would that work, in case you would like to provide an example? I am not really familiar with sequences, as I have only seen them used for the "id" field in Django generated tables. In case it is relevant, the processes does not (currently, at least) have a unique ID; though they have a local sequence number for the machine they are running on. -- Sincerely, Yngve N. Pettersen ******************************************************************** Senior Developer Email: yngve@opera.com Opera Software ASA http://www.opera.com/ Phone: +47 23 69 32 60 Fax: +47 23 69 24 01 ********************************************************************
On 2/1/2011 12:10 PM, Yngve Nysaeter Pettersen wrote:
> On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <andy@squeakycode.net>
>> I think the random limit thing is a race condition itself. Whenever
>> you have multiple processes hitting the same rows you're going to run
>> into problems. Have you thought of using a sequence instead of a
>> random limit? Each process could get the next 100 record_id'd via a
>> sequence, then there would be much less chance of deadlock.
>
> How would that work, in case you would like to provide an example?
>
> I am not really familiar with sequences, as I have only seen them used
> for the "id" field in Django generated tables.
>
> In case it is relevant, the processes does not (currently, at least)
> have a unique ID; though they have a local sequence number for the
> machine they are running on.

I have a really simple q table I use.

  create table q (id integer not null, msg integer, primary key(id));
  create sequence q_add;
  create sequence q_read;

I insert via q_add:

  andy=# insert into q(id, msg) values(nextval('q_add'), 20);
  INSERT 0 1
  andy=# insert into q(id, msg) values(nextval('q_add'), 4);
  INSERT 0 1
  andy=# select * from q;
   id | msg
  ----+-----
    1 |  20
    2 |   4
  (2 rows)

Then I run multiple batch proc's which get their next job like:

  andy=# select msg from q where id = (select nextval('q_read'));
   msg
  -----
    20
  (1 row)

  andy=# select msg from q where id = (select nextval('q_read'));
   msg
  -----
     4
  (1 row)

It works for me because I can empty the q table, reset the q_add and q_read sequences and start over clean. Not sure if it would work for your setup.

-Andy
Re: Select for update with offset interferes with concurrent transactions
From: "Yngve Nysaeter Pettersen"
Thanks Andy, On Tue, 01 Feb 2011 19:29:08 +0100, Andy Colson <andy@squeakycode.net> wrote: > On 2/1/2011 12:10 PM, Yngve Nysaeter Pettersen wrote: >> On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <andy@squeakycode.net> >>> I think the random limit thing is a race condition itself. Whenever >>> you have multiple processes hitting the same rows you're going to run >>> into problems. Have you thought of using a sequence instead of a >>> random limit? Each process could get the next 100 record_id'd via a >>> sequence, then there would be much less chance of deadlock. >> >> How would that work, in case you would like to provide an example? >> >> I am not really familiar with sequences, as I have only seen them used >> for the "id" field in Django generated tables. >> >> In case it is relevant, the processes does not (currently, at least) >> have a unique ID; though they have a local sequence number for the >> machine they are running on. >> >> > > I have a really simple q table I use. > > create table q (id integer not null, msg integer, primary key(id)); > create sequence q_add; > create sequence q_read; > > I insert via q_add: > > andy=# insert into q(id, msg) values(nextval('q_add'), 20); > INSERT 0 1 > andy=# insert into q(id, msg) values(nextval('q_add'), 4); > INSERT 0 1 > andy=# select * from q; > id | msg > ----+----- > 1 | 20 > 2 | 4 > (2 rows) > > > Then I run multiple batch proc's which get their next job like: > > andy=# select msg from q where id = (select nextval('q_read')); > msg > ----- > 20 > (1 row) > > andy=# select msg from q where id = (select nextval('q_read')); > msg > ----- > 4 > (1 row) > > > It works for me because I can empty the q table, reset the q_add and > q_read sequences and start over clean. Not sure if it would work for > your setup. I see how that would work (it is essentially how Django assigns row ids). My current setup can have multiple runs configured at a time (and have had several dozen queued, in one case), with varying priorities on each run, and they might, at least theoretically, be configured in parallel (even the individual runs are set up in parallel), meaning the ids would not be sequential (a sequence is used for the id field in each row of the table), unless they could somehow be allocated for each individual run/project (multiple sequence objects, one for each run might be an option, but I don't like that possibility). And as I mentioned elsewhere in the thread I might make the queuing a bit more complex, which might make this system even more complicated. So, AFAICT I am afraid it would not work in the general case for my project :( . However, it might be useful in somebody else's project :) . -- Sincerely, Yngve N. Pettersen ******************************************************************** Senior Developer Email: yngve@opera.com Opera Software ASA http://www.opera.com/ Phone: +47 23 69 32 60 Fax: +47 23 69 24 01 ********************************************************************
On 2/1/2011 12:51 PM, Yngve Nysaeter Pettersen wrote: > > Thanks Andy, > > On Tue, 01 Feb 2011 19:29:08 +0100, Andy Colson <andy@squeakycode.net> > wrote: > >> On 2/1/2011 12:10 PM, Yngve Nysaeter Pettersen wrote: >>> On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <andy@squeakycode.net> >>>> I think the random limit thing is a race condition itself. Whenever >>>> you have multiple processes hitting the same rows you're going to run >>>> into problems. Have you thought of using a sequence instead of a >>>> random limit? Each process could get the next 100 record_id'd via a >>>> sequence, then there would be much less chance of deadlock. >>> >>> How would that work, in case you would like to provide an example? >>> >>> I am not really familiar with sequences, as I have only seen them used >>> for the "id" field in Django generated tables. >>> >>> In case it is relevant, the processes does not (currently, at least) >>> have a unique ID; though they have a local sequence number for the >>> machine they are running on. >>> >>> >> >> I have a really simple q table I use. >> >> create table q (id integer not null, msg integer, primary key(id)); >> create sequence q_add; >> create sequence q_read; >> >> I insert via q_add: >> >> andy=# insert into q(id, msg) values(nextval('q_add'), 20); >> INSERT 0 1 >> andy=# insert into q(id, msg) values(nextval('q_add'), 4); >> INSERT 0 1 >> andy=# select * from q; >> id | msg >> ----+----- >> 1 | 20 >> 2 | 4 >> (2 rows) >> >> >> Then I run multiple batch proc's which get their next job like: >> >> andy=# select msg from q where id = (select nextval('q_read')); >> msg >> ----- >> 20 >> (1 row) >> >> andy=# select msg from q where id = (select nextval('q_read')); >> msg >> ----- >> 4 >> (1 row) >> >> >> It works for me because I can empty the q table, reset the q_add and >> q_read sequences and start over clean. Not sure if it would work for >> your setup. > > I see how that would work (it is essentially how Django assigns row ids). > > My current setup can have multiple runs configured at a time (and have had > several dozen queued, in one case), with varying priorities on each run, > and they might, at least theoretically, be configured in parallel (even > the individual runs are set up in parallel), meaning the ids would not be > sequential (a sequence is used for the id field in each row of the table), > unless they could somehow be allocated for each individual run/project > (multiple sequence objects, one for each run might be an option, but I > don't like that possibility). And as I mentioned elsewhere in the thread I > might make the queuing a bit more complex, which might make this system > even more complicated. > > So, AFAICT I am afraid it would not work in the general case for my > project :( . > However, it might be useful in somebody else's project :) . > No, I didn't think it would work for you, yours looks much more complicated than main. Just out of curiosity, have you looked at PgQ? -Andy
Re: Select for update with offset interferes with concurrent transactions
From: Radosław Smogura
Hmm... may I ask how this looks in detail? If, for example, I do "select * from myeshop offset 100 limit 20" and I have 1000 rows, which rows will be locked?

a) rows 0 to 120, or
b) all rows?

Kind regards,
Radek

Tom Lane <tgl@sss.pgh.pa.us> wrote on Tuesday 01 February 2011 18:18:17:
> In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which
> means that rows skipped over by OFFSET still get locked, which means
> that different sessions executing this query are now practically certain
> to block each other, rather than just likely to block each other.
> This was an intentional change to improve the predictability of FOR
> UPDATE's interactions with LIMIT/OFFSET, and indeed it's improved the
> predictability of the behavior for you, just not in the direction you'd
> like :-(
>
> regards, tom lane
Re: Select for update with offset interferes with concurrent transactions
From: "Yngve Nysaeter Pettersen"
On Tue, 01 Feb 2011 20:04:31 +0100, Andy Colson <andy@squeakycode.net> wrote:

> On 2/1/2011 12:51 PM, Yngve Nysaeter Pettersen wrote:
>
>> So, AFAICT I am afraid it would not work in the general case for my
>> project :( .
>> However, it might be useful in somebody else's project :) .
>
> No, I didn't think it would work for you, yours looks much more
> complicated than main. Just out of curiosity, have you looked at PgQ?

I did look around for some queuing systems a year ago; I am not sure whether that one crossed my path, but I didn't find any that I thought would work for me. That may simply be because I had only just started with database programming (which was also the reason I chose a framework like Django for most of it; the FOR UPDATE SQL is one of fewer than 10 places where I use raw SQL in my system, because Django could not provide the functionality), and I just did not realize that it could help me.

Regarding PgQ, based on a quick skimming I am not sure how it would fit in my case. This may be because the tutorial leaves (IMO) a bit too much up in the air about how the surrounding system is organized, at least for a relative beginner like myself, and it does not show what a similar alternative system would look like. A small, complete example showing all the tables involved, the client(s), the server(s), and the operations performed might have helped.

-- 
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************
Re: Select for update with offset interferes with concurrent transactions
From: "David Johnston"
If random sampling is desirable would the following construct limit locking only to the sampled rows? SELECT id FROM tasktable WHERE id IN (SELECT random_id_sample()) FOR UPDATE The "random_id_sample" would supply a configurable group of IDs off of tasktable which the FOR UPDATE would then lock I guess the issue remains that "random_id_sample()" would still end up blocking if any of the rows it wants to return are already locked. I too am using this basic protocol of maintaining state info within the database and sending every query against it. As I ponder this more it really seems as if moving some of this logic into the application layer would possibly make more sense in Yngve's situation (or at least something to consider). Continue to use the database as a persistence mechanism but code the "dispatching" of tasks in the application layer and then as each task is dispatched you simply do an "UPDATE table SET state = 'dispatch' WHERE id = 'ID'" and a similar UPDATE when the task is returned completed. This somewhat presumes you still only ever hand off one task at a time. If you are indeed handing off tasks in batches then it would make sense to have a "batch" table and operate at the batch level instead of individual tasks - assigning tasks to a given batch via some standard mechanism. Either way if you truly want true parallel processing then you need to create the parallel paths that can operate without clobbering each other and thus each path needs to have its own pool of tasks since as soon as you have a shared resource the only true way to make sure it is only allocated once is to serialize access to it. An alternative method would be to allow multiple dispatches but have a "write-once" method that is called and sets an immutable handler_id and then when the processing begins only the handler with the matching id would be able allow to perform the actual processing. I say the above with certainty but at the moment I am using and fairly happy with my limited serialization - especially since I have specific sub-properties that I can use to limit how many records are locked AND also because the locking time is very short (I cap around 20 or so active tasks to dispatch - and only infrequently at that) so my experience and insight to high-demand situations is limited. Dave -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Tuesday, February 01, 2011 12:18 PM To: Yngve Nysaeter Pettersen Cc: pgsql-general@postgresql.org Subject: Re: Select for update with offset interferes with concurrent transactions "Yngve Nysaeter Pettersen" <yngve@opera.com> writes: > To avoid having the processes trample each other's queries (the first > attempt was to select the first matching entries of the table, which > caused one to block all other transactions), one of the steps I took > was to select a set of idle rows at a random offset into the table > from the project, mark them for update, then update each record's state as started. > SELECT record_id FROM queue WHERE project_id = my_project AND state > = idle LIMIT n OFFSET i FOR UPDATE > At present "n" is 100-150, "i" is a random value in the range 0-10000. > There is, intentionally, no ordering specified, since that would just > slow down the query, and is not necessary. This seems like a pretty bad design. There are recognized ways to solve this problem with more predictability and much less chance of different processes blocking each other. 
In particular, this query seems be based on some untenable assumptions about the physical row order being stable. > What I've discovered when using Postgres 9.0 is that the processes are > now blocking every other query into this table, In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which means that rows skipped over by OFFSET still get locked, which means that different sessions executing this query are now practically certain to block each other, rather than just likely to block each other. This was an intentional change to improve the predictability of FOR UPDATE's interactions with LIMIT/OFFSET, and indeed it's improved the predictability of the behavior for you, just not in the direction you'd like :-( regards, tom lane
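random_id_sample() is left undefined in the post above; one hypothetical shape, adapted to the queue table and column names from the start of the thread with an assumed batch size of 100, could be:

  CREATE OR REPLACE FUNCTION random_id_sample() RETURNS SETOF integer AS $$
      -- hypothetical: pick up to 100 idle tasks in random order
      SELECT record_id
        FROM queue
       WHERE state = 'idle'
       ORDER BY random()
       LIMIT 100;
  $$ LANGUAGE sql;

  SELECT record_id
    FROM queue
   WHERE record_id IN (SELECT random_id_sample())
     FOR UPDATE;

Note that ORDER BY random() sorts every idle row, which may or may not be acceptable at 2 million rows per project, and the caveat above still applies: the outer FOR UPDATE blocks if any sampled row is already locked.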
Re: Select for update with offset interferes with concurrent transactions
From: "Yngve N. Pettersen (Developer Opera Software ASA)"
Hello David, On Wed, 02 Feb 2011 01:36:15 +0100, David Johnston <polobo@yahoo.com> wrote: > If random sampling is desirable would the following construct limit > locking > only to the sampled rows? > > SELECT id > FROM tasktable > WHERE id IN (SELECT random_id_sample()) > FOR UPDATE > > The "random_id_sample" would supply a configurable group of IDs off of > tasktable which the FOR UPDATE would then lock > > I guess the issue remains that "random_id_sample()" would still end up > blocking if any of the rows it wants to return are already locked. My immediate guess is that this would work, and I might explore it once I get my new fullscale test-db up and running > I too am using this basic protocol of maintaining state info within the > database and sending every query against it. As I ponder this more it > really seems as if moving some of this logic into the application layer > would possibly make more sense in Yngve's situation (or at least > something > to consider). Continue to use the database as a persistence mechanism > but > code the "dispatching" of tasks in the application layer and then as each > task is dispatched you simply do an "UPDATE table SET state = 'dispatch' > WHERE id = 'ID'" and a similar UPDATE when the task is returned > completed. > This somewhat presumes you still only ever hand off one task at a time. > If > you are indeed handing off tasks in batches then it would make sense to > have > a "batch" table and operate at the batch level instead of individual > tasks - > assigning tasks to a given batch via some standard mechanism. If I read you correctly that is what my system does (dispatch = started, marked by the node that is to do the task). The reason I am allocating tasks in batches is that there are so many processes involved that if they pick one at a time they would block each other. With the block allocation they only need to fetch the tasks once, meaning that there are not as many requests to the queue at a time, on average. > Either way if you truly want true parallel processing then you need to > create the parallel paths that can operate without clobbering each other > and > thus each path needs to have its own pool of tasks since as soon as you > have That is what the offset part of the query was supposed to achieve. At the moment I have worked around the problem by breaking the task list into 2000 subgroups, and each process picks one at random. That limits the number of processes that get in each others way, and the measured speed is now 4-5 times what I saw on Monday, and back in the old range of performance. However, it is a hack I had hoped to avoid (and I might get rid of it with the above suggestion) > a shared resource the only true way to make sure it is only allocated > once > is to serialize access to it. An alternative method would be to allow > multiple dispatches but have a "write-once" method that is called and > sets > an immutable handler_id and then when the processing begins only the > handler > with the matching id would be able allow to perform the actual > processing. This requires the handlers to have a unique ID, which my system has not needed so far. 
> I say the above with certainty but at the moment I am using and fairly > happy > with my limited serialization - especially since I have specific > sub-properties that I can use to limit how many records are locked AND > also > because the locking time is very short (I cap around 20 or so active > tasks > to dispatch - and only infrequently at that) so my experience and > insight to > high-demand situations is limited. > > Dave > > > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Tuesday, February 01, 2011 12:18 PM > To: Yngve Nysaeter Pettersen > Cc: pgsql-general@postgresql.org > Subject: Re: Select for update with offset interferes with concurrent > transactions > > "Yngve Nysaeter Pettersen" <yngve@opera.com> writes: >> To avoid having the processes trample each other's queries (the first >> attempt was to select the first matching entries of the table, which >> caused one to block all other transactions), one of the steps I took >> was to select a set of idle rows at a random offset into the table >> from the project, mark them for update, then update each record's state >> as > started. > >> SELECT record_id FROM queue WHERE project_id = my_project AND state >> = idle LIMIT n OFFSET i FOR UPDATE > >> At present "n" is 100-150, "i" is a random value in the range 0-10000. > >> There is, intentionally, no ordering specified, since that would just >> slow down the query, and is not necessary. > > This seems like a pretty bad design. There are recognized ways to solve > this problem with more predictability and much less chance of different > processes blocking each other. In particular, this query seems be based > on > some untenable assumptions about the physical row order being stable. > >> What I've discovered when using Postgres 9.0 is that the processes are >> now blocking every other query into this table, > > In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which > means that rows skipped over by OFFSET still get locked, which means that > different sessions executing this query are now practically certain to > block > each other, rather than just likely to block each other. > This was an intentional change to improve the predictability of FOR > UPDATE's > interactions with LIMIT/OFFSET, and indeed it's improved the > predictability > of the behavior for you, just not in the direction you'd like :-( > > regards, tom lane > > -- Sincerely, Yngve N. Pettersen ******************************************************************** Senior Developer Email: yngve@opera.com Opera Software ASA http://www.opera.com/ Phone: +47 23 69 32 60 Fax: +47 23 69 24 01 ********************************************************************
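The 2000-subgroup workaround mentioned above is not shown in the thread; a sketch of how such a scheme might look, with the column name, the bucket count and the way buckets are assigned all being assumptions:

  ALTER TABLE queue ADD COLUMN subgroup integer;
  UPDATE queue SET subgroup = floor(random() * 2000)::int;  -- spread tasks over 2000 buckets
  CREATE INDEX queue_subgroup_idx ON queue (project_id, state, subgroup);

  -- each worker picks a random bucket, so two workers rarely touch the same rows
  SELECT record_id
    FROM queue
   WHERE project_id = 42 AND state = 'idle' AND subgroup = 1234
   LIMIT 100
     FOR UPDATE;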
Re: Select for update with offset interferes with concurrent transactions
From: "Yngve N. Pettersen (Developer Opera Software ASA)"
Hello all, Just a quick update of how it went. I ended up using code similar to a combination of Andy Colson's and David Johnston's suggestions below, and performance is back at what is was before. Thanks for the suggestions BTW: AFAICT I never got a response from Tom Lane about whether it was the intention with the new FOR UPDATE locking policy to effectively lock the entire table for all other clients using the exact same Select but with a different and non-overlapping offset/limit for update query. IMO continuing to lock unselected rows after the selection have completed is a significant performance regression. Also, an off-topic BTW: I have noticed that autovacuum of a table seems to block ANALYZE of the same table, because the autovacuum do not release its lock on the table. On Tue, 01 Feb 2011 16:19:17 +0100, Andy Colson <andy@squeakycode.net> wrote: > On 2/1/2011 6:32 AM, Yngve Nysaeter Pettersen wrote: >> Hello all, >> >> I am in the process of migrating a system from Postgresql 8.3 to 9.0, >> and have run into a problem with the task queue systems I am using. >> >> The task queue controls the allocation of tasks between about 1000 >> processes working in parallel, and is essentially a table of > <snip> > > So, if I understand correctly, you: > > q = SELECT record_id FROM queue > WHERE project_id = my_project AND state = idle > LIMIT n OFFSET i FOR UPDATE > while not q.eof > update queue set state = started where record_id = x; > process record_id > update queue set state = finsihed where record_id = x; > q.next; > > > Might I suggest and alternative: > > q = update queue set state = started > WHERE project_id = my_project AND state = idle > LIMIT n OFFSET i > RETURNING project_id; > idlist = @q; > commit; > > foreach x in idlist > process record_id > begin > update queue set state = finsihed where record_id = x; > commit; > On Wed, 02 Feb 2011 01:36:15 +0100, David Johnston <polobo@yahoo.com> wrote: > If random sampling is desirable would the following construct limit > locking > only to the sampled rows? > > SELECT id > FROM tasktable > WHERE id IN (SELECT random_id_sample()) > FOR UPDATE > -- Sincerely, Yngve N. Pettersen ******************************************************************** Senior Developer Email: yngve@opera.com Opera Software ASA http://www.opera.com/ Phone: +47 23 69 32 60 Fax: +47 23 69 24 01 ********************************************************************
On 3/14/2011 10:13 AM, Yngve N. Pettersen (Developer Opera Software ASA) wrote: > Hello all, > > Just a quick update of how it went. > > I ended up using code similar to a combination of Andy Colson's and David > Johnston's suggestions below, and performance is back at what is was > before. Thanks for the suggestions > Sweet, thanks for the update... I always like to hear how things turn out. -Andy
On Tue, Feb 1, 2011 at 11:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Yngve Nysaeter Pettersen" <yngve@opera.com> writes: >> To avoid having the processes trample each other's queries (the first >> attempt was to select the first matching entries of the table, which >> caused one to block all other transactions), one of the steps I took was >> to select a set of idle rows at a random offset into the table from the >> project, mark them for update, then update each record's state as started. > >> SELECT record_id FROM queue WHERE project_id = my_project AND state = >> idle LIMIT n OFFSET i FOR UPDATE > >> At present "n" is 100-150, "i" is a random value in the range 0-10000. > >> There is, intentionally, no ordering specified, since that would just slow >> down the query, and is not necessary. > > This seems like a pretty bad design. There are recognized ways to solve > this problem with more predictability and much less chance of different > processes blocking each other. In particular, this query seems be based > on some untenable assumptions about the physical row order being stable. > >> What I've discovered when using Postgres 9.0 is that the processes are now >> blocking every other query into this table, > > In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which > means that rows skipped over by OFFSET still get locked, which means > that different sessions executing this query are now practically certain > to block each other, rather than just likely to block each other. > This was an intentional change to improve the predictability of FOR > UPDATE's interactions with LIMIT/OFFSET, and indeed it's improved the > predictability of the behavior for you, just not in the direction you'd > like :-( You can get something approximating the old behavior with a CTE: with q as (select id from foo where <something> limit x offset y) select * from foo join q using(id) order by id for update; not that this is a good idea -- it isn't -- but if you must do it that way, the above might work. CTE are always a something to consider when dealing with order of execution problems which seem to be burning just about everyone these days. merlin
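Adapted to the queue table from the start of the thread (with placeholder values), Merlin's pattern would look something like the following; the FOR UPDATE OF clause is added so that only the queue rows, not the CTE, are locked, and the same caveat applies that this is a workaround rather than a recommended design:

  WITH q AS (
      SELECT record_id
        FROM queue
       WHERE project_id = 42 AND state = 'idle'
       LIMIT 100 OFFSET 5000)
  SELECT queue.*
    FROM queue
    JOIN q USING (record_id)
     FOR UPDATE OF queue;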