Re: Select for update with offset interferes with concurrent transactions - Mailing list pgsql-general

From Andy Colson
Subject Re: Select for update with offset interferes with concurrent transactions
Date
Msg-id 4D482475.6050408@squeakycode.net
In response to Select for update with offset interferes with concurrent transactions  ("Yngve Nysaeter Pettersen" <yngve@opera.com>)
Responses Re: Select for update with offset interferes with concurrent transactions  ("Yngve Nysaeter Pettersen" <yngve@opera.com>)
Re: Select for update with offset interferes with concurrent transactions  ("Yngve N. Pettersen (Developer Opera Software ASA)" <yngve@opera.com>)
List pgsql-general
On 2/1/2011 6:32 AM, Yngve Nysaeter Pettersen wrote:
> Hello all,
>
> I am in the process of migrating a system from Postgresql 8.3 to 9.0,
> and have run into a problem with the task queue systems I am using.
>
> The task queue controls the allocation of tasks between about 1000
> processes working in parallel, and is essentially a table of
>
> record_id (unique)
> project_id
> task_description_id
> state (idle, started, finished)
>
> Each project currently has about 2 million entries. My plan is to
> increase that significantly over the next few months.
>
> To avoid having the processes trample each other's queries (the first
> attempt was to select the first matching entries of the table, which
> caused one to block all other transactions), one of the steps I took was
> to select a set of idle rows at a random offset into the table from the
> project, mark them for update, and then set each record's state to started.
>
> SELECT record_id FROM queue WHERE project_id = my_project AND state =
> idle LIMIT n OFFSET i FOR UPDATE
>
> At present "n" is 100-150, "i" is a random value in the range 0-10000.
>
> There is, intentionally, no ordering specified, since that would just
> slow down the query, and is not necessary.
>
> For reference, the above query is sent through Django's cursor.execute()
> call in a manual transaction block.
>
>
>
> What I've discovered when using Postgres 9.0 is that the processes are
> now blocking every other query into this table, apparently reducing the
> task processing speed by at least a factor of 10, and increasing the
> load on the server by a similar factor, compared to when Postgres 8.3
> was used. The problem is apparent just after starting, with only 50-100
> processes active (startup is staggered).
>
> Reducing "n" (and looping), or increasing the "i" range did not work.
>
>
> The reason seems to be this new part of
> http://www.postgresql.org/docs/9.0/static/sql-select.html (towards the
> end of the FOR UPDATE section):
>
> If a LIMIT is used, locking stops once enough rows have been returned to
> satisfy the limit (but note that rows skipped over by OFFSET will get
> locked). Similarly, if FOR UPDATE or FOR SHARE is used in a cursor's
> query, only rows actually fetched or stepped past by the cursor will be
> locked.
>
> I can't find similar text in the 8.3 or 8.4 documentation.
>
> AFAICT, and assuming I have not misunderstood this part of the
> documentation, this means that if one of my processing nodes selects a
> block of 100 entries at offset 8000 in the resulting table, then every
> other node will be blocked while the block is being processed, not just
> the nodes that would have selected the rows in the range 0 to 7999, but
> also those at >= 8100, because they cannot gain access to the rows.
>
> Also, using FOR SHARE does not seem to solve the problem.
>
> IMO (speaking as a database non-expert), locking rows that were not
> returned as a result of the query is a bug. As an example, if a query
> selects the last X items of the matching rows, that is equivalent to
> locking the table, or the relevant part of it, even if the requester has
> no intention of modifying those other rows.
>
>
> Is there any way to avoid this problem? Or do I have to add a random
> batch_id field to the queue table in order to separate the processes'
> queries so that they do not block each other (as frequently)?
>
> Is it possible to disable the behavior causing this (that is, by
> reverting the patch that introduced the problem, or by changing a
> configuration switch)?
>
>

So, if I understand correctly, you do something like this:

q = SELECT record_id FROM queue
    WHERE project_id = my_project AND state = idle
    LIMIT n OFFSET i FOR UPDATE
while not q.eof
    x = q.record_id
    update queue set state = started where record_id = x;
    process x
    update queue set state = finished where record_id = x;
    q.next;
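
For what it's worth, here's that same pattern as (rough) Python.  I'm
assuming plain psycopg2 here rather than Django's cursor, and the
connection string, the 'idle'/'started'/'finished' literals and the
function names are just placeholders:

import random
import psycopg2

conn = psycopg2.connect("dbname=queue_db")   # placeholder connection string

def process(record_id):
    pass   # stand-in for the real task work

def run_batch(project_id, n=100):
    cur = conn.cursor()
    # Grab a random slice of idle rows and lock them.  On 9.0 the rows
    # skipped over by OFFSET get locked as well, and every lock is held
    # until the commit at the end of this function.
    cur.execute(
        "SELECT record_id FROM queue"
        " WHERE project_id = %s AND state = 'idle'"
        " LIMIT %s OFFSET %s FOR UPDATE",
        (project_id, n, random.randint(0, 10000)))
    for (record_id,) in cur.fetchall():
        cur.execute("UPDATE queue SET state = 'started'"
                    " WHERE record_id = %s", (record_id,))
        process(record_id)
        cur.execute("UPDATE queue SET state = 'finished'"
                    " WHERE record_id = %s", (record_id,))
    conn.commit()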


Might I suggest an alternative:

q = update queue set state = started
    WHERE project_id = my_project AND state = idle
    LIMIT n OFFSET i
    RETURNING record_id;
idlist = @q;
commit;

foreach x in idlist
    process x
    begin
    update queue set state = finished where record_id = x;
    commit;



Forgive the part-Perl, part-Python pseudocode.  Oh, and I've never done
this, so I have no idea if it actually works. :-)
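
To make that a bit more concrete (still untested, and reusing conn,
random and process() from the sketch above): UPDATE in PostgreSQL has no
LIMIT/OFFSET clause, so the slice has to come from a subselect, with
RETURNING handing back the claimed ids:

def claim_batch(project_id, n=100):
    cur = conn.cursor()
    # Flip a random slice of idle rows to 'started' in one short
    # transaction and get their ids back; committing right away releases
    # the row locks before any real work starts.
    cur.execute(
        "UPDATE queue SET state = 'started'"
        " WHERE record_id IN ("
        "     SELECT record_id FROM queue"
        "     WHERE project_id = %s AND state = 'idle'"
        "     LIMIT %s OFFSET %s)"
        " RETURNING record_id",
        (project_id, n, random.randint(0, 10000)))
    ids = [row[0] for row in cur.fetchall()]
    conn.commit()
    return ids

def finish(record_id):
    cur = conn.cursor()
    cur.execute("UPDATE queue SET state = 'finished'"
                " WHERE record_id = %s", (record_id,))
    conn.commit()

# One worker's loop would then be roughly:
#     for record_id in claim_batch(my_project_id):
#         process(record_id)
#         finish(record_id)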

-Andy
