On 05/08/2014 04:09 AM, Vincent de Phily wrote:
> Hello,
>
> I'm processing records in batches using PG 9.1.12, python 2.7, and psycopg
> 2.5.2 :
Comments in the code below:
>
> def enqueue_loop(q):
> curs = DB_HANDLER.cursor()
> query = """UPDATE foo SET processing = 't' WHERE id IN
> (SELECT id FROM foo WHERE processing = 'f' ORDER BY id ASC LIMIT
> 5000 FOR UPDATE) RETURNING *"""
# Where is this query actually run?
> while DO_LOOP: #the whole program eventually stops if this is false
# What cause DO_LOOP to go false?
> results = curs.fetchall()
> rlen = len(results)
> if rlen > 0:
> LOG.debug("Fetched %d rows", rlen)
# What do you see in LOG for rlen values?
> if rlen == 0:
> # [...] wait for notification...
> continue
> # [...] Enqueue batch and let other threads process it.
> # [...] Those threads will eventually delete the processed rows from
> # the foo table.
>
>
> The problem is that sometimes (once every few days at about 2-300K queries per
> day) I get many more rows than the max 5000 I asked for (I've seen up to 25k).
> And I'm getting timeouts and other problems as a result.
>
> The id column is your typical primary key integer with a unique index. I've
> checked the problematic cases and there are no id gaps or duplicate rows.
> There are multiple threads in the program, but only the main thread is running
> enqueue_loop(). I'm not sure if this is a server or a driver issue.
>
>
> Any idea ? Thanks.
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com