Re: PostgreSQL question re select for update (locking??) - Mailing list pgsql-general

From Richard Huxton
Subject Re: PostgreSQL question re select for update (locking??)
Date
Msg-id 467A9668.20701@archonet.com
Whole thread Raw
In response to PostgreSQL question re select for update (locking??)  (Reid Thompson <Reid.Thompson@ateb.com>)
List pgsql-general
Reid Thompson wrote:
>
> Each server process claims a jobq record by selecting for update a
> jobq record where the pid column is null, then rewrites the record with
> the pid set in the pid column.
>
> The "distilled" sql select statement is:
>
>     * SELECT J.*, C.name, C.client_id, C.priority
>     * FROM jobq J, campaign C
>     * WHERE J.pid IS NULL
>     * AND 'my_YYYYMMDDhhmmss'>=J.due_date
>     * AND J.campaign_id=C.id
>     * ORDER BY C.priority,J.due_date,J.item_id
>     * LIMIT 1 FOR UPDATE;
>
>
> What we are seeing is:
>
> - We dump hundreds of records into the jobq table, expecting that a
>   server process will each claim a record (and process it), but some of
>   the processes call the claim function but get nothing returned by the
>   select; they sleep for a short time (eg 10 seconds - this sleep
>   differs on a per process basis) then attempt to claim again.
>   Sometimes this claim returns a jobq record, other times it may take 4
>   or 5 attempted claims before a record is actually returned by the
>   select.
>
>   Our expectation was that with hundreds of selectable records available,
>   that each server process would 'get' the next available record --
>   Is this a false expectation on our part?
>   Is the currently 'active' select for update limit 1 blocking the others?
>   If not, can someone give us some direction..??

The problem is that the order of events isn't quite what you expect.

You can think of the SELECT ... LIMIT 1 as identifying a set containing
one row. Since it's FOR UPDATE, the backend tries to get a lock, and if
necessary waits until it can. Of course, all the other processors are
doing the same thing.

When another process updates that row, the backend re-checks the
conditions and you end up with a set of zero rows, which it helpfully
gains a lock on and then returns to you!

So - what to do? Well, there are several options.

1. Keep the lock/update period short by just marking the row as
"claimed" and committing. You still need to loop trying to get a job to
process, but the process is smoother because the possible window for
conflicts is short.

2. Select "random" jobs from the list if you don't care about order.
Again, you're just making things smoother, not removing the requirement
to loop checking.

3. Have a "master" process that allocates jobs to processors. Or
simulate such a thing by having processors call an update_jobs()
function. That way you can pre-allocate jobs to processors (at perhaps a
cost in efficiency if the number of jobs in-queue falls below the number
of processors).

--
   Richard Huxton
   Archonet Ltd

pgsql-general by date:

Previous
From: Csaba Nagy
Date:
Subject: Re: Excell
Next
From: Robert Treat
Date:
Subject: Re: postgresql and solaris 10: pitch to sysadmins