From Tom Lane
Subject Re: Serialization, Locking...implement processing Queue with a table
Msg-id 8777.1052749406@sss.pgh.pa.us
In response to Serialization, Locking...implement processing Queue with a table  ("D. Dante Lorenso" <dante@lorenso.com>)
List pgsql-general
"D. Dante Lorenso" <dante@lorenso.com> writes:
> How should I go about implementing a synchronized process id
> queue that will select one unique row from a table at a time
> and make the selection be safe with concurrent accesses?

You can find various discussions of this in the archives, but a
reasonable way to proceed is:

1. The table of pending or in-process jobs has a column "processor_id"
that is zero for pending jobs and equal to the (unique) processor number
for active jobs.  (Assume for the moment that completed jobs are removed
from the table entirely.)
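
For concreteness, a minimal job_table along these lines might be declared
as follows (the payload column and its name are illustrative; only job_id
and processor_id matter to the scheme):

    CREATE TABLE job_table (
        job_id        serial PRIMARY KEY,
        processor_id  integer NOT NULL DEFAULT 0,  -- 0 means pending
        payload       text                         -- whatever describes the job
    );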

2. When idle, you try to reserve a job like so:

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT job_id, ... FROM job_table
        WHERE processor_id = 0 LIMIT 1 FOR UPDATE;

The SELECT has three possible outcomes:

2a: One row is returned.  You do

    UPDATE job_table SET processor_id = $me
        WHERE job_id = $jobid;
    COMMIT;

and then go about executing the job.  When done, delete the row from
job_table and try to get another one.

2b: No row is returned: no jobs are pending.  Commit your transaction,
sleep for an appropriate delay period, and try again.

2c: You get a "can't serialize" failure.  This will happen if two
processors try to reserve the same row at the same time.  In this case,
roll back your transaction, sleep for a short interval (maybe a few
msec) and try again.  You don't want to sleep as long as normal in this
case, since there might be another available job.

(Note that you *cannot* do this in a plpgsql function, since it cannot
start or commit a transaction; these commands have got to be directly
issued by the application.)
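
Putting the pieces together, one full reservation cycle for the successful
case (2a) issued directly by the application would look like this, with
$me and $jobid substituted by the client before sending each command:

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT job_id, ... FROM job_table
        WHERE processor_id = 0 LIMIT 1 FOR UPDATE;
    -- a row came back: claim it
    UPDATE job_table SET processor_id = $me
        WHERE job_id = $jobid;
    COMMIT;
    -- ... execute the job ...
    DELETE FROM job_table WHERE job_id = $jobid;

On a "can't serialize" failure the application instead issues ROLLBACK,
sleeps briefly, and starts the cycle over.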


Assuming that there aren't a vast number of pending jobs at any time,
this should work pretty well without even bothering with an index on
job_table.  You will want to vacuum it often though (at least every few
hundred job completions, I'd think).
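
The routine maintenance is just:

    VACUUM job_table;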

Now, what if you wanted to remember completed jobs?  I'd actually
recommend transferring the records of completed jobs to a different
table.  But if you really want to keep them in the same table, maybe
add a boolean "completed" field, and make the initial SELECT be

    SELECT job_id, ... FROM job_table
        WHERE processor_id = 0 AND NOT completed LIMIT 1 FOR UPDATE;

Now you *will* need an index to keep things speedy.  I'd try a partial
index on processor_id with condition "NOT completed".  You'll still
need frequent vacuums.
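
That partial index could be declared like so (the index name is arbitrary;
this assumes the boolean column is called "completed" as above):

    CREATE INDEX job_pending_idx ON job_table (processor_id)
        WHERE NOT completed;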

            regards, tom lane