Thread: queueing via database table?

queueing via database table?

From
Mark Harrison
Date:
I have a cluster of CPUs generating thumbnails for
a render farm.  I would like to place thumbnail
requests on a queue, and have the cluster of client
dequeue the requests and process them.

Of course, each request should be only dequeued once...
if a thumbnail is being processed by one CPU, it
shouldn't be processed by another CPU.

Does the following sound like a reasonable approach?
If not, what's a good way to go?

The processes generating the requests will insert into
a queue table.  They may add a priority and timestamp.

The several processes servicing the requests will do a
SELECT FOR UPDATE where ... limit 1, generate thumbnail,
delete the record and commit.

Comments and suggestions welcome,
Mark

--
Mark Harrison
Pixar Animation Studios

Re: queueing via database table?

From
Steve Atkins
Date:
On Jan 2, 2007, at 10:34 PM, Mark Harrison wrote:

> I have a cluster of CPUs generating thumbnails for
> a render farm.  I would like to place thumbnail
> requests on a queue, and have the cluster of client
> dequeue the requests and process them.
>
> Of course, each request should be only dequeued once...
> if a thumbnail is being processed by one CPU, it
> shouldn't be processed by another CPU.
>
> Does the following sound like a reasonable approach?
> If not, what's a good way to go?
>
> The processes generating the requests will insert into
> a queue table.  They may add a priority and timestamp.
>
> The several processes servicing the requests will do a
> SELECT FOR UPDATE where ... limit 1, generate thumbnail,
> delete the record and commit.
>
> Comments and suggestions welcome,

Holding a lock while generating the thumbnail doesn't
sound like a great idea, and I think that the select
for update will end up serialising the requests.

I'd add a "rendering" field, text, defaulting
to an empty string.

Then do a "select for update where ... and rendering = '' limit 1",
update the rendering field to the hostname of the box doing the
work and commit. Render the thumbnail. Delete the record.

That'll also give you an easy way to show status of which
box is rendering which scene.

Depending on what else you're putting into the where clause
a partial index on something for records where rendering=''
might be helpful.

Cheers,
   Steve


Re: queueing via database table?

From
Richard Huxton
Date:
Steve Atkins wrote:
>
> Holding a lock while generating the thumbnail doesn't
> sound like a great idea, and I think that the select
> for update will end up serialising the requests.
>
> I'd add a "rendering" field, text, defaulting
> to an empty string.
>
> Then do a "select for update where ... and rendering = '' limit 1",
> update the rendering field to the hostname of the box doing the
> work and commit. Render the thumbnail. Delete the record.

Assuming each processing host keeps its connection open, I'd store the
process-id instead (get via function pg_backend_pid()). Also have a
separate status (pending|processing|done) and timestamps to track when
each status is set. Only delete rows that have been marked "done" for a
certain length of time.

This will let you spot when a host has stopped processing (e.g. crashed)
and also let you measure throughput on particular hosts.

The other thing to be aware of is that queries of the SELECT FOR UPDATE
LIMIT 1 form can return 0 rows. Open two psql connections and try it to
see what happens. You'll need to have the processing clients retry the
query in this case.

HTH
--
   Richard Huxton
   Archonet Ltd

Re: queueing via database table?

From
"Dawid Kuroczko"
Date:
On 1/3/07, Mark Harrison <mh@pixar.com> wrote:
> Does the following sound like a reasonable approach?
> If not, what's a good way to go?
>
> The processes generating the requests will insert into
> a queue table.  They may add a priority and timestamp.
>
> The several processes servicing the requests will do a
> SELECT FOR UPDATE where ... limit 1, generate thumbnail,
> delete the record and commit.

Well, this will block.  So it will mean that only one thumbnail
will be processed while running the transaction.

You may want to rather use SELECT FOR UPDATE NOWAIT,
probably "wrapped" into a PL/PgSQL function.  I did that and
I'm quite satisfied with this approach.

A simple implementation would be something like this:

CREATE OR REPLACE FUNCTION get_next() RETURNS int AS $$
  DECLARE
   r RECORD;
  BEGIN
    FOR r IN SELECT id FROM foo_table LIMIT 100 LOOP
       BEGIN
         PERFORM id FROM foo_table WHERE id=r.id FOR UPDATE NOWAIT;
         RETURN r.id;
       EXCEPTION
         WHEN lock_not_available THEN -- do nothing
       END;
    END LOOP;
    RETURN NULL;
  END;
$$ LANGUAGE PLpgSQL;

Of course you should customize the query, and use better tuned limit.
I think good rule of the thumb size of LIMIT is twice the number of
simultaneous processing nodes working.  An ORDER BY might be
worh it or not, etc, etc.

Other approach might be using something like
  LOOP
    BEGIN
      SELECT id INTO i FROM foo_table LIMIT 1 OFFSET n FOR UPDATE NOWAIT;
      RETURN i;
    EXCEPTION
      WHEN lock_not_avaibale THEN -- do nothing;
    END;
    n := n + 1;
  END LOOP;

But I feel it will be slower most of the time.

Re: queueing via database table?

From
"Gregory S. Williamson"
Date:
Mark --

As others have indicated, there may be some blocking issues with the approach you outlined.

A variant I have seen used in the past uses a table with a unique id for the job, the work queue it is in, a status
flag,priority and at least one time stamp (and perhaps space for a process id). 

Each client that wants work issues a request (SELECT FOR UPDATE) to get the next job in its queue that has a status
flagof "Available" ordered by priority or initial time of creation, etc.; update that entry with the current timestamp
(andperhaps the process id of the client) and set the status flag to show the job is now being worked on all in one
transaction.

This releases the job but now with a changed status flag so other processes pulling work from the same queue won't see
itanymore. 

When the job finishes it selects its entry and updates the status flag and timestamp (and probably clears its process
id).Logic for how to bump a job to the next step can be embedded in the client or in another process, depending on your
needs.

It is useful to have a daemon or some other process to sweep the queue table and at least send an alert about stale or
frozenjobs. 

HTH,

Greg Williamson
DBA
GlobeXplorer LLC
-----Original Message-----
From:    pgsql-general-owner@postgresql.org on behalf of Mark Harrison
Sent:    Tue 1/2/2007 10:34 PM
To:    pgsql-general@postgresql.org
Cc:
Subject:    [GENERAL] queueing via database table?

I have a cluster of CPUs generating thumbnails for
a render farm.  I would like to place thumbnail
requests on a queue, and have the cluster of client
dequeue the requests and process them.

Of course, each request should be only dequeued once...
if a thumbnail is being processed by one CPU, it
shouldn't be processed by another CPU.

Does the following sound like a reasonable approach?
If not, what's a good way to go?

The processes generating the requests will insert into
a queue table.  They may add a priority and timestamp.

The several processes servicing the requests will do a
SELECT FOR UPDATE where ... limit 1, generate thumbnail,
delete the record and commit.

Comments and suggestions welcome,
Mark

--
Mark Harrison
Pixar Animation Studios

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@postgresql.org so that your
       message can get through to the mailing list cleanly


-------------------------------------------------------
Click link below if it is SPAM gsw@globexplorer.com

"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=459b5025191744846743324&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1"
!DSPAM:459b5025191744846743324!
-------------------------------------------------------






Re: queueing via database table?

From
Ron Johnson
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/03/07 00:34, Mark Harrison wrote:
> I have a cluster of CPUs generating thumbnails for
> a render farm.  I would like to place thumbnail
> requests on a queue, and have the cluster of client
> dequeue the requests and process them.
>
> Of course, each request should be only dequeued once...
> if a thumbnail is being processed by one CPU, it
> shouldn't be processed by another CPU.
>
> Does the following sound like a reasonable approach?
> If not, what's a good way to go?
>
> The processes generating the requests will insert into
> a queue table.  They may add a priority and timestamp.
>
> The several processes servicing the requests will do a
> SELECT FOR UPDATE where ... limit 1, generate thumbnail,
> delete the record and commit.
>
> Comments and suggestions welcome,

That's not what relational tables are good at.  Instead, use a
message queuing library with a file backing-store (so that if the
machine goes down for any reason, the messages are still in the queue).

In a message-passing system, a network-aware daemon manages a set of
named FIFO queues.  Some processes call in_q(), and other processes
(same machine, or not) call de_q().  If nothing is calling de_q(),
messages just pile up in the queue until such time as something
*does* start calling de_q().


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFm504S9HxQb37XmcRAmj6AKDWa7Sx15wygoTc+/wOfLZIpqi4awCg34SZ
rkq1IEjdqu1zx0B5QyFW/n0=
=V0oF
-----END PGP SIGNATURE-----

Re: queueing via database table?

From
Scott Marlowe
Date:
On Tue, 2007-01-02 at 22:34 -0800, Mark Harrison wrote:
> I have a cluster of CPUs generating thumbnails for
> a render farm.  I would like to place thumbnail
> requests on a queue, and have the cluster of client
> dequeue the requests and process them.
>
> Of course, each request should be only dequeued once...
> if a thumbnail is being processed by one CPU, it
> shouldn't be processed by another CPU.
>
> Does the following sound like a reasonable approach?
> If not, what's a good way to go?
>
> The processes generating the requests will insert into
> a queue table.  They may add a priority and timestamp.
>
> The several processes servicing the requests will do a
> SELECT FOR UPDATE where ... limit 1, generate thumbnail,
> delete the record and commit.

Here's what I'd do.  Create two sequences.  Sequence one is used to
assign ids to the thumbnail records when they're placed into the control
table.  The other is used to "check out" the records.

A process selects nextval from sequence two, and then selects the
corresponding record info from the control table, and marks the record
as being in work.  When it's done, it marks it as done, and selects
another value from the second sequence and repeats the process.

Every now and then run a check program to look for thumbs that have been
missed or skipped and process them or assign them a new id from sequence
one to put them back into the queue.

Re: queueing via database table?

From
Vivek Khera
Date:
On Jan 3, 2007, at 2:00 AM, Steve Atkins wrote:

> Holding a lock while generating the thumbnail doesn't
> sound like a great idea, and I think that the select
> for update will end up serialising the requests.
>
> I'd add a "rendering" field, text, defaulting
> to an empty string.
>
> Then do a "select for update where ... and rendering = '' limit 1",
> update the rendering field to the hostname of the box doing the
> work and commit. Render the thumbnail. Delete the record.
>
> That'll also give you an easy way to show status of which
> box is rendering which scene.
>
> Depending on what else you're putting into the where clause
> a partial index on something for records where rendering=''
> might be helpful.

this is more or less how we do it, so i second this.

we also use NOTIFY/LISTEN to "wake up" the job processors when new
work is added.


Attachment