Thread: Workqueue performance

Workqueue performance

From
Jason Armstrong
Date:
I have a performance question about a database workqueue I am working on.

I have two tables, the first containing information about files, and
the second is a workqueue table with 'jobs', intended for a worker
process to transfer the files to another machine:

> create table log.file (id uuid, created timestamp default NOW() not null, filetype_id smallint, process_id smallint,
>   filepath text, UNIQUE (id, filetype_id));
> create table fileworkqueue.job (id uuid, filetype_id smallint, filepath text, attempt smallint not null default 0);
> create index fwq_id_filetype_id_idx on fileworkqueue.job(id, filetype_id);

The log.file table is populated by an application which is writing
files to the filesystem.
The fileworkqueue.job table is populated with a trigger on log.file:

CREATE OR REPLACE FUNCTION log.fileworkqueue_add() RETURNS TRIGGER AS
$$
BEGIN
  INSERT INTO fileworkqueue.job(id, filetype_id, filepath)
    VALUES(NEW.id, NEW.filetype_id, NEW.filepath);

  IF NEW.filetype_id IN (1, 2, 3, 4) THEN
    NOTIFY worker1;
  ELSIF NEW.filetype_id IN (5, 6, 7, 8) THEN
    NOTIFY worker2;
  ELSE
    RAISE EXCEPTION 'Filetype ID not known %', NEW.filetype_id;
  END IF;

  RETURN NULL;
END;
$$ LANGUAGE plpgsql;
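The CREATE TRIGGER statement binding this function to log.file isn't shown above; it would look along these lines (the trigger name here is an assumption, not taken from the post):

```sql
-- Hypothetical binding of the trigger function to log.file
-- (the actual CREATE TRIGGER isn't shown; the name is assumed)
CREATE TRIGGER file_workqueue_trigger
  AFTER INSERT ON log.file
  FOR EACH ROW EXECUTE PROCEDURE log.fileworkqueue_add();
```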


My worker processes then 'LISTEN' for the appropriate NOTIFY, select
the rows from the fileworkqueue.job table according to the
'filetype_id', and transfer the files. After processing, the worker
deletes the row from the workqueue.
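As a sketch (the actual worker queries aren't shown in the post), one cycle of worker1 might look like:

```sql
-- Sketch of one worker1 cycle; the real worker code is not shown here
SELECT id, filetype_id, filepath
  FROM fileworkqueue.job
 WHERE filetype_id IN (1, 2, 3, 4);

-- ... transfer each file to the other machine, outside any transaction ...

-- then, for each completed job:
DELETE FROM fileworkqueue.job
 WHERE id = $1 AND filetype_id = $2;   -- parameters from the selected row
```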

When we are inserting without the workers running (i.e. just inserting
into the log.file table, with the fileworkqueue.job table being filled
up by the trigger), we see a rate of about 3 milliseconds per insert.
When it is run with the workers removing data from the
fileworkqueue.job table, this slows to around 50 ms per insert.

I have tried various options, such as removing the index on the
workqueue, updating the workqueue table with a flag to indicate
'complete' instead of deleting the row, but I am unable to get better
performance.

Does anyone have thoughts on how to do this more efficiently? I don't
want the workqueue to impact the process adding the data too much. I
am thinking of creating a third table that just contains the
'processed' info (id and filetype_id): the workers would write to it
when a job is complete, and select which data to process via a view
showing only the entries still in the workqueue table. But then I
would have to clean all these entries up at some point, which could
get complicated in itself.

Thanks for any ideas.

--
Jason Armstrong

Re: Workqueue performance

From
Andy Colson
Date:
On 5/11/2010 3:28 AM, Jason Armstrong wrote:
> I have a performance question about a database workqueue I am working on.
>
> I have two tables, the first containing information about files, and
> the second is a workqueue table with 'jobs', intended for a worker
> process to transfer the files to another machine:
>
>> create table log.file (id uuid, created timestamp default NOW() not null, filetype_id smallint, process_id smallint,
>>   filepath text, UNIQUE (id, filetype_id));
>> create table fileworkqueue.job (id uuid, filetype_id smallint, filepath text, attempt smallint not null default 0);
>> create index fwq_id_filetype_id_idx on fileworkqueue.job(id, filetype_id);
>
>
> When we are inserting without the workers running (i.e. just inserting
> into the log.file table, with the fileworkqueue.job table being filled
> up by the trigger), we see a rate of about 3 milliseconds per insert.
> When it is run with the workers removing data from the
> fileworkqueue.job table, this slows to around 50 ms per insert.
>
>
> Thanks for any ideas.
>

Does the worker keep a transaction open for a long period of time?

i.e. the worker does:
startTransaction
select jobs
process job for a while
delete from job where...
commit

If so, that might be a problem.

-Andy

Re: Workqueue performance

From
Tom Lane
Date:
Jason Armstrong <ja@riverdrums.com> writes:
> My worker processes then 'LISTEN' for the appropriate NOTIFY, select
> the rows from the fileworkqueue.job table according to the
> 'filetype_id', and transfer them. After processing, it deletes the row
> from the workqueue.

> When we are inserting without the workers running (i.e. just inserting
> into the log.file table, with the fileworkqueue.job table being filled
> up by the trigger), we see a rate of about 3 milliseconds per insert.
> When it is run with the workers removing data from the
> fileworkqueue.job table, this slows to around 50 ms per insert.

Not sure if this is the source of your issue, but have you checked how
many dead rows are in pg_listener?  Applications that are making heavy
use of NOTIFY/LISTEN tend to need *very* aggressive vacuuming of that
table in order to keep performance up.
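For example, something like this shows the bloat (assuming the stats collector is enabled, on 8.3 or later; pg_listener only exists before 9.0):

```sql
-- Dead vs. live rows in pg_listener (pre-9.0 servers only)
SELECT n_live_tup, n_dead_tup
  FROM pg_stat_all_tables
 WHERE relname = 'pg_listener';

-- and to clean it up aggressively:
VACUUM VERBOSE pg_listener;
```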

(FWIW, 9.0 will have a rewritten notify mechanism that eliminates this
problem.  Doesn't help you today though.)

            regards, tom lane

Re: Workqueue performance

From
Jason Armstrong
Date:
Thank you for the tips on this issue.

I found two things:

1. NOTIFY/LISTEN was causing performance to degrade badly over time. I
have reworked my code to poll the database instead.

2. There was a further trigger function that I didn't include in the
description. I wanted to keep a status of the jobs in the workqueue,
so I created two triggers on the job table that update a 'status'
table:

> create table fileworkqueue.status(filetype_id smallint not null, num integer not null default 0 check (num >= 0));

> CREATE FUNCTION fileworkqueue.add_status() RETURNS TRIGGER AS
$$
BEGIN
  UPDATE fileworkqueue.status SET num=num+1 WHERE filetype_id=NEW.filetype_id;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

> CREATE FUNCTION fileworkqueue.del_status() RETURNS TRIGGER AS
$$
BEGIN
  UPDATE fileworkqueue.status SET num=num-1 WHERE filetype_id=OLD.filetype_id;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER fileworkqueue_add_trigger
  AFTER INSERT ON fileworkqueue.job
  FOR EACH ROW EXECUTE PROCEDURE fileworkqueue.add_status();

CREATE TRIGGER fileworkqueue_del_trigger
  AFTER DELETE ON fileworkqueue.job
  FOR EACH ROW EXECUTE PROCEDURE fileworkqueue.del_status();

So there were actually two triggers in the original design:

log.file -> fileworkqueue.job -> fileworkqueue.status

When I removed the second trigger, the one updating the 'status'
table, performance jumped tenfold. But then I had no way of monitoring
how my workqueue was performing, so I decided to do that in
application code instead, via IPC.
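A likely reason those status triggers were so costly is that every insert and delete had to UPDATE the same per-filetype row in fileworkqueue.status, so all concurrent writers serialized on that row lock. The same information can be derived on demand instead, e.g.:

```sql
-- Derive queue depth per filetype on demand, instead of maintaining
-- a counter row that every insert and delete must lock
SELECT filetype_id, count(*) AS num
  FROM fileworkqueue.job
 GROUP BY filetype_id;
```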

--
Jason Armstrong

Re: Workqueue performance

From
Tom Lane
Date:
Jason Armstrong <ja@riverdrums.com> writes:
> 1. NOTIFY/LISTEN was causing performance to degrade badly over time. I
> have reworked my code to poll the database instead.

FWIW, you need to ensure pg_listener gets vacuumed pretty aggressively
in order to prevent degradation in a high-traffic NOTIFY application.

PG 9.0 will have a completely rewritten LISTEN/NOTIFY implementation
that avoids use of a table and should scale a lot better, as well as not
needing vacuuming support.  That doesn't help you right now, but
depending on what your development timescale is, you might want to plan
to go back to LISTEN/NOTIFY later.

            regards, tom lane