Thread: Workqueue performance
I have a performance question about a database workqueue I am working on. I have two tables: the first contains information about files, and the second is a workqueue table of 'jobs', intended for a worker process that transfers the files to another machine:

> create table log.file (id uuid, created timestamp default NOW() not null, filetype_id smallint, process_id smallint, filepath text, UNIQUE (id, filetype_id));
> create table fileworkqueue.job (id uuid, filetype_id smallint, filepath text, attempt smallint not null default 0);
> create index fwq_id_filetype_id_idx on fileworkqueue.job(id, filetype_id);

The log.file table is populated by an application that writes files to the filesystem. The fileworkqueue.job table is populated by a trigger on log.file:

CREATE OR REPLACE FUNCTION log.fileworkqueue_add() RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO fileworkqueue.job(id, filetype_id, filepath)
    VALUES (NEW.id, NEW.filetype_id, NEW.filepath);
  IF NEW.filetype_id IN (1, 2, 3, 4) THEN
    NOTIFY worker1;
  ELSIF NEW.filetype_id IN (5, 6, 7, 8) THEN
    NOTIFY worker2;
  ELSE
    RAISE EXCEPTION 'Filetype ID not known %', NEW.filetype_id;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

My worker processes then LISTEN for the appropriate NOTIFY, select rows from fileworkqueue.job according to filetype_id, and transfer the files. After processing, each worker deletes its row from the workqueue.

When we run without the workers (i.e. just inserting into log.file, with fileworkqueue.job being filled by the trigger), we see a rate of about 3 milliseconds per insert. When the workers are also removing data from fileworkqueue.job, this slows to roughly 50 ms per insert.

I have tried various options, such as removing the index on the workqueue and updating the workqueue rows with a 'complete' flag instead of deleting them, but I am unable to get better performance. Does anyone have any thoughts on how to do this more optimally?
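To make the worker side concrete, each worker does roughly the following (a sketch only; the UUID below is a placeholder for the id of the job just completed):

```sql
-- Worker startup: subscribe to the channel named in the trigger.
LISTEN worker1;

-- On each notification, fetch this worker's outstanding jobs:
SELECT id, filetype_id, filepath
  FROM fileworkqueue.job
 WHERE filetype_id IN (1, 2, 3, 4);

-- After a file has been transferred, remove its job row:
DELETE FROM fileworkqueue.job
 WHERE id = '00000000-0000-0000-0000-000000000000'  -- placeholder uuid
   AND filetype_id = 1;
```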
I don't want the workqueue to impact the process that adds the data too much. I am thinking of creating a third table containing just the 'processed' info (id and filetype_id): the workers would write to this table when a job is complete, and select which data to process via a view showing only the entries still in the workqueue table. But then I would have to clean all these entries up at some point, which could get complicated in itself.

Thanks for any ideas.

--
Jason Armstrong
On 5/11/2010 3:28 AM, Jason Armstrong wrote:
> I have a performance question about a database workqueue I am working on.
>
> I have two tables: the first contains information about files, and the
> second is a workqueue table of 'jobs', intended for a worker process
> that transfers the files to another machine:
>
>> create table log.file (id uuid, created timestamp default NOW() not null, filetype_id smallint, process_id smallint, filepath text, UNIQUE (id, filetype_id));
>> create table fileworkqueue.job (id uuid, filetype_id smallint, filepath text, attempt smallint not null default 0);
>> create index fwq_id_filetype_id_idx on fileworkqueue.job(id, filetype_id);
>
> When we run without the workers (i.e. just inserting into log.file,
> with fileworkqueue.job being filled by the trigger), we see a rate of
> about 3 milliseconds per insert. When the workers are also removing
> data from fileworkqueue.job, this slows to roughly 50 ms per insert.
>
> Thanks for any ideas.

Does the worker keep a transaction open for a long period of time? i.e.

worker:
    startTransaction
    select jobs
    process jobs for a while
    delete from job where ...
    commit

If so, that might be a problem.

-Andy
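The pattern Andy is asking about, and a shorter-transaction alternative, might look like this (a sketch; the UUID is a placeholder):

```sql
-- Long-lived transaction: the open snapshot and row locks persist for
-- the whole batch of transfers, which blocks cleanup of dead rows.
BEGIN;
SELECT id, filepath FROM fileworkqueue.job WHERE filetype_id IN (1, 2, 3, 4);
-- ... many file transfers happen inside this transaction ...
DELETE FROM fileworkqueue.job WHERE filetype_id IN (1, 2, 3, 4);
COMMIT;

-- Shorter alternative: autocommit the SELECT, then issue one brief
-- DELETE per finished job, so nothing outlives a single transfer.
DELETE FROM fileworkqueue.job
 WHERE id = '00000000-0000-0000-0000-000000000000'  -- placeholder uuid
   AND filetype_id = 1;
```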
Jason Armstrong <ja@riverdrums.com> writes:
> My worker processes then LISTEN for the appropriate NOTIFY, select
> rows from fileworkqueue.job according to filetype_id, and transfer
> the files. After processing, each worker deletes its row from the
> workqueue.
>
> When we run without the workers (i.e. just inserting into log.file,
> with fileworkqueue.job being filled by the trigger), we see a rate of
> about 3 milliseconds per insert. When the workers are also removing
> data from fileworkqueue.job, this slows to roughly 50 ms per insert.

Not sure if this is the source of your issue, but have you checked how
many dead rows are in pg_listener? Applications that make heavy use of
NOTIFY/LISTEN tend to need *very* aggressive vacuuming of that table in
order to keep performance up.

(FWIW, 9.0 will have a rewritten notify mechanism that eliminates this
problem. It doesn't help you today, though.)

			regards, tom lane
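For anyone else hitting this on a pre-9.0 server, the vacuuming Tom describes is just the following (it must be run as a superuser; how often it needs to run depends on your NOTIFY volume):

```sql
-- pg_listener is an ordinary catalog table on pre-9.0 servers; under
-- heavy NOTIFY traffic it accumulates dead rows quickly, so vacuum it
-- frequently (e.g. every few minutes from cron).
VACUUM pg_catalog.pg_listener;
```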
Thank you for the tips about this issue. I found two things:

1. NOTIFY/LISTEN was causing performance to degrade badly over time. I have reworked my code to poll the database instead.

2. There was a further trigger function that I didn't include in the original description. I wanted to keep a status of the jobs in the workqueue, so I created two triggers on the job table that update a 'status' table:

> create table fileworkqueue.status (filetype_id smallint not null, num integer not null default 0 check (num >= 0));

CREATE FUNCTION fileworkqueue.add_status() RETURNS TRIGGER AS $$
BEGIN
  UPDATE fileworkqueue.status SET num = num + 1 WHERE filetype_id = NEW.filetype_id;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION fileworkqueue.del_status() RETURNS TRIGGER AS $$
BEGIN
  UPDATE fileworkqueue.status SET num = num - 1 WHERE filetype_id = OLD.filetype_id;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER fileworkqueue_add_trigger AFTER INSERT ON fileworkqueue.job
  FOR EACH ROW EXECUTE PROCEDURE fileworkqueue.add_status();

CREATE TRIGGER fileworkqueue_del_trigger AFTER DELETE ON fileworkqueue.job
  FOR EACH ROW EXECUTE PROCEDURE fileworkqueue.del_status();

So there were actually two triggers in the original design:

log.file -> fileworkqueue.job -> fileworkqueue.status

When I removed the second trigger to the 'status' table, performance jumped tenfold (presumably because every insert and delete updated the same status row, so everything serialised on those row locks and churned out dead tuples on that one small table). But then I had no means of monitoring how my workqueue was performing, so I decided to do that in application code instead, via IPC.

--
Jason Armstrong
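If the counts are only needed occasionally, an alternative to the IPC approach might be to compute the status on demand instead of maintaining it with triggers (a sketch using the schema above):

```sql
-- Queue depth per filetype, computed when a monitor asks for it;
-- this avoids the contended single-row updates of the status table.
SELECT filetype_id, count(*) AS num
  FROM fileworkqueue.job
 GROUP BY filetype_id;
```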
Jason Armstrong <ja@riverdrums.com> writes:
> 1. NOTIFY/LISTEN was causing performance to degrade badly over time. I
> have reworked my code to poll the database instead.

FWIW, you need to ensure pg_listener gets vacuumed pretty aggressively
in order to prevent degradation in a high-traffic NOTIFY application.

PG 9.0 will have a completely rewritten LISTEN/NOTIFY implementation
that avoids use of a table and should scale a lot better, as well as
not needing vacuuming support. That doesn't help you right now, but
depending on what your development timescale is, you might want to
plan to go back to LISTEN/NOTIFY later.

			regards, tom lane