Thread: Schema for caching message-count in folders using triggers
Hi all.
I'm facing a problem with my current schema for email where folders start containing several hundred thousand messages and count(*) in them takes noticeable time. This schema is accessible from IMAP and a web-app so lots of queries of the type "list folders with message count" are performed.
So, I'm toying with this idea of caching the message-count in the folder-table itself.
I currently have this:
CREATE or replace FUNCTION count_increment_tf() RETURNS TRIGGER AS $_$
BEGIN
UPDATE folder SET message_count = message_count + 1 WHERE id = NEW.folder_id;
RETURN NEW;
END $_$ LANGUAGE 'plpgsql';
CREATE or replace FUNCTION count_decrement_tf() RETURNS TRIGGER AS $_$
BEGIN
UPDATE folder SET message_count = message_count - 1 WHERE id = OLD.folder_id;
RETURN OLD;
END $_$ LANGUAGE 'plpgsql';
CREATE or replace FUNCTION count_update_tf() RETURNS TRIGGER AS $_$
BEGIN
UPDATE folder SET message_count = message_count - 1 WHERE id = OLD.folder_id;
UPDATE folder SET message_count = message_count + 1 WHERE id = NEW.folder_id;
RETURN NEW;
END $_$ LANGUAGE 'plpgsql';
CREATE TRIGGER increment_folder_msg_t AFTER INSERT ON message FOR EACH ROW EXECUTE PROCEDURE count_increment_tf();
CREATE TRIGGER decrement_folder_msg_t AFTER DELETE ON message FOR EACH ROW EXECUTE PROCEDURE count_decrement_tf();
CREATE TRIGGER update_folder_msg_t AFTER UPDATE ON message FOR EACH ROW EXECUTE PROCEDURE count_update_tf();
The problem with this is locking (waiting for another TX to commit when updating the same folder) and deadlock issues when trying to simultaneously insert/delete/update messages in a folder.
Does anyone have any better ideas for safely caching the message-count in each folder without locking and deadlock issues?
Thanks.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
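For illustration of the deadlock scenario (an untested sketch, using the folder/message schema posted further down in the thread): each INSERT fires a trigger that UPDATEs, and therefore row-locks, the parent folder row, so two transactions touching the same folders in opposite order end up waiting on each other:

-- session A
BEGIN;
INSERT INTO message (folder_id, message) VALUES (1, 'a');  -- trigger row-locks folder 1

-- session B
BEGIN;
INSERT INTO message (folder_id, message) VALUES (2, 'b');  -- trigger row-locks folder 2

-- session A
INSERT INTO message (folder_id, message) VALUES (2, 'c');  -- blocks: folder 2 is locked by B

-- session B
INSERT INTO message (folder_id, message) VALUES (1, 'd');  -- blocks on folder 1 held by A -> deadlock detected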
On 03/05/2015 11:45 AM, Andreas Joseph Krogh wrote:
> Does anyone have any better ideas for safely caching the message-count
> in each folder without locking and deadlock issues?

How accurate does this have to be?

Not exactly following: what is folder? Is it a table that contains the messages?

A top of the head idea would be to use sequences. Create a sequence for
each folder starting at current count and then use nextval, setval to
change the value:

http://www.postgresql.org/docs/9.4/interactive/functions-sequence.html

It is not transactional, so it would probably not be spot on, which is
why I asked about accuracy earlier.

--
Adrian Klaver
adrian.klaver@aklaver.com
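A rough sketch of the sequence idea, for illustration only (the folder id 42 and the sequence name are invented, and the calls are untested):

-- one sequence per folder, seeded with the current message count
CREATE SEQUENCE folder_42_msgcount_seq MINVALUE 0;
SELECT setval('folder_42_msgcount_seq',
              (SELECT count(*) FROM message WHERE folder_id = 42));

-- on insert into folder 42: bump the counter (nextval does not block on other transactions)
SELECT nextval('folder_42_msgcount_seq');

-- on delete from folder 42: wind it back (racy, and neither call is rolled back
-- on abort, so the value can drift; this is the accuracy caveat above)
SELECT setval('folder_42_msgcount_seq',
              (SELECT last_value FROM folder_42_msgcount_seq) - 1);

-- read the cached count
SELECT last_value FROM folder_42_msgcount_seq;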
On Thursday, 5 March 2015 at 20:59:28, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 03/05/2015 11:45 AM, Andreas Joseph Krogh wrote:
> Hi all.
> I'm facing a problem with my current schema for email where folders
> start containing several hundred thousand messages and count(*) in them takes
> noticeable time. This schema is accessible from IMAP and a web-app so
> lots of queries of the type "list folders with message count" are performed.
> So, I'm toying with this idea of caching the message-count in the
> folder-table itself.
> I currently have this:
>
> CREATE or replace FUNCTION count_increment_tf() RETURNS TRIGGER AS $_$
> BEGIN
> UPDATE folder SET message_count = message_count + 1 WHERE id = NEW.folder_id;
> RETURN NEW;
> END $_$ LANGUAGE 'plpgsql';
>
> CREATE or replace FUNCTION count_decrement_tf() RETURNS TRIGGER AS $_$
> BEGIN
> UPDATE folder SET message_count = message_count - 1 WHERE id = OLD.folder_id;
> RETURN OLD;
> END $_$ LANGUAGE 'plpgsql';
>
> CREATE or replace FUNCTION count_update_tf() RETURNS TRIGGER AS $_$
> BEGIN
> UPDATE folder SET message_count = message_count - 1 WHERE id = OLD.folder_id;
> UPDATE folder SET message_count = message_count + 1 WHERE id = NEW.folder_id;
> RETURN NEW;
> END $_$ LANGUAGE 'plpgsql';
>
> CREATE TRIGGER increment_folder_msg_t AFTER INSERT ON message FOR EACH ROW EXECUTE PROCEDURE count_increment_tf();
> CREATE TRIGGER decrement_folder_msg_t AFTER DELETE ON message FOR EACH ROW EXECUTE PROCEDURE count_decrement_tf();
> CREATE TRIGGER update_folder_msg_t AFTER UPDATE ON message FOR EACH ROW EXECUTE PROCEDURE count_update_tf();
>
> The problem with this is locking (waiting for another TX to commit when
> updating the same folder) and deadlock issues when trying to
> simultaneously insert/delete/update messages in a folder.
> Does anyone have any better ideas for safely caching the message-count
> in each folder without locking and deadlock issues?
How accurate does this have to be?
Not exactly following: what is folder?
Is it a table that contains the messages?
A top of the head idea would be to use sequences. Create a sequence for
each folder starting at current count and then use nextval, setval to
change the value:
http://www.postgresql.org/docs/9.4/interactive/functions-sequence.html
It is not transactional, so it would probably not be spot on, which is
why I asked about accuracy earlier.
Yes, 'folder' is a table which contains 'message':
create table folder (
    id            serial PRIMARY KEY,
    name          varchar NOT NULL UNIQUE,
    message_count integer NOT NULL DEFAULT 0
);

create table message (
    id        serial PRIMARY KEY,
    folder_id integer NOT NULL REFERENCES folder(id),
    message   varchar NOT NULL
);
The count has to be exact, no estimate from EXPLAIN or such...
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
On 03/05/2015 12:04 PM, Andreas Joseph Krogh wrote:
> The count has to be exact, no estimate from EXPLAIN or such...

Well there goes my idea.

Seems the way to go is partitioning:

http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html

Break the data into smaller units.

--
Adrian Klaver
adrian.klaver@aklaver.com
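For illustration, the inheritance-based partitioning described in those 9.4 docs could be sketched for the message table roughly like this (the folder_id ranges and names are invented; note that child tables do not inherit the primary key or the foreign key, so those would have to be re-declared on each child):

-- child tables, each holding a range of folder_id
CREATE TABLE message_p1 (CHECK (folder_id >= 1    AND folder_id < 1000)) INHERITS (message);
CREATE TABLE message_p2 (CHECK (folder_id >= 1000 AND folder_id < 2000)) INHERITS (message);

-- route inserts on the parent to the right child
CREATE OR REPLACE FUNCTION message_insert_tf() RETURNS TRIGGER AS $_$
BEGIN
    IF NEW.folder_id < 1000 THEN
        INSERT INTO message_p1 VALUES (NEW.*);
    ELSE
        INSERT INTO message_p2 VALUES (NEW.*);
    END IF;
    RETURN NULL;  -- the row went into a child table, not the parent
END $_$ LANGUAGE plpgsql;

CREATE TRIGGER message_insert_t
    BEFORE INSERT ON message
    FOR EACH ROW EXECUTE PROCEDURE message_insert_tf();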
Andreas Joseph Krogh wrote:
> Hi all. I'm facing a problem with my current schema for email where folders
> start containing several hundred thousand messages and count(*) in them takes noticeable
> time. This schema is accessible from IMAP and a web-app so lots of queries of
> the type "list folders with message count" are performed. So, I'm toying with
> this idea of caching the message-count in the folder-table itself.

You can do this better by keeping a table with per-folder counts and
deltas. There is one main row which keeps the total value at some point
in time. Each time you insert a message, add a "delta" entry with value
1; each time you remove, add a delta with value -1. You can do this
with a trigger on insert/update/delete. This way, there is no
contention because there are no updates.

To figure out the total value, just add all the values (the main plus
all deltas for that folder).

From time to time you have a process that summarizes all these entries
into one total value again. Something like

WITH deleted AS (DELETE FROM counts
                 WHERE type = 'delta' RETURNING value),
     total AS (SELECT coalesce(sum(value), 0) AS sum
               FROM deleted)
UPDATE counts
SET value = counts.value + total.sum
FROM total
WHERE type = 'total'
RETURNING counts.value

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
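For illustration, a minimal version of that scheme might look like this (the message_counts table, the trigger names and the per-folder layout are invented here, and the code is untested):

-- append-only counter table: one 'total' row per folder plus pending 'delta' rows
CREATE TABLE message_counts (
    id        serial  PRIMARY KEY,
    folder_id integer NOT NULL REFERENCES folder(id),
    type      text    NOT NULL CHECK (type IN ('total', 'delta')),
    value     bigint  NOT NULL
);

CREATE OR REPLACE FUNCTION count_delta_tf() RETURNS TRIGGER AS $_$
BEGIN
    -- only INSERTs into message_counts, so no row in it is ever contended
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO message_counts (folder_id, type, value) VALUES (NEW.folder_id, 'delta', 1);
    END IF;
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        INSERT INTO message_counts (folder_id, type, value) VALUES (OLD.folder_id, 'delta', -1);
    END IF;
    RETURN NULL;
END $_$ LANGUAGE plpgsql;

CREATE TRIGGER message_count_delta_t
    AFTER INSERT OR UPDATE OF folder_id OR DELETE ON message
    FOR EACH ROW EXECUTE PROCEDURE count_delta_tf();

-- current exact count for one folder: the total row plus all pending deltas
SELECT coalesce(sum(value), 0) AS message_count
FROM message_counts
WHERE folder_id = 42;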
On Thursday, 5 March 2015 at 22:16:01, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Andreas Joseph Krogh wrote:
> Hi all. I'm facing a problem with my current schema for email where folders
> start containing several hundred thousand messages and count(*) in them takes noticeable
> time. This schema is accessible from IMAP and a web-app so lots of queries of
> the type "list folders with message count" are performed. So, I'm toying with
> this idea of caching the message-count in the folder-table itself.
You can do this better by keeping a table with per-folder counts and
deltas. There is one main row which keeps the total value at some point
in time. Each time you insert a message, add a "delta" entry with value
1; each time you remove, add a delta with value -1. You can do this
with a trigger on insert/update/delete. This way, there is no
contention because there are no updates.
To figure out the total value, just add all the values (the main plus
all deltas for that folder).
From time to time you have a process that summarizes all these entries
into one total value again. Something like
WITH deleted AS (DELETE FROM counts
                 WHERE type = 'delta' RETURNING value),
     total AS (SELECT coalesce(sum(value), 0) AS sum
               FROM deleted)
UPDATE counts
SET value = counts.value + total.sum
FROM total
WHERE type = 'total'
RETURNING counts.value
Like it, thanks!
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
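For completeness, here is one way Álvaro's summarization query might be adapted to per-folder totals using the message_counts sketch above (illustrative and untested; it assumes every folder already has a 'total' row, e.g. inserted with value 0 when the folder is created, since deltas for a folder without one would be deleted but never folded in):

WITH deleted AS (
    DELETE FROM message_counts
    WHERE type = 'delta'
    RETURNING folder_id, value
), totals AS (
    SELECT folder_id, sum(value) AS sum
    FROM deleted
    GROUP BY folder_id
)
UPDATE message_counts c
SET value = c.value + t.sum
FROM totals t
WHERE c.type = 'total'
  AND c.folder_id = t.folder_id
RETURNING c.folder_id, c.value;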