Thread: Schema for caching message-count in folders using triggers

Schema for caching message-count in folders using triggers

From
Andreas Joseph Krogh
Date:
Hi all.
 
I'm facing a problem with my current schema for email where folders start containing several 100K of messages and count(*) in them taks noticeable time. This schema is accessible from IMAP and a web-app so lots of queries of the type "list folders with message count" are performed.
 
So, I'm toying with this idea of caching the message-count in the folder-table itself.
 
I currently have this:
 
CREATE or replace FUNCTION count_increment_tf() RETURNS TRIGGER AS $_$
BEGIN
UPDATE folder SET message_count = message_count + 1 WHERE id = NEW.folder_id;
RETURN NEW;
END $_$ LANGUAGE 'plpgsql';

CREATE or replace FUNCTION count_decrement_tf() RETURNS TRIGGER AS $_$
BEGIN
    UPDATE folder SET message_count = message_count - 1 WHERE id = OLD.folder_id;
RETURN OLD;
END $_$ LANGUAGE 'plpgsql';

CREATE or replace FUNCTION count_update_tf() RETURNS TRIGGER AS $_$
BEGIN
    UPDATE folder SET message_count = message_count - 1 WHERE id = OLD.folder_id;
    UPDATE folder SET message_count = message_count + 1 WHERE id = NEW.folder_id;
RETURN NEW;
END $_$ LANGUAGE 'plpgsql';

CREATE TRIGGER increment_folder_msg_t AFTER INSERT ON message FOR EACH ROW EXECUTE PROCEDURE count_increment_tf();
CREATE TRIGGER decrement_folder_msg_t AFTER DELETE ON message FOR EACH ROW EXECUTE PROCEDURE count_decrement_tf();
CREATE TRIGGER update_folder_msg_t AFTER UPDATE ON message FOR EACH ROW EXECUTE PROCEDURE count_update_tf();
 
The problem with this is locking (waiting for another TX to commit when updating the same folder) and deadlock issues when trying to simultaneously insert/delete/update messages  in a folder.
 
Does anyone have any better ideas for safely caching the message-count in each folder without locking and deadlock issues?
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment

Re: Schema for caching message-count in folders using triggers

From
Adrian Klaver
Date:
On 03/05/2015 11:45 AM, Andreas Joseph Krogh wrote:
> Hi all.
> I'm facing a problem with my current schema for email where folders
> start containing several 100K of messages and count(*) in them taks
> noticeable time. This schema is accessible from IMAP and a web-app so
> lots of queries of the type "list folders with message count" are performed.
> So, I'm toying with this idea of caching the message-count in the
> folder-table itself.
> I currently have this:
>
> CREATE or replace FUNCTIONcount_increment_tf()RETURNS TRIGGER AS$_$
> BEGIN
> UPDATE folder SETmessage_count  =message_count  + 1 WHEREid  =NEW.folder_id;
> RETURNNEW;
> END $_$LANGUAGE'plpgsql';
>
> CREATE or replace FUNCTIONcount_decrement_tf()RETURNS TRIGGER AS$_$
> BEGIN
>      UPDATE folder SETmessage_count  =message_count  - 1 WHEREid  =OLD.folder_id;
> RETURNOLD;
> END $_$LANGUAGE'plpgsql';
>
> CREATE or replace FUNCTIONcount_update_tf()RETURNS TRIGGER AS$_$
> BEGIN
>      UPDATE folder SETmessage_count  =message_count  - 1 WHEREid  =OLD.folder_id;
>      UPDATE folder SETmessage_count  =message_count  + 1 WHEREid  =NEW.folder_id;
> RETURNNEW;
> END $_$LANGUAGE'plpgsql';
>
> CREATE TRIGGERincrement_folder_msg_tAFTER INSERT ON message FOR EACH ROW EXECUTE PROCEDUREcount_increment_tf();
> CREATE TRIGGERdecrement_folder_msg_tAFTER DELETE ON message FOR EACH ROW EXECUTE PROCEDUREcount_decrement_tf();
> CREATE TRIGGERupdate_folder_msg_tAFTER UPDATE ON message FOR EACH ROW EXECUTE PROCEDUREcount_update_tf();
>
> The problem with this is locking (waiting for another TX to commit when
> updating the same folder) and deadlock issues when trying to
> simultaneously insert/delete/update messages  in a folder.
> Does anyone have any better ideas for safely caching the message-count
> in each folder without locking and deadlock issues?

How accurate does this have to be?

Not exactly following what is folder?
Is it a table that contains the messages?

A top of the head idea would be to use sequences. Create a sequence for 
each folder starting at current count and then use nextval, setval to 
change the value:

http://www.postgresql.org/docs/9.4/interactive/functions-sequence.html

It is not transactional, so it would probably not be spot on, which is 
why I asked about accuracy earlier.

> Thanks.
> --
> *Andreas Joseph Krogh*
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andreas@visena.com <mailto:andreas@visena.com>
> www.visena.com <https://www.visena.com>
> <https://www.visena.com>


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Schema for caching message-count in folders using triggers

From
Andreas Joseph Krogh
Date:
På torsdag 05. mars 2015 kl. 20:59:28, skrev Adrian Klaver <adrian.klaver@aklaver.com>:
On 03/05/2015 11:45 AM, Andreas Joseph Krogh wrote:
> Hi all.
> I'm facing a problem with my current schema for email where folders
> start containing several 100K of messages and count(*) in them taks
> noticeable time. This schema is accessible from IMAP and a web-app so
> lots of queries of the type "list folders with message count" are performed.
> So, I'm toying with this idea of caching the message-count in the
> folder-table itself.
> I currently have this:
>
> CREATE or replace FUNCTIONcount_increment_tf()RETURNS TRIGGER AS$_$
> BEGIN
> UPDATE folder SETmessage_count  =message_count  + 1 WHEREid  =NEW.folder_id;
> RETURNNEW;
> END $_$LANGUAGE'plpgsql';
>
> CREATE or replace FUNCTIONcount_decrement_tf()RETURNS TRIGGER AS$_$
> BEGIN
>      UPDATE folder SETmessage_count  =message_count  - 1 WHEREid  =OLD.folder_id;
> RETURNOLD;
> END $_$LANGUAGE'plpgsql';
>
> CREATE or replace FUNCTIONcount_update_tf()RETURNS TRIGGER AS$_$
> BEGIN
>      UPDATE folder SETmessage_count  =message_count  - 1 WHEREid  =OLD.folder_id;
>      UPDATE folder SETmessage_count  =message_count  + 1 WHEREid  =NEW.folder_id;
> RETURNNEW;
> END $_$LANGUAGE'plpgsql';
>
> CREATE TRIGGERincrement_folder_msg_tAFTER INSERT ON message FOR EACH ROW EXECUTE PROCEDUREcount_increment_tf();
> CREATE TRIGGERdecrement_folder_msg_tAFTER DELETE ON message FOR EACH ROW EXECUTE PROCEDUREcount_decrement_tf();
> CREATE TRIGGERupdate_folder_msg_tAFTER UPDATE ON message FOR EACH ROW EXECUTE PROCEDUREcount_update_tf();
>
> The problem with this is locking (waiting for another TX to commit when
> updating the same folder) and deadlock issues when trying to
> simultaneously insert/delete/update messages  in a folder.
> Does anyone have any better ideas for safely caching the message-count
> in each folder without locking and deadlock issues?

How accurate does this have to be?

Not exactly following what is folder?
Is it a table that contains the messages?

A top of the head idea would be to use sequences. Create a sequence for
each folder starting at current count and then use nextval, setval to
change the value:

http://www.postgresql.org/docs/9.4/interactive/functions-sequence.html

It is not transactional, so it would probably not be spot on, which is
why I asked about accuracy earlier.
 
Yes, 'folder' is a table which contains 'message':
 
create table folder(    id serial PRIMARY KEY,    name varchar not null unique,    message_count integer not null default 0
);

create table message(    id serial PRIMARY KEY,    folder_id INTEGER NOT NULL REFERENCES folder(id),    message varchar not null
);
 
The count has to be exact, no estimate from EXPLAIN or such...
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: Schema for caching message-count in folders using triggers

From
Adrian Klaver
Date:
On 03/05/2015 12:04 PM, Andreas Joseph Krogh wrote:
> På torsdag 05. mars 2015 kl. 20:59:28, skrev Adrian Klaver
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>>:

>      >
>      > The problem with this is locking (waiting for another TX to
>     commit when
>      > updating the same folder) and deadlock issues when trying to
>      > simultaneously insert/delete/update messages  in a folder.
>      > Does anyone have any better ideas for safely caching the
>     message-count
>      > in each folder without locking and deadlock issues?
>
>     How accurate does this have to be?
>
>     Not exactly following what is folder?
>     Is it a table that contains the messages?
>
>     A top of the head idea would be to use sequences. Create a sequence for
>     each folder starting at current count and then use nextval, setval to
>     change the value:
>
>     http://www.postgresql.org/docs/9.4/interactive/functions-sequence.html
>
>     It is not transactional, so it would probably not be spot on, which is
>     why I asked about accuracy earlier.
>
> Yes, 'folder' is a table which contains 'message':
>
> create tablefolder(
>      idserial PRIMARY KEY,
>      namevarchar not null unique,
>      message_countinteger not null default0
> );
>
> create table message(
>      idserial PRIMARY KEY,
>      folder_idINTEGER NOT NULL REFERENCESfolder(id),
>      messagevarchar not null
> );
>
> The count has to be exact, no estimate from EXPLAIN or such...

Well there goes my idea. Seems the way to go is partitioning:

http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html

Break the data into smaller units

> --
> *Andreas Joseph Krogh*
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andreas@visena.com <mailto:andreas@visena.com>
> www.visena.com <https://www.visena.com>
> <https://www.visena.com>


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Schema for caching message-count in folders using triggers

From
Alvaro Herrera
Date:
Andreas Joseph Krogh wrote:
> Hi all.   I'm facing a problem with my current schema for email where folders 
> start containing several 100K of messages and count(*) in them taks noticeable 
> time. This schema is accessible from IMAP and a web-app so lots of queries of 
> the type "list folders with message count" are performed.   So, I'm toying with 
> this idea of caching the message-count in the folder-table itself.

You can do this better by keeping a table with per-folder counts and
deltas.  There is one main row which keeps the total value at some point
in time.  Each time you insert a message, add a "delta" entry with value
1; each time you remove, add a delta with value -1.  You can do this
with a trigger on insert/update/delete.  This way, there is no
contention because there are no updates.

To figure out the total value, just add all the values (the main plus
all deltas for that folder).

From time to time you have a process that summarizes all these entries
into one total value again.  Something like
             WITH deleted AS (DELETE                                 FROM counts                               WHERE
type= 'delta' RETURNING value),                     total AS (SELECT coalesce(sum(value), 0) as sum
           FROM deleted)                  UPDATE counts                    SET value = counts.value + total.sum
          FROM total WHERE type = 'total'               RETURNING counts.value
 

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Schema for caching message-count in folders using triggers

From
Andreas Joseph Krogh
Date:
På torsdag 05. mars 2015 kl. 22:16:01, skrev Alvaro Herrera <alvherre@2ndquadrant.com>:
Andreas Joseph Krogh wrote:
> Hi all.   I'm facing a problem with my current schema for email where folders
> start containing several 100K of messages and count(*) in them taks noticeable
> time. This schema is accessible from IMAP and a web-app so lots of queries of
> the type "list folders with message count" are performed.   So, I'm toying with
> this idea of caching the message-count in the folder-table itself.

You can do this better by keeping a table with per-folder counts and
deltas.  There is one main row which keeps the total value at some point
in time.  Each time you insert a message, add a "delta" entry with value
1; each time you remove, add a delta with value -1.  You can do this
with a trigger on insert/update/delete.  This way, there is no
contention because there are no updates.

To figure out the total value, just add all the values (the main plus
all deltas for that folder).

>From time to time you have a process that summarizes all these entries
into one total value again.  Something like

              WITH deleted AS (DELETE
                                 FROM counts
                                WHERE type = 'delta' RETURNING value),
                     total AS (SELECT coalesce(sum(value), 0) as sum
                                 FROM deleted)
                  UPDATE counts
                     SET value = counts.value + total.sum
                    FROM total WHERE type = 'total'
               RETURNING counts.value
 
Like it, thanks!
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment