Re: Performance problem with row count trigger - Mailing list pgsql-sql

From Wei Weng
Subject Re: Performance problem with row count trigger
Date
Msg-id 49D51D53.2030305@kencast.com
Whole thread Raw
In response to Re: Performance problem with row count trigger  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
On 04/02/2009 03:32 PM, Tom Lane wrote: <blockquote cite="mid:16805.1238700723@sss.pgh.pa.us" type="cite"><pre
wrap="">TonyCebzanov <a class="moz-txt-link-rfc2396E"
href="mailto:tonyceb@andrew.cmu.edu"><tonyceb@andrew.cmu.edu></a>writes: </pre><blockquote type="cite"><pre
wrap="">WhatI want to do is update the assoc_count field in the dataset table
 
to reflect the count of related records in the assoc field.  To do so, I
added the following trigger:   </pre></blockquote><pre wrap=""> </pre><blockquote type="cite"><pre wrap="">CREATE OR
REPLACEFUNCTION update_assoc_count_insert()
 
RETURNS TRIGGER AS
'
BEGIN   UPDATE dataset       SET assoc_count = assoc_count + 1       WHERE dataset_id = NEW.dataset_id;   RETURN NEW;
END
' LANGUAGE plpgsql;   </pre></blockquote><pre wrap=""> </pre><blockquote type="cite"><pre wrap="">CREATE TRIGGER
assoc_update_assoc_count_insertAFTER INSERT ON assoc   FOR EACH ROW EXECUTE PROCEDURE update_assoc_count_insert();
</pre></blockquote><prewrap="">
 
There is basically no way that this is going to not suck :-(.  In the
first place, using an AFTER trigger means that each update queues an
AFTER trigger update event that has to be fired at statement or
transaction end.  In the second place (as Craig correctly noted) this
results in a separate update to the count-table row for each inserted
row, which tremendously bloats the count table with dead tuples.
In the third place, if you have any concurrency of insertions, it
disappears because all the inserters need to update the same count row.

If you dig in the pgsql-hackers archives, you will find that the
original scheme for this was to have each transaction accumulate its
total number of insertions minus deletions for a table in local memory,
and then insert *one* delta row into the count table just before
transaction commit.  I don't think it's possible to do that with just
user-level triggers (not least because we haven't got ON COMMIT
triggers); it would have to be a C-code addition.  The various blog
entries you cite are non-peer-reviewed oversimplifications of that
design.

Digging around, the oldest description I can find of this idea is
<a class="moz-txt-link-freetext"
href="http://archives.postgresql.org/pgsql-hackers/2003-09/msg00387.php">http://archives.postgresql.org/pgsql-hackers/2003-09/msg00387.php</a>
although there are more recent threads rehashing the topic.

One point I don't recall anyone mentioning is that the stats subsystem
now implements a fairly large subset of this work already, namely the
initial data accumulation.  So you could imagine plugging something into
that to send the deltas to a table in addition to the stats collector.
        regards, tom lane
 </pre></blockquote><br /> So, basically other than reading from pg_class table about the tuple count, there isn't a
goodway to optimize the COUNT(*)?<br /><br /><br /> Thanks<br /> Wei<br /><br /> 

pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: Performance problem with row count trigger
Next
From: "Greg Sabino Mullane"
Date:
Subject: Re: Performance problem with row count trigger