Re: Performance problem with row count trigger - Mailing list pgsql-sql
| From | Wei Weng |
|---|---|
| Subject | Re: Performance problem with row count trigger |
| Date | |
| Msg-id | 49D51D53.2030305@kencast.com |
| In response to | Re: Performance problem with row count trigger (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-sql |
On 04/02/2009 03:32 PM, Tom Lane wrote:
> Tony Cebzanov <tonyceb@andrew.cmu.edu> writes:
>> What I want to do is update the assoc_count field in the dataset table
>> to reflect the count of related records in the assoc field. To do so, I
>> added the following trigger:
>>
>> CREATE OR REPLACE FUNCTION update_assoc_count_insert() RETURNS TRIGGER AS '
>>     BEGIN
>>         UPDATE dataset SET assoc_count = assoc_count + 1
>>             WHERE dataset_id = NEW.dataset_id;
>>         RETURN NEW;
>>     END
>> ' LANGUAGE plpgsql;
>>
>> CREATE TRIGGER assoc_update_assoc_count_insert AFTER INSERT ON assoc
>>     FOR EACH ROW EXECUTE PROCEDURE update_assoc_count_insert();
>
> There is basically no way that this is going to not suck :-(. In the
> first place, using an AFTER trigger means that each update queues an
> AFTER trigger update event that has to be fired at statement or
> transaction end. In the second place (as Craig correctly noted) this
> results in a separate update to the count-table row for each inserted
> row, which tremendously bloats the count table with dead tuples. In the
> third place, if you have any concurrency of insertions, it disappears
> because all the inserters need to update the same count row.
>
> If you dig in the pgsql-hackers archives, you will find that the
> original scheme for this was to have each transaction accumulate its
> total number of insertions minus deletions for a table in local memory,
> and then insert *one* delta row into the count table just before
> transaction commit. I don't think it's possible to do that with just
> user-level triggers (not least because we haven't got ON COMMIT
> triggers); it would have to be a C-code addition. The various blog
> entries you cite are non-peer-reviewed oversimplifications of that
> design.
>
> Digging around, the oldest description I can find of this idea is
> http://archives.postgresql.org/pgsql-hackers/2003-09/msg00387.php
> although there are more recent threads rehashing the topic.
>
> One point I don't recall anyone mentioning is that the stats subsystem
> now implements a fairly large subset of this work already, namely the
> initial data accumulation. So you could imagine plugging something into
> that to send the deltas to a table in addition to the stats collector.
>
> 			regards, tom lane

So, basically, other than reading the tuple count from the pg_class table, there isn't a good way to optimize COUNT(*)?

Thanks
Wei
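For reference, the estimate Wei alludes to can be read straight out of pg_class. A minimal sketch against the assoc table from this thread; note that reltuples is only refreshed by VACUUM/ANALYZE, so this is an approximation, not a transactionally exact COUNT(*):

    -- Approximate row count from the planner statistics for the assoc table.
    -- reltuples is updated by VACUUM / ANALYZE, so treat it as an estimate only.
    SELECT c.reltuples::bigint AS approx_rows
    FROM pg_class c
    WHERE c.relname = 'assoc';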
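To make the delta idea from Tom's reply a bit more concrete, here is a rough user-level sketch. It is not the one-delta-row-per-transaction scheme he describes (that would need C-level or ON COMMIT support); instead each row change appends a +1/-1 row to a delta table, so concurrent inserters never contend on a single counter row, and a periodic job folds the deltas into dataset.assoc_count. The dataset_assoc_delta table and record_assoc_delta function names are made up for illustration.

    -- Hypothetical append-only delta table: triggers only INSERT here, so
    -- concurrent sessions never block each other on one counter row.
    CREATE TABLE dataset_assoc_delta (
        dataset_id integer NOT NULL,
        delta      integer NOT NULL
    );

    CREATE OR REPLACE FUNCTION record_assoc_delta() RETURNS TRIGGER AS '
        BEGIN
            IF TG_OP = ''INSERT'' THEN
                INSERT INTO dataset_assoc_delta VALUES (NEW.dataset_id, +1);
                RETURN NEW;
            ELSE
                INSERT INTO dataset_assoc_delta VALUES (OLD.dataset_id, -1);
                RETURN OLD;
            END IF;
        END
    ' LANGUAGE plpgsql;

    CREATE TRIGGER assoc_record_delta AFTER INSERT OR DELETE ON assoc
        FOR EACH ROW EXECUTE PROCEDURE record_assoc_delta();

    -- Periodic consolidation (e.g. from cron).  Run it SERIALIZABLE so the
    -- UPDATE and the DELETE work from the same snapshot and delta rows
    -- committed in between are not silently thrown away.
    BEGIN ISOLATION LEVEL SERIALIZABLE;

    UPDATE dataset d
       SET assoc_count = assoc_count + s.sum_delta
      FROM (SELECT dataset_id, sum(delta) AS sum_delta
              FROM dataset_assoc_delta
             GROUP BY dataset_id) s
     WHERE d.dataset_id = s.dataset_id;

    DELETE FROM dataset_assoc_delta;

    COMMIT;

This still queues one AFTER-trigger event per inserted row, so Tom's first objection stands, but the dead-tuple churn moves from the dataset row to the delta table, where plain VACUUM can reclaim it.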