Re: count(*) performance improvement ideas - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: count(*) performance improvement ideas
Date
Msg-id 2e78013d0803200000l34e306a3j515447017b58a58d@mail.gmail.com
In response to Re: count(*) performance improvement ideas  ("Stephen Denne" <Stephen.Denne@datamail.co.nz>)
Responses Re: count(*) performance improvement ideas
List pgsql-hackers
On Thu, Mar 20, 2008 at 3:24 AM, Stephen Denne
<Stephen.Denne@datamail.co.nz> wrote:

>
>  Pavan also refers to deferred triggers, which has got me thinking about another possible solution:
>
>  Instead of inserting a delta row, that will be updated a lot of times, create an on commit drop temp table named
>  after the txid and the grouping value (which is an integer in my case; perhaps hash it if you're grouping by
>  something that doesn't easily convert to part of a table name),
>
>  create an after insert initially deferred constraint to call a function which will move the 'at commit' values of
>  the rows in the temp table to the summary table.
>
>  The temp table will only have one row inserted, updated many times, then on commit the trigger is fired once, and
>  the temp table is dropped.
>
>  Does anyone think this will or won't work for some reason?
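If I understand the proposal correctly, the per-transaction setup might look
something like this (all object names here are made up for illustration, and
move_deltas_to_summary is a hypothetical function, not something from the post):

```sql
-- First change in this transaction (txid 1234) for group 42:
CREATE TEMP TABLE delta_1234_42 (grp integer, cnt bigint)
    ON COMMIT DROP;

-- Deferred constraint trigger: fires once, at commit, for the
-- single row ever inserted into the temp table.
CREATE CONSTRAINT TRIGGER delta_flush
    AFTER INSERT ON delta_1234_42
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW EXECUTE PROCEDURE move_deltas_to_summary();

INSERT INTO delta_1234_42 VALUES (42, 1);
-- Subsequent changes in the same transaction just do:
-- UPDATE delta_1234_42 SET cnt = cnt + 1;
```

Since exactly one row is ever inserted into the temp table, the deferred
constraint trigger fires exactly once at commit, as described above.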


I think this should work, although you may need to take some extra steps
to manage the summary table. Also, I think a single temp table per transaction
should suffice. The temp table would have one row per "group by" or "where"
condition on which you want to track the count. The corresponding row will
be updated as and when the corresponding count changes. You would need
INSERT/DELETE/UPDATE triggers to do that. If there are any subtransaction
aborts, that will be taken care of by MVCC.
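A minimal sketch of such a maintenance trigger, assuming the base table has an
integer grp column and the per-transaction temp table is called count_deltas
(the names are mine, not from the thread):

```sql
-- One delta row per tracked "group by" value, maintained by a
-- row-level AFTER trigger on the base table.
CREATE TEMP TABLE count_deltas (grp integer PRIMARY KEY, delta bigint)
    ON COMMIT DROP;

CREATE OR REPLACE FUNCTION track_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE count_deltas SET delta = delta + 1 WHERE grp = NEW.grp;
        IF NOT FOUND THEN
            INSERT INTO count_deltas VALUES (NEW.grp, 1);
        END IF;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE count_deltas SET delta = delta - 1 WHERE grp = OLD.grp;
        IF NOT FOUND THEN
            INSERT INTO count_deltas VALUES (OLD.grp, -1);
        END IF;
    END IF;
    RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;
```

An UPDATE that moves a row between groups would need both branches, but the
shape is the same. If a subtransaction aborts, its updates to count_deltas
simply become dead tuples, which is the MVCC point above.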

As you said, a single deferred trigger would then merge the temp table with
the summary table. Here we need to be extra careful because concurrent
serializable transactions may fail when they update the same row in the
summary table. One solution
I can think of is (and I haven't checked the archives, so somebody might have
already suggested this before):

Each summary table will have one summary row per "group by" or "where" condition
(same as temp table). In addition to that, it can have zero or more temporary
delta rows per condition.
  - select the summary rows from the summary table FOR UPDATE NOWAIT
  - if the lock is not available, insert our deltas into the summary table as
    new rows; these rows will be deleted as soon as some other transaction gets
    the lock on the summary rows and merges our deltas with them
  - if the lock is available:
      - merge our deltas with the summary rows
      - check for other temporary deltas, merge them with the summary rows,
        and delete those temporary rows

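The locking dance above could look roughly like this inside the deferred
trigger function. The summary(grp, cnt, is_temp) layout and the my_grp/my_delta
variables are assumptions of mine, with temporary delta rows flagged by an
is_temp column:

```sql
-- Sketch only; runs once at commit inside the deferred trigger.
BEGIN
    -- try to lock the real summary row without waiting
    PERFORM 1 FROM summary
        WHERE grp = my_grp AND NOT is_temp
        FOR UPDATE NOWAIT;

    -- got the lock: fold in our delta plus any parked deltas
    UPDATE summary
       SET cnt = cnt + my_delta + COALESCE(
               (SELECT sum(cnt) FROM summary
                 WHERE grp = my_grp AND is_temp), 0)
     WHERE grp = my_grp AND NOT is_temp;

    DELETE FROM summary WHERE grp = my_grp AND is_temp;
EXCEPTION
    WHEN lock_not_available THEN
        -- someone else holds the lock: park our delta as a
        -- temporary row; the lock holder merges and deletes it
        INSERT INTO summary (grp, cnt, is_temp)
        VALUES (my_grp, my_delta, true);
END;
```

lock_not_available is the plpgsql condition raised by NOWAIT when the row lock
cannot be acquired, so the fallback path never blocks on the busy summary row.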
I guess we can write the generic triggers as a contrib module. What needs to be
done is to let the user specify the tables and the conditions on which they want
to track count(*), and then apply those conditions in the generic triggers.


Thanks,
Pavan
-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

