Home > mailing lists

Re: count(*) performance improvement ideas - Mailing list pgsql-hackers

From	Stephen Denne
Subject	Re: count(*) performance improvement ideas
Date	March 19, 2008 23:26:50
Msg-id	F0238EBA67824444BC1CB4700960CB4804EAC8F5@dmpeints002.isotach.com Whole thread Raw
In response to	Re: count(*) performance improvement ideas (Mark Mielke <mark@mark.mielke.cc>)
Responses	Re: count(*) performance improvement ideas
List	pgsql-hackers

Tree view

Mark Mielke wrote
> This returns to the question of whether count of the whole table is useful, or whether
> count of a GROUP BY or WHERE is useful.
> If GROUP BY or WHERE is useful, then trigger on UPDATE becomes necessary.

True... for the example I gave I should have had an update trigger on my table.
I neglected it as I based the example of a more complex multi-table example from my application, where I know that the
valueI'm grouping by doesn't change. 

> What is the direction here? Is it count of the whole table only? (<-- not interesting to me)
> Or count of more practical real life examples, which I completely agree with Greg,
> that this gets into the materialized view realm, and becomes very interesting.
> In my current db project, I never count all of the rows in a table. However, I do use count(*) with GROUP BY and
WHERE.
I'm trying to figure out how close I can come to a useful efficient materialized view with current production builds of
postgresql,
and identifying areas where changes to postgresql could make it easier.

Currently I can see three areas of concern:

1) Turning it on
My query to initially populate the materialized view table takes 6 hours on my (limited hardware) development system,
whichhighlights the problem of when do you turn on the triggers. 
An outage is one way to guarantee that there are neither missing details nor double counted details.
Would turning on the triggers and then running my initial population query in the same transaction work?

2) Table bloat
I'm trying to group together a large number of +1 or -1 deltas into a single delta per transaction.
This creates as many dead rows as there are updates.
This is also a problem with Pavan's suggestion of maintaining a counter table.
They can all be HOT updates in 8.3, but they still all create new row versions.

Tom says "Collapsing a transaction's changes
into a single row would require keeping some intra-transaction state,
which is do-able at the C-code level but not from plpgsql."

So can I hope that this problem is solvable as a contrib module that will work with at least 8.3?
I'd still want to write plpgsql trigger function myself, as I know the aggregation rules, and call contributed
functionsto integrate with the collapsing of the transaction's changes into a single row. (Expect I'd need a small
numberof rows per transaction, as I need to create delta rows for each value of my grouping field involved in the
transaction.

3) How to incorporate the deltas.
With my technique, if the transaction rolls back the delta record becomes a dead row, if it commits, the delta is then
visibleto transaction started after this time. 
I need to have a regular process run to sum and remove the deltas, rewriting the summary rows.
I'd like to be able to write an after-commit trigger that fires after changes in dependent tables that I identify,
whichcan add the transaction's deltas to the summary table. I would want it to effectively be a new, small transaction. 

I think that if these three areas are addressed, then before even considering writing code to automatically convert any
givenview into a materialized view. 

2) and 3) could perhaps be implemented with a per transaction map from my_custom_key to a prepared statement and a list
ofparameter values. 
Provide access to the values, making them updateable within the transaction. Have the statement automatically executed
oncommit. 

Pavan also refers to deferred triggers, which has got me thinking about another possible solution:

Instead of inserting a delta row, that will be updated a lot of times, create an on commit drop temp table named after
thetxid and the grouping value (which is an integer in my case, perhaps hash it if you're grouping by something that
doesn'teasily convert to part of a table name),  
create an after insert initially deferred constraint to call a function which will move the 'at commit' values of the
rowsin the temp table to the summary table. 
The temp table will only have one row inserted, updated many times, then on commit the trigger is fired once, and the
temptable is dropped. 

Does anyone think this will or won't work for some reason?

Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any
attachmentsis confidential and may be subject to legal privilege.  If it is not intended for you please advise by reply
immediately,destroy it and do not copy, disclose or use it in any way. 
__________________________________________________________________ This email has been scanned by the DMZGlobal
BusinessQuality             Electronic Messaging Suite. 
Please see http://www.dmzglobal.com/dmzmessaging.htm for details.
__________________________________________________________________

pgsql-hackers by date:

From: "Brendan Jurd"
Date: 19 March 2008, 23:04:02
Subject: Re: [PATCHES] Text <-> C string

From: "Omar Bettin"
Date: 19 March 2008, 23:38:36
Subject: diabolic state

Re: count(*) performance improvement ideas - Mailing list pgsql-hackers

Previous

Next