Re: performance of count(*) - Mailing list pgsql-general

From Tomas Vondra
Subject Re: performance of count(*)
Date
Msg-id 4DC44E8B.5040207@fuzzy.cz
Whole thread Raw
In response to performance of count(*)  (Scott Ribe <scott_ribe@elevated-dev.com>)
Responses Re: performance of count(*)  (Scott Ribe <scott_ribe@elevated-dev.com>)
List pgsql-general
Dne 6.5.2011 20:45, Scott Ribe napsal(a):
> I need to optimize queries that deal with some aggregates regarding
> resource availability. My specific problem is, I think, very closely
> analogous to select count(*)... where...
>
> I know roughly how to do it, aggregated stats table, triggers
> appending to it, occasional updates to coalesce entries. I'd just like
> to see an example to confirm my own plan and see if I'm missing any
> details.
>
> I'm sure I've seen references to articles on ways to do this, but all
> google is getting me is generic complaints about count(*) performance
> and suggestions to use stats for estimated total rows in a table,
> nothing useful for this.
>

Well I guess you got most of that right - just create a table to hold
aggregated values, and then a bunch of triggers to update it. The
structute and implementation of the triggers really depend on your
needs, but in general there are two approaches - eager and lazy.

Eager - the triggers immediately update the aggregates (increment a
count, add a value to the sum etc.). This is a 'primitive' and less
complex solution, suitable when the update is cheap and the aggregated
table is often read.

Lazy - just mark the row in the aggregated table as 'dirty' and
recompute it only if it's read. This is suitable if the table is read
only occasionaly and/or when computing the aggregate is complex (e.g.
needs to reread the whole dataset - as for example computing variance).

The lazy approach is not that usual, a great example how to implement
that is available here:

http://www.pgcon.org/2008/schedule/events/69.en.html

Anyway I'd recommend to start with the eager approach, it's much easier
to implement. You can implement the lazy approach later, if you find out
it's needed.

And you should strive to use HOT feature (if you're on >= 8.4),
especially with the eager approach - it often does a lot of updates and
leads to bloat of the aggregated table. So decrease the fillfactor and
do not index the columns that are updated by the triggers.

regards
Tomas

pgsql-general by date:

Previous
From: Scott Ribe
Date:
Subject: performance of count(*)
Next
From: Andrew Sullivan
Date:
Subject: Re: performance of count(*)