Re: Performance Issues with count() - Mailing list pgsql-general

From Mark kirkwood
Subject Re: Performance Issues with count()
Date
Msg-id 1019810377.1267.89.camel@spikey.slithery.org
Whole thread Raw
In response to Performance Issues with count()  ("asdf asdasfa" <sjg@email.com>)
List pgsql-general
I wonder if using a summary table could help you here...

for example :

suppose you have a table - big_table(id,...,interesting_value)

and you want to provide counts of interesting_value quickly.

The plan is to create another table -
summ_table(interesting_value,...,its_count)

and keep it up to date with big_table via triggers.

Then any query of the form :

SELECT interesting_value, count(*)
FROM big_table
WHERE ...
GROUP BY interesting value;

can be answered by :

SELECT interesting_value,its_count
FROM summ_table
WHERE ...;

which is generally *much* faster.
(clearly a complete count is easy to answer quickly too...)

I must say that I have not used this technique in Postgresql (I have
used it in DB2 and Oracle). However this issue is typical for large
databases of all flavours (i.e its too slow to scan and count a lot of
values for each query.... so you do it once and save the results for
future reference).

In addition the rule system in Postgresql *might* be ameniable to
providing a sort of 'query rewrite' to automatically make (some) queries
on big_table go to summ_table instead...(have not tried this... but you
never know until you try)

best wishes

Mark

P.s : the trigger code to keep big_table and summ_table in sync is the
hard bit... but there are no doubt many folks on this list ...incl even
me.. who will happily help you out here.




pgsql-general by date:

Previous
From: Francois Suter
Date:
Subject: Re: pid gets overwritten in OSX
Next
From: "Joseph Artsimovich"
Date:
Subject: bit datatype in 7.2.1