Re: How to optimize select count(*)..group by? - Mailing list pgsql-general

From Scott Marlowe
Subject Re: How to optimize select count(*)..group by?
Date
Msg-id 1122571418.32465.17.camel@state.g2switchworks.com
In response to How to optimize select count(*)..group by?  (Bryan Field-Elliot <bryan_lists@netmeme.org>)
List pgsql-general
On Thu, 2005-07-28 at 11:19, Bryan Field-Elliot wrote:
> We have this simple query:
>
> select status, count(*) from customer group by status;
>
> There is already a btree index on status, but, the customer table is
> huge, and this query must be executed very frequently... an "explain"
> on this query shows that it is quite costly (and we notice it runs
> slowly)...
>
> Can someone recommend the best technique to optimize this? We can
> create new indices, we can re-write this query.. But we'd rather not
> add new tables or columns if possible (not just to solve this
> problem).

With a query of the form:

select field, count([field|*]) from table WHERE somefield = somecondition;

the database is going to have to scan every single row matched by
that where clause.  There's no shortcut, because the visibility rules of
MVCC mean you have to look at every tuple IN THE TABLE, not just in the
index (it's the way PostgreSQL is built, and it isn't likely to change
soon, because putting the visibility information in indexes is
expensive, and would result in VERY slow updates and very large
indexes).
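
For reference, a quick way to see this on the original query (using the
customer table from the post above) is to let explain analyze report how
the rows are actually being read:

explain analyze
select status, count(*) from customer group by status;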

So, the best optimization is to use a selective where clause.

If you run the query with a where clause of something like:

where processdate between '01 july 2005' and '07 july 2005'

then you should get better performance.
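
For example, a rough sketch along those lines, assuming customer really
does have a processdate column as above (the index and its name are just
an illustration, not something from the original post):

create index customer_processdate_idx on customer (processdate);

select status, count(*)
from customer
where processdate between '01 july 2005' and '07 july 2005'
group by status;

With an index on processdate, the planner can pick out just that week's
rows with an index scan before grouping, instead of reading the whole
table.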
