SLOW query with aggregates - Mailing list pgsql-performance

From A Palmblad
Subject SLOW query with aggregates
Date
Msg-id 003201c41111$f4dc96f0$97019696@AERS04
Whole thread Raw
Responses Re: SLOW query with aggregates
Re: SLOW query with aggregates
List pgsql-performance
I am trying to optimize a query that does a lot of aggregation.  I have a
large number of columns that are part of the result, and most are
aggregates.  They are acting on two temporary tables, the largest of which
should have at most 1 million tuples, and the smaller around 5000; the the
smaller table matches the number of rows expecting in the result.  I've
played around with some indexes on the temp tables, and analyzing them; even
using a vacuum analyze, and the worst part is always a groupAggregate.

    This query can be optimized at the expense of other operations; it will
be run during low usage hours.  I have tried to bump up sort_mem to get the
query optimizer to cosider a HashAggregate instread of a groupAggregate;
setting it as high as 2 gigs still had the query optimizer using
GroupAggregate.

The troublesome query is:

select
    tempItems.category_id,
    date('2003-11-22'),
    sum(a) as a,
    count(*) as b,
    sum(case when type = 1 then 0 else someNumber end) as successful,
    sum(c) as c,
    ........
    ........
    tempAggregates.mode as mode
    -variations of the above repeated around 30 times, with a few other
aggregates like min and max making an appearance, and some array stuff
   from tempItems join tempAggregates using (category_id)
   group by tempItems.category_id, mode

I've tried just grouping by category_id, and doing a max(mode), but that
doesn't have much of an effect on performance; although row estimation for
the group aggregate was better.  A lot is being done, so maybe I can't get
it to be much more efficient...

Here's the output of an explain analyze:


QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
---
 GroupAggregate  (cost=0.00..338300.34 rows=884 width=345) (actual
time=86943.272..382718.104 rows=3117 loops=1)
   ->  Merge Join  (cost=0.00..93642.52 rows=1135610 width=345) (actual
time=0.148..24006.748 rows=1120974 loops=1)
         Merge Cond: ("outer".category_id = "inner".category_id)
         ->  Index Scan using tempaggregatesindex on tempaggregates
(cost=0.00..91.31 rows=3119 width=115) (actual time=0.055..6.573 rows=3117
loops=1)
         ->  Index Scan using tempitemsindex on tempitems
(cost=0.00..79348.45 rows=1135610 width=241) (actual time=0.064..7511.980
rows=1121164 loops=1)
 Total runtime: 382725.502 ms
(6 rows)

Any thoughts or suggestions would be appreciated.

-Adam Palmblad


pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: [ADMIN] Benchmarking postgres on Solaris/Linux
Next
From: Tom Lane
Date:
Subject: Re: SLOW query with aggregates