Re: *very* slow query to summarize data for a month ... - Mailing list pgsql-performance

From Marc G. Fournier
Subject Re: *very* slow query to summarize data for a month ...
Date
Msg-id 20031111134925.L56037@ganymede.hub.org
Whole thread Raw
In response to Re: *very* slow query to summarize data for a month ...  (Dennis Bjorklund <db@zigo.dhs.org>)
Responses Re: *very* slow query to summarize data for a month ...
List pgsql-performance

On Tue, 11 Nov 2003, Dennis Bjorklund wrote:

> On Mon, 10 Nov 2003, Marc G. Fournier wrote:
>
> >
> > explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
> >     FROM company c, traffic_logs ts
> >    WHERE c.company_id = ts.company_id
> >      AND month_trunc(ts.runtime) = '2003-10-01'
> > GROUP BY company_name,ts.company_id;
>
> What if you do
>
>   ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
>
> and add an index like (runtime, company_name, company_id)?

Good thought, but even simplifying it to the *lowest* query possible, with
no table joins, is painfully slow:

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
    FROM traffic_logs ts
   WHERE month_trunc(ts.runtime) = '2003-10-01'
GROUP BY ts.company_id;


                                                                        QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=31630.84..31693.05 rows=829 width=16) (actual time=14862.71..26552.39 rows=144 loops=1)
   ->  Group  (cost=31630.84..31672.31 rows=8295 width=16) (actual time=9634.28..20967.07 rows=462198 loops=1)
         ->  Sort  (cost=31630.84..31651.57 rows=8295 width=16) (actual time=9634.24..12838.73 rows=462198 loops=1)
               Sort Key: company_id
               ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..31090.93 rows=8295 width=16) (actual
time=0.26..6043.35rows=462198 loops=1) 
                     Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 26659.35 msec
(7 rows)



-OR-

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
    FROM traffic_logs ts
   WHERE ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
GROUP BY ts.company_id;


                                                                            QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=81044.53..84424.21 rows=45062 width=16) (actual time=13307.52..29274.66 rows=144 loops=1)
   ->  Group  (cost=81044.53..83297.65 rows=450625 width=16) (actual time=10809.02..-673265.13 rows=462198 loops=1)
         ->  Sort  (cost=81044.53..82171.09 rows=450625 width=16) (actual time=10808.99..14069.79 rows=462198 loops=1)
               Sort Key: company_id
               ->  Seq Scan on traffic_logs ts  (cost=0.00..38727.35 rows=450625 width=16) (actual time=0.07..6801.92
rows=462198loops=1) 
                     Filter: ((runtime >= '2003-10-01 00:00:00'::timestamp without time zone) AND (runtime <
'2003-11-0100:00:00'::timestamp without time zone)) 
 Total runtime: 29385.97 msec
(7 rows)


Just as a side note, just doing a straight scan for the records, with no
SUM()/GROUP BY involved, with the month_trunc() index is still >8k msec:

                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using tl_month on traffic_logs ts  (cost=0.00..31096.36 rows=8297 width=16) (actual time=0.96..5432.93
rows=462198loops=1) 
   Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 8092.88 msec
(3 rows)

and without the index, >15k msec:

                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Seq Scan on traffic_logs ts  (cost=0.00..38719.55 rows=8297 width=16) (actual time=0.11..11354.45 rows=462198 loops=1)
   Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 15353.57 msec
(3 rows)

so the GROUP BY is affecting the overall, but even without it, its still
taking a helluva long time ...

I'm going to modify my load script so that it dumps monthly totals to
traffic_logs, and 'details' to a schema.traffic_logs table ... I don't
need the 'per day totals' at the top level at all, only speed ... the 'per
day totals' are only required at the 'per client' level, and by moving the
'per day' into a client schema will shrink the table significantly ...

If it wasn't for trying to pull in that 'whole month' summary, it would be
fine :(

pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: *very* slow query to summarize data for a month ...
Next
From: Greg Stark
Date:
Subject: Re: *very* slow query to summarize data for a month ...