Re: *very* slow query to summarize data for a month ... - Mailing list pgsql-performance

From Patrick Hatcher
Subject Re: *very* slow query to summarize data for a month ...
Date
Msg-id OF43AE737E.7548A814-ON88256DDA.00707536-88256DDA.00715E91@fds.com
Whole thread Raw
In response to *very* slow query to summarize data for a month ...  ("Marc G. Fournier" <scrappy@postgresql.org>)
Responses Re: *very* slow query to summarize data for a month ...
List pgsql-performance
Do you have an index on ts.bytes?  Josh had suggested this, and after I put
an index on my summed fields I saw a speed increase.  I can't remember
which article Josh wrote about index usage, but maybe he'll chime in and
supply the URL.
hth

Patrick Hatcher




           "Marc G. Fournier"
           <scrappy@postgresql
           .org>                                                        To
           Sent by:                   pgsql-performance@postgresql.org
           pgsql-performance-o                                          cc
           wner@postgresql.org
                                                                   Subject
                                      [PERFORM] *very* slow query to
           11/10/2003 12:18 PM        summarize data for a month ...











Table structure is simple:

CREATE TABLE traffic_logs (
    company_id bigint,
    ip_id bigint,
    port integer,
    bytes bigint,
    runtime timestamp without time zone
);

runtime is 'day of month' ...

I need to summarize the month, per company, with a query as:

explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS
total_traffic
    FROM company c, traffic_logs ts
   WHERE c.company_id = ts.company_id
     AND month_trunc(ts.runtime) = '2003-10-01'
GROUP BY company_name,ts.company_id;

and the explain looks like:
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=32000.94..32083.07 rows=821 width=41) (actual time=32983.36..47586.17 rows=144 loops=1)
   ->  Group  (cost=32000.94..32062.54 rows=8213 width=41) (actual time=32957.40..42817.88 rows=462198 loops=1)
         ->  Sort  (cost=32000.94..32021.47 rows=8213 width=41) (actual time=32957.38..36261.31 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=31321.45..31466.92 rows=8213 width=41) (actual time=13983.07..22642.14 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.52..7.40 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.78 rows=352 loops=1)
                     ->  Sort  (cost=31297.04..31317.57 rows=8213 width=16) (actual time=13977.49..16794.41 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..30763.02 rows=8213 width=16) (actual time=0.29..5562.25 rows=462198 loops=1)
                                 Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 47587.82 msec
(14 rows)
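(For anyone trying to reproduce this: tl_month, used in the index scan
above, is a functional index. The thread doesn't show its definition, but
judging from the date_trunc('month', ...) filter in the second plan it was
presumably built along these lines, with an IMMUTABLE SQL wrapper around
date_trunc(); the function body is an inference, not quoted from the post:)

```sql
-- sketch of the functional index implied by the plan; the function body
-- is inferred from the date_trunc('month', ...) filter, not shown in the thread
CREATE FUNCTION month_trunc(timestamp without time zone)
    RETURNS timestamp without time zone
    AS 'SELECT date_trunc(''month'', $1)'
    LANGUAGE sql IMMUTABLE;

CREATE INDEX tl_month ON traffic_logs (month_trunc(runtime));
```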

the problem is that we're only taking a few months' worth of data, so I
don't think there is much of a way to 'improve performance' on this, but
figured I'd ask quickly before I do something rash ...

Note that without the month_trunc() index, the Total runtime more than
doubles:


QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=39578.63..39660.76 rows=821 width=41) (actual time=87805.47..101251.35 rows=144 loops=1)
   ->  Group  (cost=39578.63..39640.23 rows=8213 width=41) (actual time=87779.56..96824.56 rows=462198 loops=1)
         ->  Sort  (cost=39578.63..39599.17 rows=8213 width=41) (actual time=87779.52..90781.48 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38899.14..39044.62 rows=8213 width=41) (actual time=64073.98..72783.68 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=64.66..66.55 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=1.76..61.70 rows=352 loops=1)
                     ->  Sort  (cost=38874.73..38895.27 rows=8213 width=16) (actual time=64009.26..66860.71 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38340.72 rows=8213 width=16) (actual time=5.02..-645982.04 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 101277.17 msec
(14 rows)





