Re: Best way to aggregate sum for each month - Mailing list pgsql-sql
From | Andreas Joseph Krogh |
---|---|
Subject | Re: Best way to aggregate sum for each month |
Date | |
Msg-id | VisenaEmail.20.799c4ff337ccdc94.14cc9c624d3@tc7-visena Whole thread Raw |
In response to | Re: Best way to aggregate sum for each month (Naresh Kumar <narain337@gmail.com>) |
Responses |
Re: Best way to aggregate sum for each month
|
List | pgsql-sql |
From what I can see your query performs worse, as CTEs are optimisation-fences. Your query needs to scans thru all activity_log entries, then do Merge Left Join .
Maybe there's no way to optimize the Merge Left Join node which takes ~300ms?
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
På lørdag 18. april 2015 kl. 01:29:37, skrev Naresh Kumar <narain337@gmail.com>:
Try this,;WITH cte as(SELECT date_trunc('month', log.start_date::timestamp without time zone) AS log_date, sum(log.duration) / (3600 * 1000)::NUMERIC as total_durationFROM activity_logGROUP BY date_trunc('month', log.start_date::timestamp without time zone))SELECT q.start_date, cte.total_durationFROM(SELECT cast(generate_series('2014-01-01' :: DATE, '2014-12-01' :: DATE, '1 month') AS DATE) as start_date) AS qLEFT OUTER JOIN cte ON cte.log_date = q.start_dateOn Fri, Apr 17, 2015 at 5:10 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:Hi all.I'm using PG-9.4.I'm having a table, activity_log, which holds activities with duration for a given date.I'm trying to sum the duration for each month in a given year and am currently doing it like this:select q.start_date , sum(log.duration) / (3600 * 1000)::NUMERIC as total_duration FROM (SELECT cast(generate_series('2014-01-01' :: DATE, '2014-12-01' :: DATE, '1 month') AS DATE) as start_date) AS q LEFT OUTER JOIN activity_log log ON date_trunc('month', log.start_date::timestamp without time zone) = q.start_date GROUP BY q.start_date ORDER BY q.start_date ;
I have the current index defined:create index activity_start_month ON activity_log(date_trunc('month', start_date::timestamp without time zone));
Here's the explain plan:QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=65.27..91102.17 rows=200 width=12) (actual time=179.983..344.517 rows=12 loops=1)
Group Key: ((generate_series(('2014-01-01'::date)::timestamp with time zone, ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date)
-> Merge Left Join (cost=65.27..82950.79 rows=1629675 width=12) (actual time=163.894..310.011 rows=143551 loops=1)
Merge Cond: (((generate_series(('2014-01-01'::date)::timestamp with time zone, ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date) = date_trunc('month'::text, (log.start_date)::timestamp without time zone))
-> Sort (cost=64.84..67.34 rows=1000 width=4) (actual time=0.045..0.052 rows=12 loops=1)
Sort Key: ((generate_series(('2014-01-01'::date)::timestamp with time zone, ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date)
Sort Method: quicksort Memory: 25kB
-> Result (cost=0.00..5.01 rows=1000 width=0) (actual time=0.017..0.031 rows=12 loops=1)
-> Materialize (cost=0.42..51097.29 rows=325935 width=12) (actual time=0.032..227.692 rows=323180 loops=1)
-> Index Scan using activity_start_month on activity_log log (cost=0.42..50282.45 rows=325935 width=12) (actual time=0.029..185.746 rows=323180 loops=1)
Planning time: 0.201 ms
Execution time: 344.645 ms
(12 rows)Are there any ways to improve this?Thanks.--Andreas Joseph KroghCTO / Partner - Visena ASMobile: +47 909 56 963
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963