Thread: Best way to aggregate sum for each month
Hi all.
I'm using PG-9.4.
I have a table, activity_log, which holds activities with a duration for a given date.
I'm trying to sum the duration for each month of a given year, and I'm currently doing it like this:
SELECT
    q.start_date,
    sum(log.duration) / (3600 * 1000)::NUMERIC AS total_duration
FROM
    (SELECT cast(generate_series('2014-01-01'::DATE, '2014-12-01'::DATE, '1 month') AS DATE) AS start_date) AS q
    LEFT OUTER JOIN activity_log log
        ON date_trunc('month', log.start_date::timestamp without time zone) = q.start_date
GROUP BY q.start_date
ORDER BY q.start_date;
I have the following index defined:
create index activity_start_month ON activity_log(date_trunc('month', start_date::timestamp without time zone));
Here's the explain plan:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=65.27..91102.17 rows=200 width=12) (actual time=179.983..344.517 rows=12 loops=1)
Group Key: ((generate_series(('2014-01-01'::date)::timestamp with time zone, ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date)
-> Merge Left Join (cost=65.27..82950.79 rows=1629675 width=12) (actual time=163.894..310.011 rows=143551 loops=1)
Merge Cond: (((generate_series(('2014-01-01'::date)::timestamp with time zone, ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date) = date_trunc('month'::text, (log.start_date)::timestamp without time zone))
-> Sort (cost=64.84..67.34 rows=1000 width=4) (actual time=0.045..0.052 rows=12 loops=1)
Sort Key: ((generate_series(('2014-01-01'::date)::timestamp with time zone, ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date)
Sort Method: quicksort Memory: 25kB
-> Result (cost=0.00..5.01 rows=1000 width=0) (actual time=0.017..0.031 rows=12 loops=1)
-> Materialize (cost=0.42..51097.29 rows=325935 width=12) (actual time=0.032..227.692 rows=323180 loops=1)
-> Index Scan using activity_start_month on activity_log log (cost=0.42..50282.45 rows=325935 width=12) (actual time=0.029..185.746 rows=323180 loops=1)
Planning time: 0.201 ms
Execution time: 344.645 ms
(12 rows)
Are there any ways to improve this?
Thanks.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Try this,
WITH cte AS
(
    SELECT date_trunc('month', log.start_date::timestamp without time zone) AS log_date,
           sum(log.duration) / (3600 * 1000)::NUMERIC AS total_duration
    FROM activity_log log
    GROUP BY date_trunc('month', log.start_date::timestamp without time zone)
)
SELECT q.start_date, cte.total_duration
FROM
    (SELECT cast(generate_series('2014-01-01'::DATE, '2014-12-01'::DATE, '1 month') AS DATE) AS start_date) AS q
    LEFT OUTER JOIN cte ON cte.log_date = q.start_date
From what I can see, your query performs worse, as CTEs are optimisation fences. The query still has to scan through all activity_log entries and then do a Merge Left Join.
Maybe there's no way to optimize the Merge Left Join node, which takes ~300ms?
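The only thing I can think of is to restrict the scan to 2014 explicitly on the indexed expression, something like this (untested; whether it helps depends on how much of the table lies outside 2014):

SELECT
    q.start_date,
    sum(log.duration) / (3600 * 1000)::NUMERIC AS total_duration
FROM
    (SELECT cast(generate_series('2014-01-01'::DATE, '2014-12-01'::DATE, '1 month') AS DATE) AS start_date) AS q
    LEFT OUTER JOIN (
        -- Only feed 2014 rows into the join; the range condition is on the
        -- same expression the activity_start_month index is built on, so the
        -- scan should not have to read the whole table.
        SELECT start_date, duration
        FROM activity_log
        WHERE date_trunc('month', start_date::timestamp without time zone) >= '2014-01-01'
          AND date_trunc('month', start_date::timestamp without time zone) <  '2015-01-01'
    ) log ON date_trunc('month', log.start_date::timestamp without time zone) = q.start_date
GROUP BY q.start_date
ORDER BY q.start_date;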
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
I would usually turn "activity_log log" into "(select dt, sum(...) from activity_log group by dt) log", and then left join with COALESCE to build the final output.
A work_mem adjustment may help too.
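Roughly like this (untested; I'm guessing at the column names from your original query):

SELECT
    q.start_date,
    COALESCE(log.total_duration, 0) AS total_duration
FROM
    (SELECT cast(generate_series('2014-01-01'::DATE, '2014-12-01'::DATE, '1 month') AS DATE) AS start_date) AS q
    LEFT OUTER JOIN (
        -- Aggregate once per month first, so the join only has to deal with
        -- a handful of rows instead of one row per activity_log entry.
        SELECT date_trunc('month', start_date::timestamp without time zone) AS dt,
               sum(duration) / (3600 * 1000)::NUMERIC AS total_duration
        FROM activity_log
        GROUP BY 1
    ) log ON log.dt = q.start_date
ORDER BY q.start_date;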
David J.
I wrote it in the form of a CTE for readability.
If you are aggregating by month and are sure there will be at least one record for every month, you don't need to generate the series and JOIN at all.
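In that case the whole thing reduces to something like this (untested; assuming start_date can be compared directly against dates):

SELECT date_trunc('month', start_date::timestamp without time zone) AS month,
       sum(duration) / (3600 * 1000)::NUMERIC AS total_duration
FROM activity_log
WHERE start_date >= DATE '2014-01-01'   -- restrict to the year of interest
  AND start_date <  DATE '2015-01-01'
GROUP BY 1
ORDER BY 1;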