Re: Best way to aggregate sum for each month - Mailing list pgsql-sql

From Naresh Kumar
Subject Re: Best way to aggregate sum for each month
Date
Msg-id CAHuWDLjijcprUDJ1bGAdxPBWLUMdEu71evQa2GQP9tRef2SHfA@mail.gmail.com
Whole thread Raw
In response to Re: Best way to aggregate sum for each month  (Andreas Joseph Krogh <andreas@visena.com>)
List pgsql-sql
I wrote it in the form of cte for readability purpose. 

If you are aggregating by month date and sure that there will be atleast one record for every month you don't require to generate the series and JOIN. 


On Fri, Apr 17, 2015 at 6:46 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
From what I can see your query performs worse, as CTEs are optimisation-fences. Your query needs to scans thru all activity_log entries, then do Merge Left Join .
Maybe there's no way to optimize the Merge Left Join node which takes ~300ms?
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
 
 
På lørdag 18. april 2015 kl. 01:29:37, skrev Naresh Kumar <narain337@gmail.com>:
Try this,
 
 
;WITH cte as
(
SELECT date_trunc('month', log.start_date::timestamp without time zone) AS log_date, sum(log.duration) / (3600 * 1000)::NUMERIC as total_duration
FROM activity_log
GROUP BY date_trunc('month', log.start_date::timestamp without time zone)
)
 
SELECT q.start_date, cte.total_duration
FROM
    (SELECT cast(generate_series('2014-01-01' :: DATE, '2014-12-01' :: DATE, '1 month') AS DATE) as start_date) AS q
    LEFT OUTER JOIN cte ON cte.log_date = q.start_date
 
On Fri, Apr 17, 2015 at 5:10 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
Hi all.
 
I'm using PG-9.4.
 
I'm having a table, activity_log, which holds activities with duration for a given date.
I'm trying to sum the duration for each month in a given year and am currently doing it like this:
 
select
    q.start_date    , sum(log.duration) / (3600 * 1000)::NUMERIC as total_duration
FROM
    (SELECT cast(generate_series('2014-01-01' :: DATE, '2014-12-01' :: DATE, '1 month') AS DATE) as start_date) AS q    LEFT OUTER JOIN activity_log log ON date_trunc('month', log.start_date::timestamp without time zone) = q.start_date
GROUP BY q.start_date
ORDER BY q.start_date
;

I have the current index defined:
 
create index activity_start_month ON activity_log(date_trunc('month', start_date::timestamp without time zone));

 
Here's the explain plan:
 
                                                                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=65.27..91102.17 rows=200 width=12) (actual time=179.983..344.517 rows=12 loops=1)
   Group Key: ((generate_series(('2014-01-01'::date)::timestamp with time zone, ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date)
   ->  Merge Left Join  (cost=65.27..82950.79 rows=1629675 width=12) (actual time=163.894..310.011 rows=143551 loops=1)
         Merge Cond: (((generate_series(('2014-01-01'::date)::timestamp with time zone, ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date) = date_trunc('month'::text, (log.start_date)::timestamp without time zone))
         ->  Sort  (cost=64.84..67.34 rows=1000 width=4) (actual time=0.045..0.052 rows=12 loops=1)
               Sort Key: ((generate_series(('2014-01-01'::date)::timestamp with time zone, ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date)
               Sort Method: quicksort  Memory: 25kB
               ->  Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.017..0.031 rows=12 loops=1)
         ->  Materialize  (cost=0.42..51097.29 rows=325935 width=12) (actual time=0.032..227.692 rows=323180 loops=1)
               ->  Index Scan using activity_start_month on activity_log log  (cost=0.42..50282.45 rows=325935 width=12) (actual time=0.029..185.746 rows=323180 loops=1)
 Planning time: 0.201 ms
 Execution time: 344.645 ms
(12 rows)
 
 
Are there any ways to improve this?
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
 
 
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
 

Attachment

pgsql-sql by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Best way to aggregate sum for each month
Next
From: Suresh Raja
Date:
Subject: function to send email with query results