Re: Best way to aggregate sum for each month - Mailing list pgsql-sql

From David G. Johnston
Subject Re: Best way to aggregate sum for each month
Date
Msg-id CAKFQuwaHyLjqTxbrju=vMFxDnSb=5oypORZUkJeE8=frMBshqQ@mail.gmail.com
Whole thread Raw
In response to Best way to aggregate sum for each month  (Andreas Joseph Krogh <andreas@visena.com>)
List pgsql-sql
On Friday, April 17, 2015, Andreas Joseph Krogh <andreas@visena.com> wrote:
Hi all.
 
I'm using PG-9.4.
 
I'm having a table, activity_log, which holds activities with duration for a given date.
I'm trying to sum the duration for each month in a given year and am currently doing it like this:
 
select
    q.start_date    , sum(log.duration) / (3600 * 1000)::NUMERIC as total_duration
FROM
    (SELECT cast(generate_series('2014-01-01' :: DATE, '2014-12-01' :: DATE, '1 month') AS DATE) as start_date) AS q    LEFT OUTER JOIN activity_log log ON date_trunc('month', log.start_date::timestamp without time zone) = q.start_date
GROUP BY q.start_date
ORDER BY q.start_date
;

I have the current index defined:
 
create index activity_start_month ON activity_log(date_trunc('month', start_date::timestamp without time zone));

 
Here's the explain plan:
 
                                                                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=65.27..91102.17 rows=200 width=12) (actual time=179.983..344.517 rows=12 loops=1)
   Group Key: ((generate_series(('2014-01-01'::date)::timestamp with time zone, ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date)
   ->  Merge Left Join  (cost=65.27..82950.79 rows=1629675 width=12) (actual time=163.894..310.011 rows=143551 loops=1)
         Merge Cond: (((generate_series(('2014-01-01'::date)::timestamp with time zone, ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date) = date_trunc('month'::text, (log.start_date)::timestamp without time zone))
         ->  Sort  (cost=64.84..67.34 rows=1000 width=4) (actual time=0.045..0.052 rows=12 loops=1)
               Sort Key: ((generate_series(('2014-01-01'::date)::timestamp with time zone, ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date)
               Sort Method: quicksort  Memory: 25kB
               ->  Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.017..0.031 rows=12 loops=1)
         ->  Materialize  (cost=0.42..51097.29 rows=325935 width=12) (actual time=0.032..227.692 rows=323180 loops=1)
               ->  Index Scan using activity_start_month on activity_log log  (cost=0.42..50282.45 rows=325935 width=12) (actual time=0.029..185.746 rows=323180 loops=1)
 Planning time: 0.201 ms
 Execution time: 344.645 ms
(12 rows)
 
 
Are there any ways to improve this?


I would usually turn activity_log log into (select dt, sum(...) from activity_log group by dt) log

And then left join with coalesce to build the final output.

Work_mem adjustment may help too.

David J.

pgsql-sql by date:

Previous
From: Andreas Joseph Krogh
Date:
Subject: Re: Best way to aggregate sum for each month
Next
From: Naresh Kumar
Date:
Subject: Re: Best way to aggregate sum for each month