Thread: Cumulative count (running total) window fn

Cumulative count (running total) window fn

From

Oliver Kohll - Mailing Lists

Date:

29 April 2010, 05:53:06

Hello,

Many thanks to andreas.kretschmer for this helpful reply about how to set up a window function to perform a running
total:
http://archives.postgresql.org/pgsql-general/2010-03/msg01122.php

It works perfectly with the simple test data but I've just got back to work, tried implementing it on my live data and
theresults are slightly different. My query is almost exactly the same - I've simplified by grouping by year only
ratherthan year and month: 

select extract(year from signup_date),
  count(email_address),
  sum(count(email_address)) over (rows unbounded preceding)
from email_list group by 1 order by 1;

 date_part | count | sum
-----------+-------+------
      2007 |   501 | 1374
      2008 |   491 |  491
      2009 |   382 |  873
      2010 |    66 | 1440
(4 rows)

What I'm looking for is
 date_part | count | sum
-----------+-------+------
      2007 |   501 | 501
      2008 |   491 |  992
      2009 |   382 |  1374
      2010 |    66 | 1440

It seems to be adding up the counts but not in the right order.

I've also tried an explicit ORDER BY inside the partition with no difference:

select extract(year from signup_date),
  count(email_address),
  sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding)
from email_list group by 1 order by 1;

Does anyone have any other ideas?

Regards
Oliver Kohll

oliver@agilebase.co.uk / +44(0)7814 828608 / skype:okohll
www.agilebase.co.uk - software
www.gtwm.co.uk - company

Re: Cumulative count (running total) window fn

From

Magnus Hagander

Date:

29 April 2010, 06:01:29

On Thu, Apr 29, 2010 at 10:52, Oliver Kohll - Mailing Lists
<oliver.lists@gtwm.co.uk> wrote:
> Hello,
>
> Many thanks to andreas.kretschmer for this helpful reply about how to set up a window function to perform a running
total:
> http://archives.postgresql.org/pgsql-general/2010-03/msg01122.php
>
> It works perfectly with the simple test data but I've just got back to work, tried implementing it on my live data
andthe results are slightly different. My query is almost exactly the same - I've simplified by grouping by year only
ratherthan year and month: 
>
> select extract(year from signup_date),
>  count(email_address),
>  sum(count(email_address)) over (rows unbounded preceding)
> from email_list group by 1 order by 1;
>
>  date_part | count | sum
> -----------+-------+------
>      2007 |   501 | 1374
>      2008 |   491 |  491
>      2009 |   382 |  873
>      2010 |    66 | 1440
> (4 rows)
>
> What I'm looking for is
>  date_part | count | sum
> -----------+-------+------
>      2007 |   501 | 501
>      2008 |   491 |  992
>      2009 |   382 |  1374
>      2010 |    66 | 1440
>
> It seems to be adding up the counts but not in the right order.
>
> I've also tried an explicit ORDER BY inside the partition with no difference:
>
> select extract(year from signup_date),
>  count(email_address),
>  sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding)
> from email_list group by 1 order by 1;
>
> Does anyone have any other ideas?

Aren't you looking for something along the line of:

SELECT year, sum(c) over (order by year)
FROM (
  SELECT extract(year from signup_date) AS year, count(email_address) AS c
  FROM email_list
  GROUP BY extract(year from signup_date)
)

(adjust for typos, I didn't test it)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Cumulative count (running total) window fn

From

Oliver Kohll - Mailing Lists

Date:

29 April 2010, 07:39:58

On 29 Apr 2010, at 10:01, Magnus Hagander wrote:

select extract(year from signup_date),
count(email_address),
sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding)
from email_list group by 1 order by 1;

Does anyone have any other ideas?

Aren't you looking for something along the line of:

SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from signup_date) AS year, count(email_address) AS c
FROM email_list
GROUP BY extract(year from signup_date)
)

(adjust for typos, I didn't test it)

Yes that does work thanks, if you give the subquery a name. I'd still like to know if it's possible to do with a window function rather than a subquery.

Oliver Kohll

Re: Cumulative count (running total) window fn

From

Thom Brown

Date:

29 April 2010, 08:03:04

On 29 April 2010 11:39, Oliver Kohll - Mailing Lists <oliver.lists@gtwm.co.uk> wrote:

On 29 Apr 2010, at 10:01, Magnus Hagander wrote:

select extract(year from signup_date),
count(email_address),
sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding)
from email_list group by 1 order by 1;

Does anyone have any other ideas?

Aren't you looking for something along the line of:

SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from signup_date) AS year, count(email_address) AS c
FROM email_list
GROUP BY extract(year from signup_date)
)

(adjust for typos, I didn't test it)

Yes that does work thanks, if you give the subquery a name. I'd still like to know if it's possible to do with a window function rather than a subquery.

Oliver Kohll

Like this?:

SELECT extract(year from signup_date), count(email_address), sum(count(email_address)) OVER (ORDER BY count(email_address)) FROM email_list GROUP BY 1 ORDER BY 1;

Thom

Re: Cumulative count (running total) window fn

From

Oliver Kohll - Mailing Lists

Date:

29 April 2010, 08:43:43

Aren't you looking for something along the line of:

SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from signup_date) AS year, count(email_address) AS c
FROM email_list
GROUP BY extract(year from signup_date)
)

(adjust for typos, I didn't test it)

Yes that does work thanks, if you give the subquery a name. I'd still like to know if it's possible to do with a window function rather than a subquery.

Oliver Kohll

Like this?:

SELECT extract(year from signup_date), count(email_address), sum(count(email_address)) OVER (ORDER BY count(email_address)) FROM email_list GROUP BY 1 ORDER BY 1;

Thom

Almost, but put me on the right track! This one is exactly what I'm looking for:

SELECT extract(year from signup_date), count(email_address), sum(count(email_address)) OVER (ORDER BY extract(year from signup_date)) FROM email_list GROUP BY 1 ORDER BY 1;

The ORDER BY count(email_address) did give the same results for my data but only because the count values just happen to give the same ordering as the years - I tested by changing some dates.

Many thanks all.

Oliver

Re: Cumulative count (running total) window fn

From

Magnus Hagander

Date:

29 April 2010, 08:45:06

On Thu, Apr 29, 2010 at 13:43, Oliver Kohll - Mailing Lists
<oliver.lists@gtwm.co.uk> wrote:
>>
>> Aren't you looking for something along the line of:
>>
>> SELECT year, sum(c) over (order by year)
>> FROM (
>>  SELECT extract(year from signup_date) AS year, count(email_address) AS c
>>  FROM email_list
>>  GROUP BY extract(year from signup_date)
>> )
>>
>> (adjust for typos, I didn't test it)
>>
>> Yes that does work thanks, if you give the subquery a name. I'd still like
>> to know if it's possible to do with a window function rather than a
>> subquery.
>> Oliver Kohll
>
> Like this?:
>
> SELECT extract(year from signup_date), count(email_address),
> sum(count(email_address)) OVER (ORDER BY count(email_address)) FROM
> email_list GROUP BY 1 ORDER BY 1;
>
> Thom
>
> Almost, but put me on the right track! This one is exactly what I'm looking
> for:
> SELECT extract(year from signup_date), count(email_address),
> sum(count(email_address)) OVER (ORDER BY extract(year from signup_date))
> FROM email_list GROUP BY 1 ORDER BY 1;
> The ORDER BY count(email_address) did give the same results for my data but
> only because the count values just happen to give the same ordering as the
> years - I tested by changing some dates.
> Many thanks all.

Curious note - how does the non-subselect version and the subselect
version compare performance-wise?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Cumulative count (running total) window fn

From

Oliver Kohll - Mailing Lists

Date:

29 April 2010, 14:21:53

Curious note - how does the non-subselect version and the subselect
version compare performance-wise?

Magnus,

On a test table with 12,000 rows there's not much in it, the subselect has a simpler plan but they both take practically the same time.

The two plans (note I've been rewriting the field names for readability until now but haven't here):

explain analyze SELECT year, sum(c) over (order by year)

FROM (

SELECT extract(year from a56b7a8d6de03f67b) AS year, count(a10e4ab8863c199f1) AS c

FROM a2e9a7e9e257153de

GROUP BY extract(year from a56b7a8d6de03f67b)

) as subq;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------

WindowAgg (cost=851.49..874.06 rows=1290 width=16) (actual time=43.369..43.394 rows=5 loops=1)

-> Sort (cost=851.49..854.71 rows=1290 width=16) (actual time=43.340..43.342 rows=5 loops=1)

Sort Key: (date_part('year'::text, a2e9a7e9e257153de.a56b7a8d6de03f67b))

Sort Method: quicksort Memory: 25kB

-> HashAggregate (cost=752.59..771.94 rows=1290 width=26) (actual time=43.300..43.317 rows=5 loops=1)

-> Seq Scan on a2e9a7e9e257153de (cost=0.00..689.56 rows=12605 width=26) (actual time=0.031..26.723 rows=12605 loops=1)

Total runtime: 43.549 ms

explain analyze SELECT extract(year from a56b7a8d6de03f67b), count(a10e4ab8863c199f1), sum(count(a10e4ab8863c199f1)) OVER (ORDER BY count(a10e4ab8863c199f1)) FROM a2e9a7e9e257153de GROUP BY 1 ORDER BY 1;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------

Sort (cost=1382.39..1388.52 rows=2451 width=32) (actual time=44.229..44.230 rows=5 loops=1)

Sort Key: (date_part('year'::text, a56b7a8d6de03f67b))

Sort Method: quicksort Memory: 25kB

-> WindowAgg (cost=1195.39..1244.41 rows=2451 width=32) (actual time=44.171..44.208 rows=5 loops=1)

-> Sort (cost=1195.39..1201.52 rows=2451 width=32) (actual time=44.125..44.127 rows=5 loops=1)

Sort Key: (count(a10e4ab8863c199f1))

Sort Method: quicksort Memory: 25kB

-> HashAggregate (cost=1014.52..1057.41 rows=2451 width=32) (actual time=44.071..44.099 rows=5 loops=1)

-> Seq Scan on a2e9a7e9e257153de (cost=0.00..833.58 rows=24126 width=32) (actual time=0.032..26.683 rows=12605 loops=1)

Total runtime: 44.396 ms

Regards

Oliver Kohll

oliver@agilebase.co.uk / +44(0)7814 828608 / skype:okohll

www.agilebase.co.uk - software

www.gtwm.co.uk - company

Re: Cumulative count (running total) window fn

From

Alban Hertroys

Date:

30 April 2010, 07:07:20

On 29 Apr 2010, at 19:21, Oliver Kohll - Mailing Lists wrote:

> The two plans (note I've been rewriting the field names for readability until now but haven't here):
>
> explain analyze SELECT year, sum(c) over (order by year)
> FROM (
>   SELECT extract(year from a56b7a8d6de03f67b) AS year, count(a10e4ab8863c199f1) AS c
>   FROM a2e9a7e9e257153de
>   GROUP BY extract(year from a56b7a8d6de03f67b)
> ) as subq;

Oh my, how can you work with such column and table names? You and any colleagues you may have will probably appreciate
havinga few views over those tables that translate that gibberish to human readable stuff. You could go further and
makethose views updatable (by means of a few rules), but then you run the risk that colleagues start to hug you... 

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.

!DSPAM:737,4bdaabce10411378620886!