Thread: Cumulative count (running total) window fn
Hello, Many thanks to andreas.kretschmer for this helpful reply about how to set up a window function to perform a running total: http://archives.postgresql.org/pgsql-general/2010-03/msg01122.php It works perfectly with the simple test data but I've just got back to work, tried implementing it on my live data and theresults are slightly different. My query is almost exactly the same - I've simplified by grouping by year only ratherthan year and month: select extract(year from signup_date), count(email_address), sum(count(email_address)) over (rows unbounded preceding) from email_list group by 1 order by 1; date_part | count | sum -----------+-------+------ 2007 | 501 | 1374 2008 | 491 | 491 2009 | 382 | 873 2010 | 66 | 1440 (4 rows) What I'm looking for is date_part | count | sum -----------+-------+------ 2007 | 501 | 501 2008 | 491 | 992 2009 | 382 | 1374 2010 | 66 | 1440 It seems to be adding up the counts but not in the right order. I've also tried an explicit ORDER BY inside the partition with no difference: select extract(year from signup_date), count(email_address), sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding) from email_list group by 1 order by 1; Does anyone have any other ideas? Regards Oliver Kohll oliver@agilebase.co.uk / +44(0)7814 828608 / skype:okohll www.agilebase.co.uk - software www.gtwm.co.uk - company
On Thu, Apr 29, 2010 at 10:52, Oliver Kohll - Mailing Lists <oliver.lists@gtwm.co.uk> wrote: > Hello, > > Many thanks to andreas.kretschmer for this helpful reply about how to set up a window function to perform a running total: > http://archives.postgresql.org/pgsql-general/2010-03/msg01122.php > > It works perfectly with the simple test data but I've just got back to work, tried implementing it on my live data andthe results are slightly different. My query is almost exactly the same - I've simplified by grouping by year only ratherthan year and month: > > select extract(year from signup_date), > count(email_address), > sum(count(email_address)) over (rows unbounded preceding) > from email_list group by 1 order by 1; > > date_part | count | sum > -----------+-------+------ > 2007 | 501 | 1374 > 2008 | 491 | 491 > 2009 | 382 | 873 > 2010 | 66 | 1440 > (4 rows) > > What I'm looking for is > date_part | count | sum > -----------+-------+------ > 2007 | 501 | 501 > 2008 | 491 | 992 > 2009 | 382 | 1374 > 2010 | 66 | 1440 > > It seems to be adding up the counts but not in the right order. > > I've also tried an explicit ORDER BY inside the partition with no difference: > > select extract(year from signup_date), > count(email_address), > sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding) > from email_list group by 1 order by 1; > > Does anyone have any other ideas? Aren't you looking for something along the line of: SELECT year, sum(c) over (order by year) FROM ( SELECT extract(year from signup_date) AS year, count(email_address) AS c FROM email_list GROUP BY extract(year from signup_date) ) (adjust for typos, I didn't test it) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On 29 Apr 2010, at 10:01, Magnus Hagander wrote:
select extract(year from signup_date),count(email_address),sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding)from email_list group by 1 order by 1;Does anyone have any other ideas?
Aren't you looking for something along the line of:
SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from signup_date) AS year, count(email_address) AS c
FROM email_list
GROUP BY extract(year from signup_date)
)
(adjust for typos, I didn't test it)
Yes that does work thanks, if you give the subquery a name. I'd still like to know if it's possible to do with a window function rather than a subquery.
Oliver Kohll
On 29 April 2010 11:39, Oliver Kohll - Mailing Lists <oliver.lists@gtwm.co.uk> wrote:
On 29 Apr 2010, at 10:01, Magnus Hagander wrote:select extract(year from signup_date),count(email_address),sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding)from email_list group by 1 order by 1;Does anyone have any other ideas?
Aren't you looking for something along the line of:
SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from signup_date) AS year, count(email_address) AS c
FROM email_list
GROUP BY extract(year from signup_date)
)
(adjust for typos, I didn't test it)Yes that does work thanks, if you give the subquery a name. I'd still like to know if it's possible to do with a window function rather than a subquery.Oliver Kohll
Like this?:
SELECT extract(year from signup_date), count(email_address), sum(count(email_address)) OVER (ORDER BY count(email_address)) FROM email_list GROUP BY 1 ORDER BY 1;
Thom
Aren't you looking for something along the line of:
SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from signup_date) AS year, count(email_address) AS c
FROM email_list
GROUP BY extract(year from signup_date)
)
(adjust for typos, I didn't test it)Yes that does work thanks, if you give the subquery a name. I'd still like to know if it's possible to do with a window function rather than a subquery.Oliver Kohll
Like this?:
SELECT extract(year from signup_date), count(email_address), sum(count(email_address)) OVER (ORDER BY count(email_address)) FROM email_list GROUP BY 1 ORDER BY 1;
Thom
Almost, but put me on the right track! This one is exactly what I'm looking for:
SELECT extract(year from signup_date), count(email_address), sum(count(email_address)) OVER (ORDER BY extract(year from signup_date)) FROM email_list GROUP BY 1 ORDER BY 1;
The ORDER BY count(email_address) did give the same results for my data but only because the count values just happen to give the same ordering as the years - I tested by changing some dates.
Many thanks all.
Oliver
On Thu, Apr 29, 2010 at 13:43, Oliver Kohll - Mailing Lists <oliver.lists@gtwm.co.uk> wrote: >> >> Aren't you looking for something along the line of: >> >> SELECT year, sum(c) over (order by year) >> FROM ( >> SELECT extract(year from signup_date) AS year, count(email_address) AS c >> FROM email_list >> GROUP BY extract(year from signup_date) >> ) >> >> (adjust for typos, I didn't test it) >> >> Yes that does work thanks, if you give the subquery a name. I'd still like >> to know if it's possible to do with a window function rather than a >> subquery. >> Oliver Kohll > > Like this?: > > SELECT extract(year from signup_date), count(email_address), > sum(count(email_address)) OVER (ORDER BY count(email_address)) FROM > email_list GROUP BY 1 ORDER BY 1; > > Thom > > Almost, but put me on the right track! This one is exactly what I'm looking > for: > SELECT extract(year from signup_date), count(email_address), > sum(count(email_address)) OVER (ORDER BY extract(year from signup_date)) > FROM email_list GROUP BY 1 ORDER BY 1; > The ORDER BY count(email_address) did give the same results for my data but > only because the count values just happen to give the same ordering as the > years - I tested by changing some dates. > Many thanks all. Curious note - how does the non-subselect version and the subselect version compare performance-wise? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Curious note - how does the non-subselect version and the subselect
version compare performance-wise?
Magnus,
On a test table with 12,000 rows there's not much in it, the subselect has a simpler plan but they both take practically the same time.
The two plans (note I've been rewriting the field names for readability until now but haven't here):
explain analyze SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from a56b7a8d6de03f67b) AS year, count(a10e4ab8863c199f1) AS c
FROM a2e9a7e9e257153de
GROUP BY extract(year from a56b7a8d6de03f67b)
) as subq;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=851.49..874.06 rows=1290 width=16) (actual time=43.369..43.394 rows=5 loops=1)
-> Sort (cost=851.49..854.71 rows=1290 width=16) (actual time=43.340..43.342 rows=5 loops=1)
Sort Key: (date_part('year'::text, a2e9a7e9e257153de.a56b7a8d6de03f67b))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=752.59..771.94 rows=1290 width=26) (actual time=43.300..43.317 rows=5 loops=1)
-> Seq Scan on a2e9a7e9e257153de (cost=0.00..689.56 rows=12605 width=26) (actual time=0.031..26.723 rows=12605 loops=1)
Total runtime: 43.549 ms
explain analyze SELECT extract(year from a56b7a8d6de03f67b), count(a10e4ab8863c199f1), sum(count(a10e4ab8863c199f1)) OVER (ORDER BY count(a10e4ab8863c199f1)) FROM a2e9a7e9e257153de GROUP BY 1 ORDER BY 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1382.39..1388.52 rows=2451 width=32) (actual time=44.229..44.230 rows=5 loops=1)
Sort Key: (date_part('year'::text, a56b7a8d6de03f67b))
Sort Method: quicksort Memory: 25kB
-> WindowAgg (cost=1195.39..1244.41 rows=2451 width=32) (actual time=44.171..44.208 rows=5 loops=1)
-> Sort (cost=1195.39..1201.52 rows=2451 width=32) (actual time=44.125..44.127 rows=5 loops=1)
Sort Key: (count(a10e4ab8863c199f1))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=1014.52..1057.41 rows=2451 width=32) (actual time=44.071..44.099 rows=5 loops=1)
-> Seq Scan on a2e9a7e9e257153de (cost=0.00..833.58 rows=24126 width=32) (actual time=0.032..26.683 rows=12605 loops=1)
Total runtime: 44.396 ms
Regards
Oliver Kohll
oliver@agilebase.co.uk / +44(0)7814 828608 / skype:okohll
www.agilebase.co.uk - software
www.gtwm.co.uk - company
On 29 Apr 2010, at 19:21, Oliver Kohll - Mailing Lists wrote: > The two plans (note I've been rewriting the field names for readability until now but haven't here): > > explain analyze SELECT year, sum(c) over (order by year) > FROM ( > SELECT extract(year from a56b7a8d6de03f67b) AS year, count(a10e4ab8863c199f1) AS c > FROM a2e9a7e9e257153de > GROUP BY extract(year from a56b7a8d6de03f67b) > ) as subq; Oh my, how can you work with such column and table names? You and any colleagues you may have will probably appreciate havinga few views over those tables that translate that gibberish to human readable stuff. You could go further and makethose views updatable (by means of a few rules), but then you run the risk that colleagues start to hug you... Alban Hertroys -- Screwing up is an excellent way to attach something to the ceiling. !DSPAM:737,4bdaabce10411378620886!