Thread: cumulative count

cumulative count

From
Carson Farmer
Date:
Hi list,

This is my first post to pgsql, so hopefully I'm not asking something
that has been answered a thousand time before. I've looked online, and
through the archives, but I haven't found anything that answers my
question specifically:

Say I have a table like this:

      date     |       user
------------------+---------------------
20050201   |       Bill
20050210   |       Steve
20050224   |       Sally
20050311   |       Martha
20050316   |       Ryan
20050322   |       Phil
20050330   |       William
20050415   |       Mary
20050428   |       Susan
20050503   |       Jim

and I want to run a query that returns a *count* of the number of users
*each month*, ordered by year and *month*, with an additional column
that is a *running total of the count*, as in:

     year        |    month  |    count    |   run_count
-------------------+----------------+----------------+-----------------
     2005       |      02       |      3          |         3
     2005       |      03       |      4          |         7
     2005       |      04       |      2          |         9
     2005       |      05       |      1          |        10

I can get almost everything I want with:

SELECT a.year, a.month, a.count, a.month_name, sum(b.count) AS total
FROM (SELECT EXTRACT(year from added_date) AS year,
EXTRACT(month FROM added_date) AS month,
TO_CHAR(added_date, 'Month') AS month_name,
COUNT(*) AS count FROM users_table GROUP BY 1, 2, 3) AS a,
(SELECT EXTRACT(year FROM added_date) AS year,
EXTRACT(month FROM added_date) AS month,
TO_CHAR(added_date, 'Month') AS month_name,
COUNT(*) AS count
FROM users_table GROUP BY 1, 2, 3) AS b
WHERE a.year >= b.year AND a.month >= b.month
GROUP BY 1, 2, 3, 4
ORDER BY a.year, a.month asc;

but I can't quite figure out the running total of the count. The above
example works right up to the end of the first year, then the values no
longer make sense. My guess is it's something to do with my WHERE
clause, but I can't think of a better way to do things.

Any ideas?

Cheers,

Carson




Re: cumulative count

From
Harald Fuchs
Date:
In article <49381902.7080209@gmail.com>,
Carson Farmer <carson.farmer@gmail.com> writes:

>      date     |       user
> ------------------+---------------------
> 20050201   |       Bill
> 20050210   |       Steve
> 20050224   |       Sally
> 20050311   |       Martha
> 20050316   |       Ryan
> 20050322   |       Phil
> 20050330   |       William
> 20050415   |       Mary
> 20050428   |       Susan
> 20050503   |       Jim

> and I want to run a query that returns a *count* of the number of
> users *each month*, ordered by year and *month*, with an additional
> column that is a *running total of the count*, as in:

>     year        |    month  |    count    |   run_count
> -------------------+----------------+----------------+-----------------
>     2005       |      02       |      3          |         3
> 2005       |      03       |      4          |         7
>     2005       |      04       |      2          |         9
>     2005       |      05       |      1          |        10

> I can get almost everything I want with:

> SELECT a.year, a.month, a.count, a.month_name, sum(b.count) AS total
> FROM (SELECT EXTRACT(year from added_date) AS year,
> EXTRACT(month FROM added_date) AS month,
> TO_CHAR(added_date, 'Month') AS month_name,
> COUNT(*) AS count FROM users_table GROUP BY 1, 2, 3) AS a,
> (SELECT EXTRACT(year FROM added_date) AS year,
> EXTRACT(month FROM added_date) AS month,
> TO_CHAR(added_date, 'Month') AS month_name,
> COUNT(*) AS count
> FROM users_table GROUP BY 1, 2, 3) AS b
> WHERE a.year >= b.year AND a.month >= b.month
> GROUP BY 1, 2, 3, 4
> ORDER BY a.year, a.month asc;

> but I can't quite figure out the running total of the count. The above
> example works right up to the end of the first year, then the values
> no longer make sense. My guess is it's something to do with my WHERE
> clause, but I can't think of a better way to do things.

Yes, your WHERE condition is the problem.  It should be
WHERE a.year > b.year OR a.year = b.year AND a.month > b.month.

You could simplify the date logic by doing the year/month split later, e.g.

  CREATE TEMP TABLE tmp AS
  SELECT date_trunc('month', date) AS dt, count(*) AS count
  FROM users_table
  GROUP BY dt;

  SELECT extract(YEAR FROM t1.dt) AS year,
         extract(MONTH FROM t1.dt) AS month,
         t1.count,
         sum(t2.count) AS run_count
  FROM tmp t1
  LEFT JOIN tmp t2 ON t2.dt <= t1.dt
  GROUP BY year, month, t1.count
  ORDER BY year, month;

(AFAIK the WITH clause in PostgreSQL 8.4 would let you get rid of
the temp table.)

Re: cumulative count

From
Gerhard Heift
Date:
On Thu, Dec 04, 2008 at 07:32:59PM +0100, Harald Fuchs wrote:
> In article <49381902.7080209@gmail.com>,
> Carson Farmer <carson.farmer@gmail.com> writes:
>
> >      date     |       user
> > ------------------+---------------------
> > 20050201   |       Bill
> > 20050210   |       Steve
> > 20050224   |       Sally
> > 20050311   |       Martha
> > 20050316   |       Ryan
> > 20050322   |       Phil
> > 20050330   |       William
> > 20050415   |       Mary
> > 20050428   |       Susan
> > 20050503   |       Jim
>
> > and I want to run a query that returns a *count* of the number of
> > users *each month*, ordered by year and *month*, with an additional
> > column that is a *running total of the count*, as in:
>
> >     year        |    month  |    count    |   run_count
> > -------------------+----------------+----------------+-----------------
> >     2005       |      02       |      3          |         3
> > 2005       |      03       |      4          |         7
> >     2005       |      04       |      2          |         9
> >     2005       |      05       |      1          |        10
>
> > I can get almost everything I want with:
>
> > SELECT a.year, a.month, a.count, a.month_name, sum(b.count) AS total
> > FROM (SELECT EXTRACT(year from added_date) AS year,
> > EXTRACT(month FROM added_date) AS month,
> > TO_CHAR(added_date, 'Month') AS month_name,
> > COUNT(*) AS count FROM users_table GROUP BY 1, 2, 3) AS a,
> > (SELECT EXTRACT(year FROM added_date) AS year,
> > EXTRACT(month FROM added_date) AS month,
> > TO_CHAR(added_date, 'Month') AS month_name,
> > COUNT(*) AS count
> > FROM users_table GROUP BY 1, 2, 3) AS b
> > WHERE a.year >= b.year AND a.month >= b.month
> > GROUP BY 1, 2, 3, 4
> > ORDER BY a.year, a.month asc;
>
> > but I can't quite figure out the running total of the count. The above
> > example works right up to the end of the first year, then the values
> > no longer make sense. My guess is it's something to do with my WHERE
> > clause, but I can't think of a better way to do things.
>
> Yes, your WHERE condition is the problem.  It should be
> WHERE a.year > b.year OR a.year = b.year AND a.month > b.month.
>
> You could simplify the date logic by doing the year/month split later, e.g.
>
>   CREATE TEMP TABLE tmp AS
>
>   SELECT extract(YEAR FROM t1.dt) AS year,
>          extract(MONTH FROM t1.dt) AS month,
>          t1.count,
>          sum(t2.count) AS run_count
>   FROM tmp t1
>   LEFT JOIN tmp t2 ON t2.dt <= t1.dt
>   GROUP BY year, month, t1.count
>   ORDER BY year, month;

What about:

SELECT extract(YEAR FROM t1.dt) AS year,
       extract(MONTH FROM t1.dt) AS month,
       t1.count,
       sum(t2.count) AS run_count
FROM (
    SELECT date_trunc('month', date) AS dt, count(*) AS count
    FROM users_table
    GROUP BY dt
) AS t1
LEFT JOIN tmp t2 ON t2.dt <= t1.dt
GROUP BY year, month, t1.count
ORDER BY year, month;

Regards,
  Gerhard

Attachment

Re: cumulative count

From
"Pavel Stehule"
Date:
2008/12/4 Carson Farmer <carson.farmer@gmail.com>:
> Hi list,
>
> This is my first post to pgsql, so hopefully I'm not asking something that
> has been answered a thousand time before. I've looked online, and through
> the archives, but I haven't found anything that answers my question
> specifically:
>
> Say I have a table like this:
>
>     date     |       user
> ------------------+---------------------
> 20050201   |       Bill
> 20050210   |       Steve
> 20050224   |       Sally
> 20050311   |       Martha
> 20050316   |       Ryan
> 20050322   |       Phil
> 20050330   |       William
> 20050415   |       Mary
> 20050428   |       Susan
> 20050503   |       Jim
>
> and I want to run a query that returns a *count* of the number of users
> *each month*, ordered by year and *month*, with an additional column that is
> a *running total of the count*, as in:
>
>    year        |    month  |    count    |   run_count
> -------------------+----------------+----------------+-----------------
>    2005       |      02       |      3          |         3            2005
>       |      03       |      4          |         7
>    2005       |      04       |      2          |         9
>    2005       |      05       |      1          |        10
>
> I can get almost everything I want with:
>
> SELECT a.year, a.month, a.count, a.month_name, sum(b.count) AS total
> FROM (SELECT EXTRACT(year from added_date) AS year,
> EXTRACT(month FROM added_date) AS month,
> TO_CHAR(added_date, 'Month') AS month_name,
> COUNT(*) AS count FROM users_table GROUP BY 1, 2, 3) AS a,
> (SELECT EXTRACT(year FROM added_date) AS year,
> EXTRACT(month FROM added_date) AS month,
> TO_CHAR(added_date, 'Month') AS month_name,
> COUNT(*) AS count
> FROM users_table GROUP BY 1, 2, 3) AS b
> WHERE a.year >= b.year AND a.month >= b.month
> GROUP BY 1, 2, 3, 4
> ORDER BY a.year, a.month asc;
>
> but I can't quite figure out the running total of the count. The above
> example works right up to the end of the first year, then the values no
> longer make sense. My guess is it's something to do with my WHERE clause,
> but I can't think of a better way to do things.
>
> Any ideas?
>

hate selfjoins. It is really slow for any bigger datasets. Write SRF
function (stored function that returns table).

Regards
Pavel Stehule


> Cheers,
>
> Carson
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: cumulative count

From
David Fetter
Date:
On Thu, Dec 04, 2008 at 05:53:06PM +0000, Carson Farmer wrote:
> Hi list,
>
> This is my first post to pgsql, so hopefully I'm not asking something
> that has been answered a thousand time before. I've looked online, and
> through the archives, but I haven't found anything that answers my
> question specifically:
>
> Say I have a table like this:
>
>      date     |       user
> ------------------+---------------------
> 20050201   |       Bill
> 20050210   |       Steve
> 20050224   |       Sally
> 20050311   |       Martha
> 20050316   |       Ryan
> 20050322   |       Phil
> 20050330   |       William
> 20050415   |       Mary
> 20050428   |       Susan
> 20050503   |       Jim
>
> and I want to run a query that returns a *count* of the number of users
> *each month*, ordered by year and *month*, with an additional column
> that is a *running total of the count*, as in:

In 8.4, you'll have direct SQL support for this using OLAP a.k.a.
windowing functions, so don't build too many of these dodgy hacks into
your application.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

tuples

From
MatT
Date:
Hi,

I have a question concerning psql. I found that psql has a defined
command '-t' and that it turns off printing of column names and result
row count footers, etc.

what I look for, is a command, which would turn off result row count
footer, but would print column names.

is there an easy way to do this?

regards,
Matt

----------------------------------------------------
Wirus Filipiński znowu atakuje
http://klik.wp.pl/?adr=http%3A%2F%2Fprorocznia.pl%2Ff.html%3Fi%3D37160O-937882398O0&sid=575



Re: tuples

From
"Harvey, Allan AC"
Date:
> I have a question concerning psql. I found that psql has a defined
> command '-t' and that it turns off printing of column names
> and result
> row count footers, etc.
>
> what I look for, is a command, which would turn off result row count
> footer, but would print column names.
>
> is there an easy way to do this?

Start psql with the -P footer switch. 8.3.3 works this way.

Allan


The material contained in this email may be confidential, privileged or copyrighted. If you are not the intended
recipient,use, disclosure or copying of this information is prohibited. If you have received this document in error,
pleaseadvise the sender and delete the document. Neither OneSteel nor the sender accept responsibility for any viruses
containedin this email or any attachments. 

Re: tuples

From
"MatT"
Date:
>> I have a question concerning psql. I found that psql has a defined
>> command '-t' and that it turns off printing of column names
>> and result
>> row count footers, etc.
>>
>> what I look for, is a command, which would turn off result row count
>> footer, but would print column names.
>>
>> is there an easy way to do this?

>Start psql with the -P footer switch. 8.3.3 works this way.
>
>Allan


Works:)
Thx a lot!

Regards
Matt