Re: monthly tally of new memberships - Mailing list pgsql-general

From brian
Subject Re: monthly tally of new memberships
Date
Msg-id 469E6FEC.2030709@zijn-digital.com
Whole thread Raw
In response to Re: monthly tally of new memberships  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-general
Alvaro Herrera wrote:
> brian wrote:
>
>>Michael Glaesemann wrote:
>>
>>>On Jul 18, 2007, at 13:29 , brian wrote:
>>>
>>>>This returns the new memberships for each day, ignoring days
>>>>without any. What i'd like to do though, is to select only the 1st
>>>>of each month, summing the new memberships or that month, eg:
>>>>month     |     applications 2006-05-01   |           57 2006-06-01
>>>>|           36 2006-07-01   |           72
>>>
>>>Try something like this:
>>>SELECT date_trunc('month', applied)::date AS date_applied , count(id)
>>>AS applications FROM member GROUP BY applied ORDER BY date_applied
>>>ASC;
>>>Note I remove the WHERE applied = applied, as this is just identity.
>>
>>Thanks, but that isn't it. I've tried that exact query, actually. The
>>problem with that is it doesn't give me one row for the entire month.
>>Instead, i get one row for each day there was a new membership, only the
>>date_applied column has been changed to the 1st of that particular month.
>>eg:
>
>
> Hum, you should be grouping by date_applied (also known as "group by 1"
> because you can't use column aliases in GROUP BY)
>

Right, that works, also.

I compared this to Jon Sime's suggestion:

test=# EXPLAIN ANALYZE  SELECT date_trunc('month', applied)::date AS
date_applied, count(id) AS applications FROM member GROUP BY 1 ORDER BY
date_applied ASC;

QUERY PLAN
-------------------------------------------------------
Sort  (cost=140.76..141.26 rows=200 width=8) (actual time=17.590..17.622
rows=18 loops=1)
Sort Key: (date_trunc('month'::text, (applied)::timestamp with time
zone))::date
    ->  HashAggregate  (cost=129.12..133.12 rows=200 width=8) (actual
time=17.478..17.523 rows=18 loops=1)
          ->  Seq Scan on member  (cost=0.00..123.76 rows=1072 width=8)
(actual time=0.035..10.684 rows=1072 loops=1)
  Total runtime: 17.733 ms
(5 rows)

test=# EXPLAIN ANALYZE SELECT to_char(applied, 'yyyy-mm') AS
month_applied, count(id) AS applications FROM member GROUP BY
to_char(applied, 'yyyy-mm') ORDER BY 1 ASC;

QUERY PLAN
-----------------------------------------------------
  Sort  (cost=137.58..138.08 rows=200 width=8) (actual
time=13.415..13.458 rows=18 loops=1)
    Sort Key: to_char((applied)::timestamp with time zone, 'yyyy-mm'::text)
    ->  HashAggregate  (cost=126.44..129.94 rows=200 width=8) (actual
time=13.273..13.314 rows=18 loops=1)
          ->  Seq Scan on member  (cost=0.00..121.08 rows=1072 width=8)
(actual time=0.042..10.525 rows=1072 loops=1)
  Total runtime: 13.564 ms
(5 rows)


But, getting back to your comment, i see that this (grouping by the
alias) also works:

test=# EXPLAIN ANALYZE SELECT to_char(applied, 'yyyy-mm') AS
month_applied, count(id) AS applications FROM member GROUP BY
month_applied ORDER BY 1 ASC;

QUERY PLAN
-----------------------------------------------------
  Sort  (cost=137.58..138.08 rows=200 width=8) (actual
time=44.329..44.363 rows=18 loops=1)
    Sort Key: to_char((applied)::timestamp with time zone, 'yyyy-mm'::text)
    ->  HashAggregate  (cost=126.44..129.94 rows=200 width=8) (actual
time=44.190..44.229 rows=18 loops=1)
          ->  Seq Scan on member  (cost=0.00..121.08 rows=1072 width=8)
(actual time=0.042..41.242 rows=1072 loops=1)
  Total runtime: 44.477 ms
(5 rows)


But it seems to take longer at the cost of keeping the query tidy.

brian

pgsql-general by date:

Previous
From: "Greg Sabino Mullane"
Date:
Subject: Re: DBI/DBD::Pg and transactions
Next
From: "Pat Maddox"
Date:
Subject: Need help optimizing this query