Thread: generate_series with left join

generate_series with left join

From
"Pedro B."
Date:
Greetings.

I'm having some difficulties with my first use of the generate_series 
function.

Situation:cause                | integer                       date                 | timestamp(2) without time zone

cause | date
------+------------+-----------+ 
1     | 2006-03-23 15:07:53.63 |
2     | 2006-02-02 12:13:23.11 |
2     | 2006-11-12 16:43:11.45 |
1     | 2005-03-13 18:34:44.13 |
3     | 2006-01-23 11:24:41.31 |
(etc)

What i need to do, is to count the 'cause' column for the values '1' and
'2', and group them by year, using left joins in order to also have the
serialized years with empty values in the output.

My needed output for a series of (2005,2007) would be:year | one  | two
------+------+------2005 |    1 |    02006 |    1 |    22007 |    0 |    0


I have tried something like

#select s, (select count(cause) from mytable where cause=1 ) as one,
COUNT (cause) as two from generate_series(2006,2009) AS s(d) left JOIN
mytable o ON (substr(o.date,1,4) = s.d and cause=2) GROUP BY s.d ORDER
BY 1;

which obviously is wrong, because of the results: s   | one  | two
------+------+------2006 | 3769 | 16582007 | 3769 |    02008 | 3769 |    02009 | 3769 |    0

As far as the 'two', the left join was successful, however i can not
find a way to join the 'one'. The output value is correct, but the
result shown should be only for the year 2006, not for all the values of
the series.
Maybe i've looked at it TOO much or maybe i'm completely failing to find
a working logic. 
Any suggestions?

Any and all help is humbly appreciated.

\\pb


-- 
This message has been scanned for viruses and
dangerous content at MsgLab.com and is
believed to be clean.



Re: generate_series with left join

From
"Aaron Bono"
Date:
How about one of these two:

select
    year_list.year,
    count(one.*),
    count(two.*)
from (
    select years
    from generate_series(2006,2009) as years
) year_list
left outer join mytable as one on (
    date_part('year', one.date) = year_list.years
    and one.cause = 1
)
left outer join mytable as two on (
    date_part('year', two.date) = year_list.years
    and two.cause = 2
)
group by
    year_list.year
;


select
    year_list.year,
    mytable.cause,
    count(mytable.*)
from (
    select years
    from generate_series(2006,2009) as years
) year_list
left outer join mytable on (
    date_part('year', mytable.date) = year_list.years
)
group by
    year_list.year,
    mytable.cause
;

I think one of the problems many people have is the writing of their SQL in paragraph form.  It makes the SQL really hard to read and even harder to understand and debug.  Formatting your SQL like I did above may make it easier to see what is wrong.

-Aaron Bono

On 6/28/06, Pedro B. <pedro.borracha@msglab.com> wrote:
Greetings.

I'm having some difficulties with my first use of the generate_series
function.

Situation:
cause                | integer
date                 | timestamp(2) without time zone

cause | date
------+------------+-----------+
1     | 2006-03-23 15:07:53.63 |
2     | 2006-02-02 12:13:23.11 |
2     | 2006-11-12 16:43:11.45 |
1     | 2005-03-13 18:34:44.13 |
3     | 2006-01-23 11:24:41.31 |
(etc)

What i need to do, is to count the 'cause' column for the values '1' and
'2', and group them by year, using left joins in order to also have the
serialized years with empty values in the output.

My needed output for a series of (2005,2007) would be:
year | one  | two
------+------+------
2005 |    1 |    0
2006 |    1 |    2
2007 |    0 |    0


I have tried something like

#select s, (select count(cause) from mytable where cause=1 ) as one,
COUNT (cause) as two from generate_series(2006,2009) AS s(d) left JOIN
mytable o ON (substr(o.date,1,4) = s.d and cause=2) GROUP BY s.d ORDER
BY 1;

which obviously is wrong, because of the results:
  s   | one  | two
------+------+------
2006 | 3769 | 1658
2007 | 3769 |    0
2008 | 3769 |    0
2009 | 3769 |    0

As far as the 'two', the left join was successful, however i can not
find a way to join the 'one'. The output value is correct, but the
result shown should be only for the year 2006, not for all the values of
the series.
Maybe i've looked at it TOO much or maybe i'm completely failing to find
a working logic.
Any suggestions?

Any and all help is humbly appreciated.

\\pb



Re: generate_series with left join

From
"Pedro B."
Date:
On Wed, 2006-06-28 at 13:34 -0500, Aaron Bono wrote:

> select
>     year_list.year,
>     count(one.*),
>     count(two.*)
> from (
>     select years
>     from generate_series(2006,2009) as years
> ) year_list
> left outer join mytable as one on ( 
>     date_part('year', one.date) = year_list.years
>     and one.cause = 1
> )
> left outer join mytable as two on (
>     date_part('year', two.date) = year_list.years
>     and two.cause = 2
> )
> group by
>     year_list.year
> ;
> 
> 
> select
>     year_list.year,
>     mytable.cause,
>     count(mytable.*)
> from (
>     select years
>     from generate_series(2006,2009) as years
> ) year_list
> left outer join mytable on ( 
>     date_part('year', mytable.date) = year_list.years
> )
> group by
>     year_list.year,
>     mytable.cause
> ;
> 
Aaron, 
Thank you so much for your reply.
However, the 2 examples you provided have "weird" outputs:

The first:years |  count  |  count
-------+---------+--------- 2009 |       0 |       0 2008 |       0 |       0 2007 |       0 |       0 2006 | 7802080 |
7802080
(4 rows)

Time: 87110.753 ms  << yay.


The second:
years | cause | count
-------+---------+------- 2009 |         |     0 2008 |         |     0 2007 |         |     0 2006 |       6 |     1
2006|       1 |  4030 2006 |       2 |  1936 2006 |       3 |  4078 2006 |     100 |  3159 2006 |      98 |  2659 2006
|     99 |  2549
 

My need is really to only group the counts of where cause=1 and cause=2
for each year, none of the others.

> I think one of the problems many people have is the writing of their
> SQL in paragraph form.  It makes the SQL really hard to read and even
> harder to understand and debug.  Formatting your SQL like I did above
> may make it easier to see what is wrong. 

Indeed. Note taken, i'll improve my formatting.

\\pb


-- 
This message has been scanned for viruses and
dangerous content at MsgLab.com and is
believed to be clean.



Re: generate_series with left join

From
"Aaron Bono"
Date:
Sorry, I think I see the mistake - it is getting all the rows for 1 and all the rows for 2 and joining them.  Try splitting up the query into two inner queries like so:

select
    one_list.year,
    one_list.one_count,
    two_list.two_count
FROM
(
    select
        year_list.year,
        count(one.*) as one_count
    from (
        select years
        from generate_series(2006,2009) as years
    ) year_list
    left outer join mytable as one on (
        date_part('year', one.date) = year_list.years
        and one.cause = 1
    )
    group by
        year_list.year
) one_list,
(
    select
        year_list.year,
        count(two.*) as two_count
    from (
        select years
        from generate_series(2006,2009) as years
    ) year_list
    left outer join mytable as two on (
        date_part('year', two.date ) = year_list.years
        and two.cause = 2
    )
    group by
        year_list.year
) two_list
WHERE one_list.year = two_list.year
;

On 6/28/06, Pedro B. <pedro.borracha@msglab.com> wrote:
On Wed, 2006-06-28 at 13:34 -0500, Aaron Bono wrote:

> select
>     year_list.year,
>     count(one.*),
>     count(two.*)
> from (
>     select years
>     from generate_series(2006,2009) as years
> ) year_list
> left outer join mytable as one on (
>     date_part('year', one.date) = year_list.years
>     and one.cause = 1
> )
> left outer join mytable as two on (
>     date_part('year', two.date) = year_list.years
>     and two.cause = 2
> )
> group by
>     year_list.year
> ;
>
>
> select
>     year_list.year,
>     mytable.cause,
>     count(mytable.*)
> from (
>     select years
>     from generate_series(2006,2009) as years
> ) year_list
> left outer join mytable on (
>     date_part('year', mytable.date) = year_list.years
> )
> group by
>     year_list.year,
>     mytable.cause
> ;
>
Aaron,
Thank you so much for your reply.
However, the 2 examples you provided have "weird" outputs:

The first:
years |  count  |  count
-------+---------+---------
  2009 |       0 |       0
  2008 |       0 |       0
  2007 |       0 |       0
  2006 | 7802080 | 7802080
(4 rows)

Time: 87110.753 ms  << yay.


The second:

years | cause | count
-------+---------+-------
  2009 |         |     0
  2008 |         |     0
  2007 |         |     0
  2006 |       6 |     1
  2006 |       1 |  4030
  2006 |       2 |  1936
  2006 |       3 |  4078
  2006 |     100 |  3159
  2006 |      98 |  2659
  2006 |      99 |  2549

My need is really to only group the counts of where cause=1 and cause=2
for each year, none of the others.

> I think one of the problems many people have is the writing of their
> SQL in paragraph form.  It makes the SQL really hard to read and even
> harder to understand and debug.  Formatting your SQL like I did above
> may make it easier to see what is wrong.

Indeed. Note taken, i'll improve my formatting.

\\pb

Re: generate_series with left join

From
"Aaron Bono"
Date:
This should work too:

select
    year_list.year,
    one_list.one_count,
    two_list.two_count
FROM (
    select years
    from generate_series(2006,2009) as years
) year_list
left outer join (
    select
        date_part('year', one.date) as one_year,
        count(one.*) as one_count
    from mytable as one
    where one.cause = 1
    group by
        date_part('year', one.date)
) one_list on (year_list.years = one_year)
left outer join (
    select
        date_part('year', two.date) as two_year,
        count(two.*) as two_count
    from mytable as two
    where two.cause = 2
    group by
        date_part('year', two.date)
) two_list on (year_list.years = two_year)
;

On 6/28/06, Aaron Bono <postgresql@aranya.com > wrote:
Sorry, I think I see the mistake - it is getting all the rows for 1 and all the rows for 2 and joining them.  Try splitting up the query into two inner queries like so:

select
    one_list.year,
    one_list.one_count,
    two_list.two_count
FROM
(
    select
        year_list.year,
        count(one.*) as one_count

    from (
        select years
        from generate_series(2006,2009) as years
    ) year_list
    left outer join mytable as one on (
        date_part('year', one.date) = year_list.years
        and one.cause = 1
    )
    group by
        year_list.year
) one_list,
(
    select
        year_list.year,
        count(two.*) as two_count

    from (
        select years
        from generate_series(2006,2009) as years
    ) year_list
    left outer join mytable as two on (
        date_part('year', two.date ) = year_list.years
        and two.cause = 2
    )
    group by
        year_list.year
) two_list
WHERE one_list.year = two_list.year
;

On 6/28/06, Pedro B. <pedro.borracha@msglab.com> wrote:
On Wed, 2006-06-28 at 13:34 -0500, Aaron Bono wrote:

> select
>     year_list.year,
>     count(one.*),
>     count(two.*)
> from (
>     select years
>     from generate_series(2006,2009) as years
> ) year_list
> left outer join mytable as one on (
>     date_part('year', one.date) = year_list.years
>     and one.cause = 1
> )
> left outer join mytable as two on (
>     date_part('year', two.date) = year_list.years
>     and two.cause = 2
> )
> group by
>     year_list.year
> ;
>
>
> select
>     year_list.year,
>     mytable.cause,
>     count(mytable.*)
> from (
>     select years
>     from generate_series(2006,2009) as years
> ) year_list
> left outer join mytable on (
>     date_part('year', mytable.date) = year_list.years
> )
> group by
>     year_list.year,
>     mytable.cause
> ;
>
Aaron,
Thank you so much for your reply.
However, the 2 examples you provided have "weird" outputs:

The first:
years |  count  |  count
-------+---------+---------
  2009 |       0 |       0
  2008 |       0 |       0
  2007 |       0 |       0
  2006 | 7802080 | 7802080
(4 rows)

Time: 87110.753 ms  << yay.


The second:

years | cause | count
-------+---------+-------
  2009 |         |     0
  2008 |         |     0
  2007 |         |     0
  2006 |       6 |     1
  2006 |       1 |  4030
  2006 |       2 |  1936
  2006 |       3 |  4078
  2006 |     100 |  3159
  2006 |      98 |  2659
  2006 |      99 |  2549

My need is really to only group the counts of where cause=1 and cause=2
for each year, none of the others.

> I think one of the problems many people have is the writing of their
> SQL in paragraph form.  It makes the SQL really hard to read and even
> harder to understand and debug.  Formatting your SQL like I did above
> may make it easier to see what is wrong.

Indeed. Note taken, i'll improve my formatting.

\\pb

Re: generate_series with left join

From
"Pedro B."
Date:
On Wed, 2006-06-28 at 15:16 -0500, Aaron Bono wrote:
> This should work too:
> 
> select
>     year_list.year,
>     one_list.one_count,
>     two_list.two_count
> FROM (
>     select years
>     from generate_series(2006,2009) as years
> ) year_list
> left outer join ( 
>     select
>         date_part('year', one.date) as one_year,
>         count(one.*) as one_count
>     from mytable as one
>     where one.cause = 1
>     group by
>         date_part('year', one.date)
> ) one_list on (year_list.years = one_year) 
> left outer join (
>     select
>         date_part('year', two.date) as two_year,
>         count(two.*) as two_count
>     from mytable as two
>     where two.cause = 2
>     group by
>         date_part('year', two.date)
> ) two_list on (year_list.years = two_year)
> ;


Aaron, 

I confess i will take some time to digest the amazing code you just
sent, but in the meantime, let me tell you right away that both work
just as i needed.

I will stop pulling my hairs now.
Thank you so much. 

\\pb


-- 
This message has been scanned for viruses and
dangerous content at MsgLab.com and is
believed to be clean.