Re: GROUP BY requirement - Mailing list pgsql-general

From Scott Marlowe
Subject Re: GROUP BY requirement
Date
Msg-id 1125426924.28179.78.camel@state.g2switchworks.com
Whole thread Raw
In response to GROUP BY requirement  (Bill Moseley <moseley@hank.org>)
List pgsql-general
On Fri, 2005-08-26 at 14:39, Bill Moseley wrote:
> I'm wondering if adding a GROUP BY (as required by Postgres) will
> change the results of a select on a view.
>
> I have the following view which joins a "class" with a teacher.  A
> teacher is a "person" and I have an "instructors" link table.
>
> CREATE VIEW class_list  (id, class_time, instructor )
>     AS
>         SELECT DISTINCT ON(class.id)
>                class.id, class.class_time, person.first_name
>
>           FROM class, instructors, person
>          WHERE instructors.person = person.id
>            AND class.id = instructors.class;
>
> I also have a table "registration" that links students with a class.
> The registration table has a "reg_status" column to say if they are
> confirmed or on the wait_list.  So when showing the above I'd also
> like to see how many students are confirmed and on the wait_list.
>
> DROP VIEW cl;
> CREATE VIEW cl  (id, class_time, instructor,
>                     confirmed_cnt, wait_list_cnt)
> AS
>     SELECT DISTINCT ON(class.id)
>            class.id, class.class_time, person.first_name,
>            sum (CASE WHEN registration.reg_status = 1 THEN 1 ELSE 0 END) as confirmed_cnt,
>            sum (CASE WHEN registration.reg_status = 2 THEN 1 ELSE 0 END) as wait_list_cnt,
>
>       FROM class, instructors, person, registration
>      WHERE instructors.person = person.id
>        AND class.id = instructors.class
>        AND class.id = registration.class
>
>   GROUP BY class.id, class.class_time, person.first_name;
>
> PostgreSQL requires the GROUP BY.  But, I'm not clear how the GROUP BY
> might change the results between the two views above.
>
>   http://www.postgresql.org/docs/8.0/static/sql-select.html#SQL-GROUPBY
>
> says:
>
>     When GROUP BY is present, it is not valid for the SELECT list
>     expressions to refer to ungrouped columns except within aggregate
>     functions, since there would be more than one possible value to
>     return for an ungrouped column.
>
> Frankly, I cannot see how it might change results of a select between
> the two views.  Am I missing something?

OK, distinct on suffers from this problem.  Given the following simple
dataset:

mytable:
a | b
------
1 | 0
1 | 1

select distinct on (a) a,b from mytable;

One can see how the possible results are:

1,0 and 1,1, right?  All depending on the order in which they are
fetched.

The same would be true if you could do a group by on a and select b:

select a,b from mytable group by a;

Right?

Now, if it's impossible for your dataset to return such sets, due to the
way it's built, it is likely not fully normalized.  I.e. you have data
like this:

classid | instructorname | moreinfo...
--------------------------------------
1 | 'John Smith' | 'information'
1 | 'John Smith' | 'even more information'

and so on.  Or your join is creating such a data set.

pgsql-general by date:

Previous
From: Andrew Sullivan
Date:
Subject: Re: Postgresql replication
Next
From: Scott Marlowe
Date:
Subject: Re: POSS. FEATURE REQ: "Dynamic" Views