Thread: GROUP BY requirement

GROUP BY requirement

From
Bill Moseley
Date:
I'm wondering if adding a GROUP BY (as required by Postgres) will
change the results of a select on a view.

I have the following view which joins a "class" with a teacher.  A
teacher is a "person" and I have an "instructors" link table.

CREATE VIEW class_list  (id, class_time, instructor )
    AS
        SELECT DISTINCT ON(class.id)
               class.id, class.class_time, person.first_name

          FROM class, instructors, person
         WHERE instructors.person = person.id
           AND class.id = instructors.class;

I also have a table "registration" that links students with a class.
The registration table has a "reg_status" column to say if they are
confirmed or on the wait_list.  So when showing the above I'd also
like to see how many students are confirmed and on the wait_list.

DROP VIEW cl;
CREATE VIEW cl  (id, class_time, instructor,
                    confirmed_cnt, wait_list_cnt)
AS
    SELECT DISTINCT ON(class.id)
           class.id, class.class_time, person.first_name,
           sum (CASE WHEN registration.reg_status = 1 THEN 1 ELSE 0 END) as confirmed_cnt,
           sum (CASE WHEN registration.reg_status = 2 THEN 1 ELSE 0 END) as wait_list_cnt,

      FROM class, instructors, person, registration
     WHERE instructors.person = person.id
       AND class.id = instructors.class
       AND class.id = registration.class

  GROUP BY class.id, class.class_time, person.first_name;

PostgreSQL requires the GROUP BY.  But, I'm not clear how the GROUP BY
might change the results between the two views above.

  http://www.postgresql.org/docs/8.0/static/sql-select.html#SQL-GROUPBY

says:

    When GROUP BY is present, it is not valid for the SELECT list
    expressions to refer to ungrouped columns except within aggregate
    functions, since there would be more than one possible value to
    return for an ungrouped column.

Frankly, I cannot see how it might change results of a select between
the two views.  Am I missing something?



--
Bill Moseley
moseley@hank.org

Re: GROUP BY requirement

From
Scott Marlowe
Date:
On Fri, 2005-08-26 at 14:39, Bill Moseley wrote:
> I'm wondering if adding a GROUP BY (as required by Postgres) will
> change the results of a select on a view.
>
> I have the following view which joins a "class" with a teacher.  A
> teacher is a "person" and I have an "instructors" link table.
>
> CREATE VIEW class_list  (id, class_time, instructor )
>     AS
>         SELECT DISTINCT ON(class.id)
>                class.id, class.class_time, person.first_name
>
>           FROM class, instructors, person
>          WHERE instructors.person = person.id
>            AND class.id = instructors.class;
>
> I also have a table "registration" that links students with a class.
> The registration table has a "reg_status" column to say if they are
> confirmed or on the wait_list.  So when showing the above I'd also
> like to see how many students are confirmed and on the wait_list.
>
> DROP VIEW cl;
> CREATE VIEW cl  (id, class_time, instructor,
>                     confirmed_cnt, wait_list_cnt)
> AS
>     SELECT DISTINCT ON(class.id)
>            class.id, class.class_time, person.first_name,
>            sum (CASE WHEN registration.reg_status = 1 THEN 1 ELSE 0 END) as confirmed_cnt,
>            sum (CASE WHEN registration.reg_status = 2 THEN 1 ELSE 0 END) as wait_list_cnt,
>
>       FROM class, instructors, person, registration
>      WHERE instructors.person = person.id
>        AND class.id = instructors.class
>        AND class.id = registration.class
>
>   GROUP BY class.id, class.class_time, person.first_name;
>
> PostgreSQL requires the GROUP BY.  But, I'm not clear how the GROUP BY
> might change the results between the two views above.
>
>   http://www.postgresql.org/docs/8.0/static/sql-select.html#SQL-GROUPBY
>
> says:
>
>     When GROUP BY is present, it is not valid for the SELECT list
>     expressions to refer to ungrouped columns except within aggregate
>     functions, since there would be more than one possible value to
>     return for an ungrouped column.
>
> Frankly, I cannot see how it might change results of a select between
> the two views.  Am I missing something?

OK, distinct on suffers from this problem.  Given the following simple
dataset:

mytable:
a | b
------
1 | 0
1 | 1

select distinct on (a) a,b from mytable;

One can see how the possible results are:

1,0 and 1,1, right?  All depending on the order in which they are
fetched.

The same would be true if you could do a group by on a and select b:

select a,b from mytable group by a;

Right?

Now, if it's impossible for your dataset to return such sets, due to the
way it's built, it is likely not fully normalized.  I.e. you have data
like this:

classid | instructorname | moreinfo...
--------------------------------------
1 | 'John Smith' | 'information'
1 | 'John Smith' | 'even more information'

and so on.  Or your join is creating such a data set.