Re: Setting WHERE on a VIEW with aggregate function. - Mailing list pgsql-general

From Roger Hand
Subject Re: Setting WHERE on a VIEW with aggregate function.
Date
Msg-id DB28E9B548192448A4E8C8A3C1B1E475611CF5@sj1-exch-01.us.corp.kailea.com
Whole thread Raw
In response to Setting WHERE on a VIEW with aggregate function.  (Bill Moseley <moseley@hank.org>)
Responses Re: Setting WHERE on a VIEW with aggregate function.
List pgsql-general
I have a view to generate a list of instructors and a count of their
> future classes.
>
> "instructors" is a link table between "class" and "person".
>
> CREATE VIEW future_instructor_counts
>     AS
>         SELECT  person.id AS person_id,
>                 first_name,
>                 last_name,
>                 count(instructors.class) AS class_count
>
>           FROM  class, instructors, person
>
>          WHERE  class.id    = instructors.class AND
>                 person.id   = instructors.person
>                 AND class_time > now()
>
>       GROUP BY  person_id, first_name, last_name;

The trick is to do the data aggregation separately, then JOIN in whatever other fields you want.

Something like this:

CREATE VIEW future_instructor_counts
    AS
        SELECT  * FROM

    (SELECT
     person.id AS person_id,
                first_name,
                last_name) personinfo

    INNER JOIN

    (SELECT class.id FROM class
    WHERE class_time > now() ) classes

    INNER JOIN

    (SELECT
     id, count(class) AS class_count
    FROM instructors GROUP BY id) classcount

    ON personinfo.person_id = instructors.id
    AND classes.id = instructors.id

In many cases when using aggregate functions you get just the fields you need from the agg function (typically an id
plusthe aggregate result) and JOIN with other tables (or even the same table) to get other info such as first_name,
last_name,etc. 

Otherwise, if you GROUP BY additional fields so you can get them in the output, you may be making the db do additional
work.

> 1) With an aggregate function in the query, is there any way to remove
> the "AND class_time > now()" so that timestamp can be passed in the
> select?  That is, I'd like to be able to do this?
>
>     select * from instructor_counts where class_time > now();
>
> But class_time is not part of the VIEW so that's not valid.

No problem, just make it a part of the view. See the classes section below.

CREATE VIEW future_instructor_counts
    AS
        SELECT  * FROM

    (SELECT
     person.id AS person_id,
                first_name,
                last_name) personinfo

    INNER JOIN

    -- Add class_time field!
    (SELECT class.id, class_time FROM class
    WHERE class_time > now() ) classes

    INNER JOIN

    (SELECT
     id, count(class) AS class_count
    FROM instructors GROUP BY id) classcount

    ON personinfo.person_id = instructors.id
    AND classes.id = instructors.id

[Disclaimer: I've not tested this code at all. It could help if you sent table definitions and maybe even dummy
data via insert commands.]

>  And if it was included then I don't have an aggregate function any more - no
> more grouping.

If you do the agg function separately like this that isn't an issue. You join tables to get whatever fields you'd like
tohave in your output. 

> 2) I think I'm missing something obvious.  I know that I need to
> specify all my non-aggregate columns in the "GROUP BY", but I don't
> under stand why.  Really, the results are just grouped only by
> person.id so why the need to specify the other columns.
>
> And if you don't specify all the columns then Postgresql reports:
>
>   ERROR:  column "person.id" must appear in the GROUP BY
>             clause or be used in an aggregate function
>
> Is there a reason Postgresql doesn't just add the column
> automatically?  It does in other cases (like a missing table in a
> join).

As I mention above, if you GROUP BY additional fields just to get them in the output, you may be making the db do
additionalwork. 

I seem to remember that in a later SQL standard (ie, after SQL-99 but I could be wrong) I believe it allows you to
specifyadditional fields in SELECT that are not in the GROUP BY clause. But PG isn't there yet.  

-Roger

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Bill Moseley
Sent: Friday, September 16, 2005 11:30 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Setting WHERE on a VIEW with aggregate function.


I have a view to generate a list of instructors and a count of their
future classes.

"instructors" is a link table between "class" and "person".

CREATE VIEW future_instructor_counts
    AS
        SELECT  person.id AS person_id,
                first_name,
                last_name,
                count(instructors.class) AS class_count

          FROM  class, instructors, person

         WHERE  class.id    = instructors.class AND
                person.id   = instructors.person
                AND class_time > now()

      GROUP BY  person_id, first_name, last_name;


I have two very basic SQL questions:

1) With an aggregate function in the query, is there any way to remove
the "AND class_time > now()" so that timestamp can be passed in the
select?  That is, I'd like to be able to do this?

    select * from instructor_counts where class_time > now();

But class_time is not part of the VIEW so that's not valid.  And if it
was included then I don't have an aggregate function any more - no
more grouping.


2) I think I'm missing something obvious.  I know that I need to
specify all my non-aggregate columns in the "GROUP BY", but I don't
under stand why.  Really, the results are just grouped only by
person.id so why the need to specify the other columns.

And if you don't specify all the columns then Postgresql reports:

  ERROR:  column "person.id" must appear in the GROUP BY
            clause or be used in an aggregate function

Is there a reason Postgresql doesn't just add the column
automatically?  It does in other cases (like a missing table in a
join).

Thanks


--
Bill Moseley
moseley@hank.org


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

pgsql-general by date:

Previous
From: Greg Stark
Date:
Subject: Re: Asychronous database replication
Next
From: Fernando Lujan
Date:
Subject: Restoring just a table or row from a backup copy.