Thread: Setting WHERE on a VIEW with aggregate function.

Setting WHERE on a VIEW with aggregate function.

From
Bill Moseley
Date:
I have a view to generate a list of instructors and a count of their
future classes.

"instructors" is a link table between "class" and "person".

CREATE VIEW future_instructor_counts
    AS
        SELECT  person.id AS person_id,
                first_name,
                last_name,
                count(instructors.class) AS class_count

          FROM  class, instructors, person

         WHERE  class.id    = instructors.class AND
                person.id   = instructors.person
                AND class_time > now()

      GROUP BY  person_id, first_name, last_name;


I have two very basic SQL questions:

1) With an aggregate function in the query, is there any way to remove
the "AND class_time > now()" so that timestamp can be passed in the
select?  That is, I'd like to be able to do this?

    select * from instructor_counts where class_time > now();

But class_time is not part of the VIEW so that's not valid.  And if it
was included then I don't have an aggregate function any more - no
more grouping.


2) I think I'm missing something obvious.  I know that I need to
specify all my non-aggregate columns in the "GROUP BY", but I don't
under stand why.  Really, the results are just grouped only by
person.id so why the need to specify the other columns.

And if you don't specify all the columns then Postgresql reports:

  ERROR:  column "person.id" must appear in the GROUP BY
            clause or be used in an aggregate function

Is there a reason Postgresql doesn't just add the column
automatically?  It does in other cases (like a missing table in a
join).

Thanks


--
Bill Moseley
moseley@hank.org


Re: Setting WHERE on a VIEW with aggregate function.

From
"Roger Hand"
Date:
I have a view to generate a list of instructors and a count of their
> future classes.
>
> "instructors" is a link table between "class" and "person".
>
> CREATE VIEW future_instructor_counts
>     AS
>         SELECT  person.id AS person_id,
>                 first_name,
>                 last_name,
>                 count(instructors.class) AS class_count
>
>           FROM  class, instructors, person
>
>          WHERE  class.id    = instructors.class AND
>                 person.id   = instructors.person
>                 AND class_time > now()
>
>       GROUP BY  person_id, first_name, last_name;

The trick is to do the data aggregation separately, then JOIN in whatever other fields you want.

Something like this:

CREATE VIEW future_instructor_counts
    AS
        SELECT  * FROM

    (SELECT
     person.id AS person_id,
                first_name,
                last_name) personinfo

    INNER JOIN

    (SELECT class.id FROM class
    WHERE class_time > now() ) classes

    INNER JOIN

    (SELECT
     id, count(class) AS class_count
    FROM instructors GROUP BY id) classcount

    ON personinfo.person_id = instructors.id
    AND classes.id = instructors.id

In many cases when using aggregate functions you get just the fields you need from the agg function (typically an id
plusthe aggregate result) and JOIN with other tables (or even the same table) to get other info such as first_name,
last_name,etc. 

Otherwise, if you GROUP BY additional fields so you can get them in the output, you may be making the db do additional
work.

> 1) With an aggregate function in the query, is there any way to remove
> the "AND class_time > now()" so that timestamp can be passed in the
> select?  That is, I'd like to be able to do this?
>
>     select * from instructor_counts where class_time > now();
>
> But class_time is not part of the VIEW so that's not valid.

No problem, just make it a part of the view. See the classes section below.

CREATE VIEW future_instructor_counts
    AS
        SELECT  * FROM

    (SELECT
     person.id AS person_id,
                first_name,
                last_name) personinfo

    INNER JOIN

    -- Add class_time field!
    (SELECT class.id, class_time FROM class
    WHERE class_time > now() ) classes

    INNER JOIN

    (SELECT
     id, count(class) AS class_count
    FROM instructors GROUP BY id) classcount

    ON personinfo.person_id = instructors.id
    AND classes.id = instructors.id

[Disclaimer: I've not tested this code at all. It could help if you sent table definitions and maybe even dummy
data via insert commands.]

>  And if it was included then I don't have an aggregate function any more - no
> more grouping.

If you do the agg function separately like this that isn't an issue. You join tables to get whatever fields you'd like
tohave in your output. 

> 2) I think I'm missing something obvious.  I know that I need to
> specify all my non-aggregate columns in the "GROUP BY", but I don't
> under stand why.  Really, the results are just grouped only by
> person.id so why the need to specify the other columns.
>
> And if you don't specify all the columns then Postgresql reports:
>
>   ERROR:  column "person.id" must appear in the GROUP BY
>             clause or be used in an aggregate function
>
> Is there a reason Postgresql doesn't just add the column
> automatically?  It does in other cases (like a missing table in a
> join).

As I mention above, if you GROUP BY additional fields just to get them in the output, you may be making the db do
additionalwork. 

I seem to remember that in a later SQL standard (ie, after SQL-99 but I could be wrong) I believe it allows you to
specifyadditional fields in SELECT that are not in the GROUP BY clause. But PG isn't there yet.  

-Roger

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Bill Moseley
Sent: Friday, September 16, 2005 11:30 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Setting WHERE on a VIEW with aggregate function.


I have a view to generate a list of instructors and a count of their
future classes.

"instructors" is a link table between "class" and "person".

CREATE VIEW future_instructor_counts
    AS
        SELECT  person.id AS person_id,
                first_name,
                last_name,
                count(instructors.class) AS class_count

          FROM  class, instructors, person

         WHERE  class.id    = instructors.class AND
                person.id   = instructors.person
                AND class_time > now()

      GROUP BY  person_id, first_name, last_name;


I have two very basic SQL questions:

1) With an aggregate function in the query, is there any way to remove
the "AND class_time > now()" so that timestamp can be passed in the
select?  That is, I'd like to be able to do this?

    select * from instructor_counts where class_time > now();

But class_time is not part of the VIEW so that's not valid.  And if it
was included then I don't have an aggregate function any more - no
more grouping.


2) I think I'm missing something obvious.  I know that I need to
specify all my non-aggregate columns in the "GROUP BY", but I don't
under stand why.  Really, the results are just grouped only by
person.id so why the need to specify the other columns.

And if you don't specify all the columns then Postgresql reports:

  ERROR:  column "person.id" must appear in the GROUP BY
            clause or be used in an aggregate function

Is there a reason Postgresql doesn't just add the column
automatically?  It does in other cases (like a missing table in a
join).

Thanks


--
Bill Moseley
moseley@hank.org


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: Setting WHERE on a VIEW with aggregate function.

From
Bill Moseley
Date:
On Fri, Sep 16, 2005 at 12:06:19PM -0700, Roger Hand wrote:
> >     select * from instructor_counts where class_time > now();
> >
> > But class_time is not part of the VIEW so that's not valid.
>
> No problem, just make it a part of the view. See the classes section below.
>
> CREATE VIEW future_instructor_counts
>     AS
>         SELECT  * FROM
>
>     (SELECT
>      person.id AS person_id,
>                 first_name,
>                 last_name) personinfo
>
>     INNER JOIN
>
>     -- Add class_time field!
>     (SELECT class.id, class_time FROM class
>     WHERE class_time > now() ) classes
>
>     INNER JOIN
>
>     (SELECT
>      id, count(class) AS class_count
>     FROM instructors GROUP BY id) classcount
>
>     ON personinfo.person_id = instructors.id
>     AND classes.id = instructors.id

I couldn't get that to work -- Postgresql isn't that helpful just
reporting "ERROR:  syntax error at or near ";" at character 496" even
after adding a FROM in the first select.  So, I'm stabbing in the dark
to get it to work.

> [Disclaimer: I've not tested this code at all. It could help if you sent table definitions and maybe even dummy
> data via insert commands.]

Ok -- this should be cut-n-paste:

CREATE TABLE class (
    id          integer PRIMARY KEY,
    class_time  timestamp(0) with time zone,
    name        text
);

CREATE TABLE person (
    id          integer PRIMARY KEY,
    first_name  text
);

create table instructors (
    person              integer NOT NULL REFERENCES person,
    class               integer NOT NULL REFERENCES class,
    PRIMARY KEY  (person, class)
);

INSERT INTO person (id,first_name) values (1,'Joe');
INSERT INTO person (id,first_name) values (2,'Mary');
INSERT INTO person (id,first_name) values (3,'Bob');
INSERT INTO person (id,first_name) values (4,'Cindy');

INSERT INTO class (id,name, class_time) values (1,'Math', now());
INSERT INTO class (id,name, class_time) values (2,'Math', now() + interval '1 day');
INSERT INTO class (id,name, class_time) values (3,'Science', now());
INSERT INTO class (id,name, class_time) values (4,'PE', now() + interval '1 day');

INSERT INTO instructors (person, class) values (1,1);  -- joe teaches math now

INSERT INTO instructors (person, class) values (1,2);  -- joe teaches math tomorrow
INSERT INTO instructors (person, class) values (2,2);  --   with Mary

INSERT INTO instructors (person, class) values (3,3);  -- Bob teaches science now
INSERT INTO instructors (person, class) values (4,3);  -- Cindy teaches science tomorrow

-- view

CREATE VIEW instructor_counts
    AS
        SELECT  person.id AS person_id,
                first_name,
                count(instructors.class) AS class_count

          FROM  class, instructors, person

         WHERE  class.id    = instructors.class AND
                person.id   = instructors.person
                -- AND class_time > now()

      GROUP BY  person_id, first_name;


    select * from instructor_counts order by class_count desc;

-- Returns:

 person_id | first_name | class_count
-----------+------------+-------------
         1 | Joe        |           2
         2 | Mary       |           1
         3 | Bob        |           1
         4 | Cindy      |           1
(4 rows)

My GOAL above is to be able to add a WHERE class_time > $some_time.


Here's were I left off, which I never could get to work.
The individual selects work, but seems like I need to be say
c.class_id = i.class in addition.  But I can't even get
this without syntax errors:

CREATE VIEW instructor_counts
    AS
        SELECT  *

        FROM

            (SELECT person.id AS person_id, first_name
             FROM person) p

            INNER JOIN

            (SELECT class.id AS class_id, class_time
             FROM class) c

            INNER JOIN

            (SELECT person, count(class) AS class_count
             FROM instructors GROUP BY person) i

        ON ( p.person_id = i.person);


That also looks like the selects are going to be full table scans.




--
Bill Moseley
moseley@hank.org


Re: Setting WHERE on a VIEW with aggregate function.

From
"Roger Hand"
Date:
>> > I have a view to generate a list of instructors and a count of their
>> > future classes.

>> >     select * from instructor_counts where class_time > now();
>> >
>> > But class_time is not part of the VIEW so that's not valid.
>>
>> No problem, just make it a part of the view.

"No problem", I said ... famous last words.

>     select * from instructor_counts order by class_count desc;
>
> -- Returns:
>
>  person_id | first_name | class_count
> -----------+------------+-------------
>          1 | Joe        |           2
>          2 | Mary       |           1
>          3 | Bob        |           1
>          4 | Cindy      |           1
> (4 rows)
>
> My GOAL above is to be able to add a WHERE class_time > $some_time.

Armed with your table ddl and sample data I can see the problem more clearly.

Unfortunately class_time cannot be a column in the view output.  For example, look at the "Joe" line above ... if he
teachestwo classes which "class_time" would it show?  

Since class_time can't be a column in the view output it can't be used in a WHERE clause.

So it would appear to me that you won't able to meet your goal by simply using a view.

However, there are other solutions you can use which may or may not be appropriate. I can think of three at the moment.

#1: Function Solution:
================
To use functions you may first need to run this at command line:

createlang plpgsql electric


Then create a function that you can pass in a date to:

CREATE FUNCTION getclasscount("timestamp") RETURNS "refcursor" AS '
DECLARE curs refcursor;
BEGIN
OPEN curs FOR
SELECT  * FROM
  (SELECT
   person, count(class) AS class_count
  FROM instructors
  INNER JOIN class
    ON class.id = instructors.person
  WHERE class.class_time > $1
 GROUP BY person
  ) classcount

  INNER JOIN

  (SELECT
   person.id AS person_id,
                first_name FROM person
   ) personinfo

  ON personinfo.person_id = classcount.person
RETURN curs;
END;
' LANGUAGE 'plpgsql';

This would be the best solution if you are in control of the application source code. In Java, for example, it's
relativelysimple to call this function and return the result as a result set. If you're working in Java I'd be glad to
showyou same sample code. 

If you really do need a view for some reason, then this wouldn't work.

#2: Simplify the Select Criteria Solution:
=============================

A slightly less flexible approach, but one that may be workable, would be to add two boolean columns (with default
valueof '0') to the class table: "completed" and "current". Then once a semester you run a simple query that updates
them.Something like: 

UPDATE class SET current = '1' where class_time = '2005-09-01';
UPDATE class SET completed = '1' where class_time < '2005-09-01';

Then the view would be:

CREATE VIEW vclasscount AS
SELECT  * FROM
  (SELECT
   person, count(class) AS class_count
  FROM instructors
  INNER JOIN
  (SELECT id FROM class
  WHERE class.completed = '0'
    AND class.current = '0') futureclasses
    ON futureclasses.id = instructors.class
 GROUP BY person
  ) classcount

  INNER JOIN

  (SELECT
   person.id AS person_id,
                first_name FROM person
   ) personinfo

  ON personinfo.person_id = classcount.person

This would be a nice simple solution, but may not be flexible enough in that you can't specify an ad-hoc date or
date-range... you would only see future class count. 

3: Use a Temp Table
===============
Again, if you have control of application logic, you could:

1) SELECT * INTO futureclasses FROM class where class_time > ?

2) Then make the view against futureclasses rather than classes.

Good luck!

-Roger

> Bill Moseley


Re: Setting WHERE on a VIEW with aggregate function.

From
Bill Moseley
Date:
On Fri, Sep 16, 2005 at 04:56:25PM -0700, Roger Hand wrote:
> So it would appear to me that you won't able to meet your goal by simply using a view.
>
> However, there are other solutions you can use which may or may not be appropriate. I can think of three at the
moment.
>
> #1: Function Solution:

I'll take a look at this.  It's about time I started to learn about
functions a bit.

> This would be the best solution if you are in control of the
> application source code. In Java, for example, it's relatively
> simple to call this function and return the result as a result set.
> If you're working in Java I'd be glad to show you same sample code.

I'm using Perl and DBI (really Class::DBI but DBI is not far away).

> #2: Simplify the Select Criteria Solution:
> =============================
>
> A slightly less flexible approach, but one that may be workable, would be to add two boolean columns (with default
valueof '0') to the class table: "completed" and "current". Then once a semester you run a simple query that updates
them.Something like: 
>
> UPDATE class SET current = '1' where class_time = '2005-09-01';
> UPDATE class SET completed = '1' where class_time < '2005-09-01';

Classes start daily (and at different hours).  I could cron once an
hour I suppose, but I'd rather not de-normalize the data.

Maybe I can just create three views (future, recent, old) and live
with that.

The temporary table is another possibility I'll look into.

Thank you very much for spending time on this.  I really appreciate
it.


--
Bill Moseley
moseley@hank.org


Re: Setting WHERE on a VIEW with aggregate function.

From
Tom Lane
Date:
Bill Moseley <moseley@hank.org> writes:
> I couldn't get that to work -- Postgresql isn't that helpful just
> reporting "ERROR:  syntax error at or near ";" at character 496" even
> after adding a FROM in the first select.  So, I'm stabbing in the dark
> to get it to work.

[ not directly on topic, but... ]  I think the answer to that is "use a
newer version of Postgres".  8.0 and up provide fairly specific pointers
for syntax errors.  Silly example:

regression=# select foo, bar
regression-# , baz,
regression-# zip zap zot
regression-# blah blah;
ERROR:  syntax error at or near "zap" at character 28
LINE 3: zip zap zot
            ^
regression=#

            regards, tom lane