Thread: Setting WHERE on a VIEW with aggregate function.
I have a view to generate a list of instructors and a count of their future classes. "instructors" is a link table between "class" and "person". CREATE VIEW future_instructor_counts AS SELECT person.id AS person_id, first_name, last_name, count(instructors.class) AS class_count FROM class, instructors, person WHERE class.id = instructors.class AND person.id = instructors.person AND class_time > now() GROUP BY person_id, first_name, last_name; I have two very basic SQL questions: 1) With an aggregate function in the query, is there any way to remove the "AND class_time > now()" so that timestamp can be passed in the select? That is, I'd like to be able to do this? select * from instructor_counts where class_time > now(); But class_time is not part of the VIEW so that's not valid. And if it was included then I don't have an aggregate function any more - no more grouping. 2) I think I'm missing something obvious. I know that I need to specify all my non-aggregate columns in the "GROUP BY", but I don't under stand why. Really, the results are just grouped only by person.id so why the need to specify the other columns. And if you don't specify all the columns then Postgresql reports: ERROR: column "person.id" must appear in the GROUP BY clause or be used in an aggregate function Is there a reason Postgresql doesn't just add the column automatically? It does in other cases (like a missing table in a join). Thanks -- Bill Moseley moseley@hank.org
I have a view to generate a list of instructors and a count of their > future classes. > > "instructors" is a link table between "class" and "person". > > CREATE VIEW future_instructor_counts > AS > SELECT person.id AS person_id, > first_name, > last_name, > count(instructors.class) AS class_count > > FROM class, instructors, person > > WHERE class.id = instructors.class AND > person.id = instructors.person > AND class_time > now() > > GROUP BY person_id, first_name, last_name; The trick is to do the data aggregation separately, then JOIN in whatever other fields you want. Something like this: CREATE VIEW future_instructor_counts AS SELECT * FROM (SELECT person.id AS person_id, first_name, last_name) personinfo INNER JOIN (SELECT class.id FROM class WHERE class_time > now() ) classes INNER JOIN (SELECT id, count(class) AS class_count FROM instructors GROUP BY id) classcount ON personinfo.person_id = instructors.id AND classes.id = instructors.id In many cases when using aggregate functions you get just the fields you need from the agg function (typically an id plusthe aggregate result) and JOIN with other tables (or even the same table) to get other info such as first_name, last_name,etc. Otherwise, if you GROUP BY additional fields so you can get them in the output, you may be making the db do additional work. > 1) With an aggregate function in the query, is there any way to remove > the "AND class_time > now()" so that timestamp can be passed in the > select? That is, I'd like to be able to do this? > > select * from instructor_counts where class_time > now(); > > But class_time is not part of the VIEW so that's not valid. No problem, just make it a part of the view. See the classes section below. CREATE VIEW future_instructor_counts AS SELECT * FROM (SELECT person.id AS person_id, first_name, last_name) personinfo INNER JOIN -- Add class_time field! (SELECT class.id, class_time FROM class WHERE class_time > now() ) classes INNER JOIN (SELECT id, count(class) AS class_count FROM instructors GROUP BY id) classcount ON personinfo.person_id = instructors.id AND classes.id = instructors.id [Disclaimer: I've not tested this code at all. It could help if you sent table definitions and maybe even dummy data via insert commands.] > And if it was included then I don't have an aggregate function any more - no > more grouping. If you do the agg function separately like this that isn't an issue. You join tables to get whatever fields you'd like tohave in your output. > 2) I think I'm missing something obvious. I know that I need to > specify all my non-aggregate columns in the "GROUP BY", but I don't > under stand why. Really, the results are just grouped only by > person.id so why the need to specify the other columns. > > And if you don't specify all the columns then Postgresql reports: > > ERROR: column "person.id" must appear in the GROUP BY > clause or be used in an aggregate function > > Is there a reason Postgresql doesn't just add the column > automatically? It does in other cases (like a missing table in a > join). As I mention above, if you GROUP BY additional fields just to get them in the output, you may be making the db do additionalwork. I seem to remember that in a later SQL standard (ie, after SQL-99 but I could be wrong) I believe it allows you to specifyadditional fields in SELECT that are not in the GROUP BY clause. But PG isn't there yet. -Roger -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Bill Moseley Sent: Friday, September 16, 2005 11:30 AM To: pgsql-general@postgresql.org Subject: [GENERAL] Setting WHERE on a VIEW with aggregate function. I have a view to generate a list of instructors and a count of their future classes. "instructors" is a link table between "class" and "person". CREATE VIEW future_instructor_counts AS SELECT person.id AS person_id, first_name, last_name, count(instructors.class) AS class_count FROM class, instructors, person WHERE class.id = instructors.class AND person.id = instructors.person AND class_time > now() GROUP BY person_id, first_name, last_name; I have two very basic SQL questions: 1) With an aggregate function in the query, is there any way to remove the "AND class_time > now()" so that timestamp can be passed in the select? That is, I'd like to be able to do this? select * from instructor_counts where class_time > now(); But class_time is not part of the VIEW so that's not valid. And if it was included then I don't have an aggregate function any more - no more grouping. 2) I think I'm missing something obvious. I know that I need to specify all my non-aggregate columns in the "GROUP BY", but I don't under stand why. Really, the results are just grouped only by person.id so why the need to specify the other columns. And if you don't specify all the columns then Postgresql reports: ERROR: column "person.id" must appear in the GROUP BY clause or be used in an aggregate function Is there a reason Postgresql doesn't just add the column automatically? It does in other cases (like a missing table in a join). Thanks -- Bill Moseley moseley@hank.org ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings
On Fri, Sep 16, 2005 at 12:06:19PM -0700, Roger Hand wrote: > > select * from instructor_counts where class_time > now(); > > > > But class_time is not part of the VIEW so that's not valid. > > No problem, just make it a part of the view. See the classes section below. > > CREATE VIEW future_instructor_counts > AS > SELECT * FROM > > (SELECT > person.id AS person_id, > first_name, > last_name) personinfo > > INNER JOIN > > -- Add class_time field! > (SELECT class.id, class_time FROM class > WHERE class_time > now() ) classes > > INNER JOIN > > (SELECT > id, count(class) AS class_count > FROM instructors GROUP BY id) classcount > > ON personinfo.person_id = instructors.id > AND classes.id = instructors.id I couldn't get that to work -- Postgresql isn't that helpful just reporting "ERROR: syntax error at or near ";" at character 496" even after adding a FROM in the first select. So, I'm stabbing in the dark to get it to work. > [Disclaimer: I've not tested this code at all. It could help if you sent table definitions and maybe even dummy > data via insert commands.] Ok -- this should be cut-n-paste: CREATE TABLE class ( id integer PRIMARY KEY, class_time timestamp(0) with time zone, name text ); CREATE TABLE person ( id integer PRIMARY KEY, first_name text ); create table instructors ( person integer NOT NULL REFERENCES person, class integer NOT NULL REFERENCES class, PRIMARY KEY (person, class) ); INSERT INTO person (id,first_name) values (1,'Joe'); INSERT INTO person (id,first_name) values (2,'Mary'); INSERT INTO person (id,first_name) values (3,'Bob'); INSERT INTO person (id,first_name) values (4,'Cindy'); INSERT INTO class (id,name, class_time) values (1,'Math', now()); INSERT INTO class (id,name, class_time) values (2,'Math', now() + interval '1 day'); INSERT INTO class (id,name, class_time) values (3,'Science', now()); INSERT INTO class (id,name, class_time) values (4,'PE', now() + interval '1 day'); INSERT INTO instructors (person, class) values (1,1); -- joe teaches math now INSERT INTO instructors (person, class) values (1,2); -- joe teaches math tomorrow INSERT INTO instructors (person, class) values (2,2); -- with Mary INSERT INTO instructors (person, class) values (3,3); -- Bob teaches science now INSERT INTO instructors (person, class) values (4,3); -- Cindy teaches science tomorrow -- view CREATE VIEW instructor_counts AS SELECT person.id AS person_id, first_name, count(instructors.class) AS class_count FROM class, instructors, person WHERE class.id = instructors.class AND person.id = instructors.person -- AND class_time > now() GROUP BY person_id, first_name; select * from instructor_counts order by class_count desc; -- Returns: person_id | first_name | class_count -----------+------------+------------- 1 | Joe | 2 2 | Mary | 1 3 | Bob | 1 4 | Cindy | 1 (4 rows) My GOAL above is to be able to add a WHERE class_time > $some_time. Here's were I left off, which I never could get to work. The individual selects work, but seems like I need to be say c.class_id = i.class in addition. But I can't even get this without syntax errors: CREATE VIEW instructor_counts AS SELECT * FROM (SELECT person.id AS person_id, first_name FROM person) p INNER JOIN (SELECT class.id AS class_id, class_time FROM class) c INNER JOIN (SELECT person, count(class) AS class_count FROM instructors GROUP BY person) i ON ( p.person_id = i.person); That also looks like the selects are going to be full table scans. -- Bill Moseley moseley@hank.org
>> > I have a view to generate a list of instructors and a count of their >> > future classes. >> > select * from instructor_counts where class_time > now(); >> > >> > But class_time is not part of the VIEW so that's not valid. >> >> No problem, just make it a part of the view. "No problem", I said ... famous last words. > select * from instructor_counts order by class_count desc; > > -- Returns: > > person_id | first_name | class_count > -----------+------------+------------- > 1 | Joe | 2 > 2 | Mary | 1 > 3 | Bob | 1 > 4 | Cindy | 1 > (4 rows) > > My GOAL above is to be able to add a WHERE class_time > $some_time. Armed with your table ddl and sample data I can see the problem more clearly. Unfortunately class_time cannot be a column in the view output. For example, look at the "Joe" line above ... if he teachestwo classes which "class_time" would it show? Since class_time can't be a column in the view output it can't be used in a WHERE clause. So it would appear to me that you won't able to meet your goal by simply using a view. However, there are other solutions you can use which may or may not be appropriate. I can think of three at the moment. #1: Function Solution: ================ To use functions you may first need to run this at command line: createlang plpgsql electric Then create a function that you can pass in a date to: CREATE FUNCTION getclasscount("timestamp") RETURNS "refcursor" AS ' DECLARE curs refcursor; BEGIN OPEN curs FOR SELECT * FROM (SELECT person, count(class) AS class_count FROM instructors INNER JOIN class ON class.id = instructors.person WHERE class.class_time > $1 GROUP BY person ) classcount INNER JOIN (SELECT person.id AS person_id, first_name FROM person ) personinfo ON personinfo.person_id = classcount.person RETURN curs; END; ' LANGUAGE 'plpgsql'; This would be the best solution if you are in control of the application source code. In Java, for example, it's relativelysimple to call this function and return the result as a result set. If you're working in Java I'd be glad to showyou same sample code. If you really do need a view for some reason, then this wouldn't work. #2: Simplify the Select Criteria Solution: ============================= A slightly less flexible approach, but one that may be workable, would be to add two boolean columns (with default valueof '0') to the class table: "completed" and "current". Then once a semester you run a simple query that updates them.Something like: UPDATE class SET current = '1' where class_time = '2005-09-01'; UPDATE class SET completed = '1' where class_time < '2005-09-01'; Then the view would be: CREATE VIEW vclasscount AS SELECT * FROM (SELECT person, count(class) AS class_count FROM instructors INNER JOIN (SELECT id FROM class WHERE class.completed = '0' AND class.current = '0') futureclasses ON futureclasses.id = instructors.class GROUP BY person ) classcount INNER JOIN (SELECT person.id AS person_id, first_name FROM person ) personinfo ON personinfo.person_id = classcount.person This would be a nice simple solution, but may not be flexible enough in that you can't specify an ad-hoc date or date-range... you would only see future class count. 3: Use a Temp Table =============== Again, if you have control of application logic, you could: 1) SELECT * INTO futureclasses FROM class where class_time > ? 2) Then make the view against futureclasses rather than classes. Good luck! -Roger > Bill Moseley
On Fri, Sep 16, 2005 at 04:56:25PM -0700, Roger Hand wrote: > So it would appear to me that you won't able to meet your goal by simply using a view. > > However, there are other solutions you can use which may or may not be appropriate. I can think of three at the moment. > > #1: Function Solution: I'll take a look at this. It's about time I started to learn about functions a bit. > This would be the best solution if you are in control of the > application source code. In Java, for example, it's relatively > simple to call this function and return the result as a result set. > If you're working in Java I'd be glad to show you same sample code. I'm using Perl and DBI (really Class::DBI but DBI is not far away). > #2: Simplify the Select Criteria Solution: > ============================= > > A slightly less flexible approach, but one that may be workable, would be to add two boolean columns (with default valueof '0') to the class table: "completed" and "current". Then once a semester you run a simple query that updates them.Something like: > > UPDATE class SET current = '1' where class_time = '2005-09-01'; > UPDATE class SET completed = '1' where class_time < '2005-09-01'; Classes start daily (and at different hours). I could cron once an hour I suppose, but I'd rather not de-normalize the data. Maybe I can just create three views (future, recent, old) and live with that. The temporary table is another possibility I'll look into. Thank you very much for spending time on this. I really appreciate it. -- Bill Moseley moseley@hank.org
Bill Moseley <moseley@hank.org> writes: > I couldn't get that to work -- Postgresql isn't that helpful just > reporting "ERROR: syntax error at or near ";" at character 496" even > after adding a FROM in the first select. So, I'm stabbing in the dark > to get it to work. [ not directly on topic, but... ] I think the answer to that is "use a newer version of Postgres". 8.0 and up provide fairly specific pointers for syntax errors. Silly example: regression=# select foo, bar regression-# , baz, regression-# zip zap zot regression-# blah blah; ERROR: syntax error at or near "zap" at character 28 LINE 3: zip zap zot ^ regression=# regards, tom lane