Thread: Help with a subselect inside a view
I need a little SQL help: I'm trying to get a subselect working inside a view. I have a table "class" that has related tables (a class has a location, a location has an address with columns city, state, zip). I want to use a VIEW to display columns related to a given class. But a class can also have one or more instructors. So I have a link table: Table "public.instructors" Column | Type | Modifiers --------+---------+----------- person | integer | not null class | integer | not null Foreign-key constraints: "$1" FOREIGN KEY (person) REFERENCES person(id) "$2" FOREIGN KEY ("class") REFERENCES "class"(id) I can do the following, but in the (very rare) case where there may be two instructors assigned to the class I will get two rows back. CREATE VIEW class_list ( id, name, class_time, location, location_name, address, city, state, zip, instructor_name ) AS SELECT class.id, class.name, class.class_time, class.location, location.name, address.id, address.city, address.state, address.zip, person.last_name FROM class, location, address, instructors, person WHERE class.location = location.id AND location.address = address.id AND location.region = region.id -- Not what I want AND instructors.person = person.id AND instructors.class = class.id; I'm completely happy to just fetch just one of the instructors, and don't care which one. I just need only one row per class. (I assume that's my hint right there.) I can select a single instructor from a given class like: SELECT person.id FROM instructors, person WHERE instructors.class = 555 AND person.id = instructors.person LIMIT 1; So I thought I might be able to add that as a subselect to the VIEW, but I have not been able to make it work. I suspect I'm missing something obvious. Thanks, Oh BTW -- If I do a count(*) and a WHERE that only includes columns in the "class" table on the VIEW, will Postgresql still do the joins? Or will it only do the select on the "class" table. I suspect it will do the joins to make sure the relations can be found. -- Bill Moseley moseley@hank.org
On Wed, Aug 24, 2005 at 11:12:17PM -0700, Bill Moseley wrote: > I need a little SQL help: > > I'm trying to get a subselect working inside a view. > > I have a table "class" that has related tables (a class has a > location, a location has an address with columns city, state, zip). > I want to use a VIEW to display columns related to a given class. > > But a class can also have one or more instructors. So I have a link > table: > > Table "public.instructors" > Column | Type | Modifiers > --------+---------+----------- > person | integer | not null > class | integer | not null > > Foreign-key constraints: > "$1" FOREIGN KEY (person) REFERENCES person(id) > "$2" FOREIGN KEY ("class") REFERENCES "class"(id) > > I can do the following, but in the (very rare) case where there may be > two instructors assigned to the class I will get two rows back. > > CREATE VIEW class_list > ( > id, name, class_time, location, location_name, > address, city, state, zip, > instructor_name > ) > AS > SELECT class.id, class.name, class.class_time, class.location, > location.name, > address.id, address.city, address.state, address.zip, > person.last_name > > FROM class, location, address, > instructors, person > > WHERE class.location = location.id > AND location.address = address.id > AND location.region = region.id > -- Not what I want > AND instructors.person = person.id > AND instructors.class = class.id; > > I'm completely happy to just fetch just one of the instructors, and > don't care which one. I just need only one row per class. (I assume > that's my hint right there.) This sounds like a case for PostgreSQL's nifty DISTINCT ON functionality. http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT HTH :) Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
On Wed, Aug 24, 2005 at 23:12:17 -0700, Bill Moseley <moseley@hank.org> wrote: > I need a little SQL help: > > I'm trying to get a subselect working inside a view. Unfortunately you didn't show us what you tried. My guess would be that you didn't enclose the subselect in parenthesis. The distinct on solution that was suggested is probably a better way to go anyway.
Hi David, On Thu, Aug 25, 2005 at 01:22:02AM -0700, David Fetter wrote: > This sounds like a case for PostgreSQL's nifty DISTINCT ON functionality. > > http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group. I read that and thought it wasn't a drop-in replacement for my code due to the leftmost ORDER BY requirement. But, it seems to work even if that requirement is not met. Perhaps I not understanding the wording above? Or is Postgresql adding in the order automatically? My original VIEWS with duplicates: DROP VIEW cl; CREATE VIEW cl (id, class_time, instructor) AS SELECT class.id, class.class_time, person.first_name FROM class, instructors, person WHERE instructors.person = person.id AND class.id = instructors.class; select * from cl where id = 555; id | class_time | instructor -----+------------------------+------------ 555 | 2005-09-30 09:00:00-07 | Cheryl 555 | 2005-09-30 09:00:00-07 | Bob (2 rows) And with DISTINCT ON(): DROP VIEW cl; CREATE VIEW cl (id, class_time, instructor) AS SELECT DISTINCT ON(class.id) class.id, class.class_time, person.first_name FROM class, instructors, person WHERE instructors.person = person.id AND class.id = instructors.class; select * from cl where id = 555; id | class_time | instructor -----+------------------------+------------ 555 | 2005-09-30 09:00:00-07 | Cheryl (1 row) Here where the leftmost ORDER BY doesn't match the DISTINCT ON, which I thought was not possible: select * from cl where class_time > now() order by instructor limit 3; id | class_time | instructor -----+------------------------+------------ 544 | 2005-08-31 09:00:00-07 | Cheryl 555 | 2005-09-30 09:00:00-07 | Cheryl 737 | 2005-08-30 09:00:00-07 | Cynthia -- Bill Moseley moseley@hank.org
On Thu, Aug 25, 2005 at 08:05:36AM -0500, Bruno Wolff III wrote: > On Wed, Aug 24, 2005 at 23:12:17 -0700, > Bill Moseley <moseley@hank.org> wrote: > > I need a little SQL help: > > > > I'm trying to get a subselect working inside a view. > > Unfortunately you didn't show us what you tried. My guess would be that > you didn't enclose the subselect in parenthesis. No, it wasn't that. I just didn't want to look too foolish. ;) DROP VIEW cl; CREATE VIEW cl (id, instructor) AS SELECT class.id, person.first_name FROM class, instructors, person WHERE instructors.person = person.id AND class.id = ( SELECT instructors.id FROM instructors, person WHERE instructors.class = class.id AND person.id = instructors.person LIMIT 1 ); Which returns a row for every row in "instructors" table. > The distinct on solution that was suggested is probably a better way to > go anyway. Turns out it is, Thanks. -- Bill Moseley moseley@hank.org
And about being efficient: On Thu, Aug 25, 2005 at 08:01:26AM -0700, Bill Moseley wrote: > DROP VIEW cl; > CREATE VIEW cl (id, class_time, instructor) > AS > SELECT DISTINCT ON(class.id) > class.id, class.class_time, person.first_name > FROM class, instructors, person > WHERE instructors.person = person.id > AND class.id = instructors.class; And in a case like above, I'm displaying the list a page at a time. So I first do a count to find total rows and then a select: select count(*) from cl where class_time >= now(); select * from cl where class_time >= now() LIMIT 20 OFFSET 40; I looked at the EXPLAIN ANALYZE for both and both do the join, it seems. I guess it has to be that way. So would it be smart to do the initial count on "class" instead of the view first? select count(*) from class where class_time >= now(); select * from cl where class_time >= now() LIMIT 20 OFFSET 40; That is, Postgresql won't figure out that it only need to look at one table, right? -- Bill Moseley moseley@hank.org
Bill Moseley <moseley@hank.org> writes: >> http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT > The DISTINCT ON expression(s) must match the leftmost ORDER BY > expression(s). The ORDER BY clause will normally contain additional > expression(s) that determine the desired precedence of rows within > each DISTINCT ON group. > I read that and thought it wasn't a drop-in replacement for my code > due to the leftmost ORDER BY requirement. But, it seems to work even > if that requirement is not met. > CREATE VIEW cl (id, class_time, instructor) > AS > SELECT DISTINCT ON(class.id) > class.id, class.class_time, person.first_name > FROM class, instructors, person > WHERE instructors.person = person.id > AND class.id = instructors.class; This is allowed because the code automatically adds "ORDER BY class.id" within the view (as you would see if you examined the view with \d). It's fairly pointless though, because as the manual notes, you can't get any well-defined behavior without additional ORDER BY columns to prioritize the rows within class.id groups. As is, you're getting random choices of class_time and first_name within the groups. (Though maybe in this application, you don't care.) regards, tom lane
On Thu, Aug 25, 2005 at 12:14:31PM -0400, Tom Lane wrote: > > CREATE VIEW cl (id, class_time, instructor) > > AS > > SELECT DISTINCT ON(class.id) > > class.id, class.class_time, person.first_name > > FROM class, instructors, person > > WHERE instructors.person = person.id > > AND class.id = instructors.class; > > This is allowed because the code automatically adds "ORDER BY class.id" > within the view (as you would see if you examined the view with \d). I see that now. Might be helpful for the docs to say that for folks like me. > It's fairly pointless though, because as the manual notes, you can't get > any well-defined behavior without additional ORDER BY columns to > prioritize the rows within class.id groups. As is, you're getting > random choices of class_time and first_name within the groups. > (Though maybe in this application, you don't care.) I'm not sure I follow what you are saying. I understand that I have no control over which "first_name" I end up with (and I don't really care), but class_time is a column in the "class" table which I'm using DISTINCT ON on, so that should be unique as well. So I assume you meant random choice of first_name, not class_time. Thanks, -- Bill Moseley moseley@hank.org
Bill Moseley <moseley@hank.org> writes: > On Thu, Aug 25, 2005 at 12:14:31PM -0400, Tom Lane wrote: >> It's fairly pointless though, because as the manual notes, you can't get >> any well-defined behavior without additional ORDER BY columns to >> prioritize the rows within class.id groups. As is, you're getting >> random choices of class_time and first_name within the groups. >> (Though maybe in this application, you don't care.) > I'm not sure I follow what you are saying. I understand that I have > no control over which "first_name" I end up with (and I don't really > care), but class_time is a column in the "class" table which I'm using > DISTINCT ON on, so that should be unique as well. So I assume you > meant random choice of first_name, not class_time. Sorry, I meant that the query by itself doesn't guarantee anything about which values you will get. If you know a-priori that there is only one value of class_time per class id, then of course you don't care which row it's selected from. But from the point of view of this query, you're getting an unspecified one of the possible values. In most of the applications I've seen for DISTINCT ON, people *do* care. For instance, if you wanted to further constrain what you were getting, you might wish that the returned first_name were the alphabetically first among the class's instructors. You could get that with SELECT DISTINCT ON(class.id) ... ORDER BY class.id, person.first_name; regards, tom lane
On Thu, Aug 25, 2005 at 08:19:25 -0700, Bill Moseley <moseley@hank.org> wrote: > > DROP VIEW cl; > CREATE VIEW cl (id, instructor) > AS > SELECT class.id, person.first_name > FROM class, instructors, person > WHERE instructors.person = person.id > AND class.id = ( > SELECT instructors.id > FROM instructors, person > WHERE instructors.class = class.id > AND person.id = instructors.person > LIMIT 1 > ); > > Which returns a row for every row in "instructors" table. I think if you were to use this approach you would do something more like: DROP VIEW cl; CREATE VIEW cl (id, instructor) AS SELECT class.id, (SELECT person.first_name FROM instructors, person WHERE instructors.class = class.id AND person.id = instructors.person LIMIT 1) FROM class;