Thread: Help with a subselect inside a view

Help with a subselect inside a view

From
Bill Moseley
Date:
I need a little SQL help:

I'm trying to get a subselect working inside a view.

I have a table "class" that has related tables (a class has a
location, a location has an address with columns city, state, zip).
I want to use a VIEW to display columns related to a given class.

But a class can also have one or more instructors.  So I have a link
table:

  Table "public.instructors"
 Column |  Type   | Modifiers
--------+---------+-----------
 person | integer | not null
 class  | integer | not null

Foreign-key constraints:
    "$1" FOREIGN KEY (person) REFERENCES person(id)
    "$2" FOREIGN KEY ("class") REFERENCES "class"(id)

I can do the following, but in the (very rare) case where there may be
two instructors assigned to the class I will get two rows back.

CREATE VIEW class_list
        (
            id, name, class_time, location, location_name,
            address, city, state, zip,
            instructor_name
        )
    AS
        SELECT class.id, class.name, class.class_time, class.location,
                 location.name,
               address.id, address.city, address.state, address.zip,
               person.last_name

          FROM class, location, address,
               instructors, person

         WHERE class.location           = location.id
               AND location.address     = address.id
               AND location.region      = region.id
               -- Not what I want
               AND instructors.person   = person.id
               AND instructors.class    = class.id;

I'm completely happy to just fetch just one of the instructors, and
don't care which one.  I just need only one row per class.  (I assume
that's my hint right there.)

I can select a single instructor from a given class like:

    SELECT  person.id
      FROM  instructors, person
     WHERE  instructors.class = 555
            AND person.id = instructors.person
     LIMIT  1;

So I thought I might be able to add that as a subselect to the VIEW,
but I have not been able to make it work.  I suspect I'm missing
something obvious.

Thanks,

Oh BTW -- If I do a count(*) and a WHERE that only includes columns in
the "class" table on the VIEW, will Postgresql still do the joins?  Or
will it only do the select on the "class" table.  I suspect it will
do the joins to make sure the relations can be found.


--
Bill Moseley
moseley@hank.org

Re: Help with a subselect inside a view

From
David Fetter
Date:
On Wed, Aug 24, 2005 at 11:12:17PM -0700, Bill Moseley wrote:
> I need a little SQL help:
>
> I'm trying to get a subselect working inside a view.
>
> I have a table "class" that has related tables (a class has a
> location, a location has an address with columns city, state, zip).
> I want to use a VIEW to display columns related to a given class.
>
> But a class can also have one or more instructors.  So I have a link
> table:
>
>   Table "public.instructors"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  person | integer | not null
>  class  | integer | not null
>
> Foreign-key constraints:
>     "$1" FOREIGN KEY (person) REFERENCES person(id)
>     "$2" FOREIGN KEY ("class") REFERENCES "class"(id)
>
> I can do the following, but in the (very rare) case where there may be
> two instructors assigned to the class I will get two rows back.
>
> CREATE VIEW class_list
>         (
>             id, name, class_time, location, location_name,
>             address, city, state, zip,
>             instructor_name
>         )
>     AS
>         SELECT class.id, class.name, class.class_time, class.location,
>                  location.name,
>                address.id, address.city, address.state, address.zip,
>                person.last_name
>
>           FROM class, location, address,
>                instructors, person
>
>          WHERE class.location           = location.id
>                AND location.address     = address.id
>                AND location.region      = region.id
>                -- Not what I want
>                AND instructors.person   = person.id
>                AND instructors.class    = class.id;
>
> I'm completely happy to just fetch just one of the instructors, and
> don't care which one.  I just need only one row per class.  (I assume
> that's my hint right there.)

This sounds like a case for PostgreSQL's nifty DISTINCT ON functionality.

http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT

HTH :)

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

Re: Help with a subselect inside a view

From
Bruno Wolff III
Date:
On Wed, Aug 24, 2005 at 23:12:17 -0700,
  Bill Moseley <moseley@hank.org> wrote:
> I need a little SQL help:
>
> I'm trying to get a subselect working inside a view.

Unfortunately you didn't show us what you tried. My guess would be that
you didn't enclose the subselect in parenthesis.

The distinct on solution that was suggested is probably a better way to
go anyway.

Re: Help with a subselect inside a view

From
Bill Moseley
Date:
Hi David,

On Thu, Aug 25, 2005 at 01:22:02AM -0700, David Fetter wrote:
> This sounds like a case for PostgreSQL's nifty DISTINCT ON functionality.
>
> http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT

   The DISTINCT ON expression(s) must match the leftmost ORDER BY
   expression(s). The ORDER BY clause will normally contain additional
   expression(s) that determine the desired precedence of rows within
   each DISTINCT ON group.

I read that and thought it wasn't a drop-in replacement for my code
due to the leftmost ORDER BY requirement.  But, it seems to work even
if that requirement is not met.

Perhaps I not understanding the wording above?  Or is Postgresql
adding in the order automatically?

My original VIEWS with duplicates:

    DROP VIEW cl;
    CREATE VIEW cl  (id, class_time, instructor)
        AS
            SELECT class.id, class.class_time, person.first_name
              FROM class, instructors, person
             WHERE instructors.person = person.id
               AND class.id = instructors.class;


     select * from cl where id = 555;
     id  |       class_time       | instructor
    -----+------------------------+------------
     555 | 2005-09-30 09:00:00-07 | Cheryl
     555 | 2005-09-30 09:00:00-07 | Bob
    (2 rows)


And with DISTINCT ON():

    DROP VIEW cl;
    CREATE VIEW cl  (id, class_time, instructor)
        AS
            SELECT DISTINCT ON(class.id)
                   class.id, class.class_time, person.first_name
              FROM class, instructors, person
             WHERE instructors.person = person.id
               AND class.id = instructors.class;



     select * from cl where id = 555;
     id  |       class_time       | instructor
    -----+------------------------+------------
     555 | 2005-09-30 09:00:00-07 | Cheryl
    (1 row)


Here where the leftmost ORDER BY doesn't match the DISTINCT ON, which I thought
was not possible:


    select * from cl where class_time > now() order by instructor limit 3;
     id  |       class_time       | instructor
    -----+------------------------+------------
     544 | 2005-08-31 09:00:00-07 | Cheryl
     555 | 2005-09-30 09:00:00-07 | Cheryl
     737 | 2005-08-30 09:00:00-07 | Cynthia


--
Bill Moseley
moseley@hank.org


Re: Help with a subselect inside a view

From
Bill Moseley
Date:
On Thu, Aug 25, 2005 at 08:05:36AM -0500, Bruno Wolff III wrote:
> On Wed, Aug 24, 2005 at 23:12:17 -0700,
>   Bill Moseley <moseley@hank.org> wrote:
> > I need a little SQL help:
> >
> > I'm trying to get a subselect working inside a view.
>
> Unfortunately you didn't show us what you tried. My guess would be that
> you didn't enclose the subselect in parenthesis.

No, it wasn't that.  I just didn't want to look too foolish. ;)

DROP VIEW cl;
CREATE VIEW cl  (id, instructor)
    AS
        SELECT class.id, person.first_name
          FROM class, instructors, person
         WHERE instructors.person = person.id
           AND class.id = (
                   SELECT instructors.id
                     FROM instructors, person
                    WHERE instructors.class = class.id
                      AND person.id = instructors.person
                    LIMIT 1
                );

Which returns a row for every row in "instructors" table.


> The distinct on solution that was suggested is probably a better way to
> go anyway.

Turns out it is, Thanks.


--
Bill Moseley
moseley@hank.org


Re: Help with a subselect inside a view

From
Bill Moseley
Date:
And about being efficient:

On Thu, Aug 25, 2005 at 08:01:26AM -0700, Bill Moseley wrote:
>     DROP VIEW cl;
>     CREATE VIEW cl  (id, class_time, instructor)
>         AS
>             SELECT DISTINCT ON(class.id)
>                    class.id, class.class_time, person.first_name
>               FROM class, instructors, person
>              WHERE instructors.person = person.id
>                AND class.id = instructors.class;

And in a case like above, I'm displaying the list a page at a time.
So I first do a count to find total rows and then a select:

    select count(*) from cl where class_time >= now();
    select * from cl where class_time >= now() LIMIT 20 OFFSET 40;

I looked at the EXPLAIN ANALYZE for both and both do the join, it
seems.  I guess it has to be that way.

So would it be smart to do the initial count on "class" instead
of the view first?

    select count(*) from class where class_time >= now();
    select * from cl where class_time >= now() LIMIT 20 OFFSET 40;

That is, Postgresql won't figure out that it only need to look at one
table, right?

--
Bill Moseley
moseley@hank.org


Re: Help with a subselect inside a view

From
Tom Lane
Date:
Bill Moseley <moseley@hank.org> writes:
>> http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT

>    The DISTINCT ON expression(s) must match the leftmost ORDER BY
>    expression(s). The ORDER BY clause will normally contain additional
>    expression(s) that determine the desired precedence of rows within
>    each DISTINCT ON group.

> I read that and thought it wasn't a drop-in replacement for my code
> due to the leftmost ORDER BY requirement.  But, it seems to work even
> if that requirement is not met.

>     CREATE VIEW cl  (id, class_time, instructor)
>         AS
>             SELECT DISTINCT ON(class.id)
>                    class.id, class.class_time, person.first_name
>               FROM class, instructors, person
>              WHERE instructors.person = person.id
>                AND class.id = instructors.class;

This is allowed because the code automatically adds "ORDER BY class.id"
within the view (as you would see if you examined the view with \d).
It's fairly pointless though, because as the manual notes, you can't get
any well-defined behavior without additional ORDER BY columns to
prioritize the rows within class.id groups.  As is, you're getting
random choices of class_time and first_name within the groups.
(Though maybe in this application, you don't care.)

            regards, tom lane

Re: Help with a subselect inside a view

From
Bill Moseley
Date:
On Thu, Aug 25, 2005 at 12:14:31PM -0400, Tom Lane wrote:
> >     CREATE VIEW cl  (id, class_time, instructor)
> >         AS
> >             SELECT DISTINCT ON(class.id)
> >                    class.id, class.class_time, person.first_name
> >               FROM class, instructors, person
> >              WHERE instructors.person = person.id
> >                AND class.id = instructors.class;
>
> This is allowed because the code automatically adds "ORDER BY class.id"
> within the view (as you would see if you examined the view with \d).

I see that now.  Might be helpful for the docs to say that for folks
like me.


> It's fairly pointless though, because as the manual notes, you can't get
> any well-defined behavior without additional ORDER BY columns to
> prioritize the rows within class.id groups.  As is, you're getting
> random choices of class_time and first_name within the groups.
> (Though maybe in this application, you don't care.)

I'm not sure I follow what you are saying.  I understand that I have
no control over which "first_name" I end up with (and I don't really
care), but class_time is a column in the "class" table which I'm using
DISTINCT ON on, so that should be unique as well.  So I assume you
meant random choice of first_name, not class_time.

Thanks,




--
Bill Moseley
moseley@hank.org


Re: Help with a subselect inside a view

From
Tom Lane
Date:
Bill Moseley <moseley@hank.org> writes:
> On Thu, Aug 25, 2005 at 12:14:31PM -0400, Tom Lane wrote:
>> It's fairly pointless though, because as the manual notes, you can't get
>> any well-defined behavior without additional ORDER BY columns to
>> prioritize the rows within class.id groups.  As is, you're getting
>> random choices of class_time and first_name within the groups.
>> (Though maybe in this application, you don't care.)

> I'm not sure I follow what you are saying.  I understand that I have
> no control over which "first_name" I end up with (and I don't really
> care), but class_time is a column in the "class" table which I'm using
> DISTINCT ON on, so that should be unique as well.  So I assume you
> meant random choice of first_name, not class_time.

Sorry, I meant that the query by itself doesn't guarantee anything about
which values you will get.  If you know a-priori that there is only one
value of class_time per class id, then of course you don't care which
row it's selected from.  But from the point of view of this query,
you're getting an unspecified one of the possible values.

In most of the applications I've seen for DISTINCT ON, people *do* care.
For instance, if you wanted to further constrain what you were getting,
you might wish that the returned first_name were the alphabetically
first among the class's instructors.  You could get that with
    SELECT DISTINCT ON(class.id)
    ...
    ORDER BY class.id, person.first_name;

            regards, tom lane

Re: Help with a subselect inside a view

From
Bruno Wolff III
Date:
On Thu, Aug 25, 2005 at 08:19:25 -0700,
  Bill Moseley <moseley@hank.org> wrote:
>
> DROP VIEW cl;
> CREATE VIEW cl  (id, instructor)
>     AS
>         SELECT class.id, person.first_name
>           FROM class, instructors, person
>          WHERE instructors.person = person.id
>            AND class.id = (
>                    SELECT instructors.id
>                      FROM instructors, person
>                     WHERE instructors.class = class.id
>                       AND person.id = instructors.person
>                     LIMIT 1
>                 );
>
> Which returns a row for every row in "instructors" table.

I think if you were to use this approach you would do something more like:

DROP VIEW cl;
CREATE VIEW cl  (id, instructor)
    AS
        SELECT class.id,
            (SELECT person.first_name
               FROM instructors, person
               WHERE instructors.class = class.id
                 AND person.id = instructors.person
               LIMIT 1)
          FROM class;