Re: Wrong rows selected with view - Mailing list pgsql-general

From Bill Moseley
Subject Re: Wrong rows selected with view
Date
Msg-id 20051117041257.GB29760@hank.org
Whole thread Raw
In response to Re: Wrong rows selected with view  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Wrong rows selected with view  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Wed, Nov 16, 2005 at 07:48:06PM -0500, Tom Lane wrote:
> Bill Moseley <moseley@hank.org> writes:
> > [ strange behavior ]
>
> Oh, duh, it's not a PG bug: the problem is that the view is
> underspecified.  You have
>
> SELECT DISTINCT ON (class.id)
>     ... a bunch of stuff ...
>     FROM ... a bunch of tables ...
>     ORDER BY class.id;
>
> The difficulty with this is that DISTINCT ON will take the first row in
> each group with the same class.id.  And since you're only sorting by
> class.id, "the first row" is ill-defined.

Sorry, but I fear I'm missing something.


That ORDER BY is added by PG -- it's not part of my view when I define
it.  I assume PG adds that so it can do the DISTINCT ON.

Still, I don't have any duplicate class.id rows in this select that I
can see.  class.id 1243 and 1244 are not the same, yet PG is
(sometimes) throwing out one of them.  Are you saying that somehow PG
thinks they are the same class.id and is thus removing one?


I'm asking for a list of all classes taught by instructor 84.


ws2=> select * from instructors where person = 84 order by class;
 person | class
--------+-------
     84 |   727
     84 |   739
     84 |   804
     84 |   813
     84 |   867
     84 |  1243
     84 |  1244
(7 rows)


The reason I'm using DISTINCT ON is because the "class_list" view is
suppose to just return a list of unique classes, and a class might have
more than one instructor which would result in extra rows -- as shown
here:

ws2=> select * from instructors where class in (select class from instructors where person = 84);
 person | class
--------+-------
     84 |   727
     84 |   739
     84 |   804
     84 |   813
     84 |   867
     84 |  1243
     26 |  1243
     84 |  1244
     26 |  1244
(9 rows)

So when I don't want duplicates:

ws2=> select distinct on (class) * from instructors where class in (select class from instructors where person = 84);
 person | class
--------+-------
     84 |   727
     84 |   739
     84 |   804
     84 |   813
     84 |   867
     84 |  1243
     84 |  1244
(7 rows)

I don't care in this case about the DISTINCT ON throwing out the
duplicates -- I just care about distinct classes, not that all the
instructors are included in this select.




And even if I throw in all my other joins I get the same thing:

ws2=>        SELECT DISTINCT ON (class.id)
ws2->             class.id AS id
ws2->
ws2->           FROM class, location, region, person, instructors
ws2->
ws2->          WHERE class.location           = location.id   -- join with location
ws2->                AND class.id             = instructors.class   -- join the instructors
ws2->                AND instructors.person   = person.id     -- join the person(s)
ws2->                AND location.region      = region.id   -- join the location to a region
ws2->                AND person.id = 84;
  id
------
  727
  739
  804
  813
  867
 1243
 1244
(7 rows)


> I'm not sure why qsort's
> behavior seems to depend on the width of the rows, but there's no doubt
> that it's sorting different rows to the front of each group depending
> on which view you use.

I just don't see what groups there are, though in this case.


> It could also be that you don't want to be using DISTINCT ON at all;
> have you thought through exactly what this view ought to produce for
> each class.id?

Yes, I think so.  A list of columns related to it, with the exception
of when there's duplicate instructors I want one of those duplicates
thrown out (and I don't care which one).

When I do a query that generates duplicate class.id's such as
when a class has more than one instructor:

ws2=>         select  class.id AS class_id,
ws2->                 person.id AS person_id
ws2->
ws2->           FROM class, location, region, person, instructors
ws2->
ws2->          WHERE class.location           = location.id   -- join with location
ws2->                AND class.id             = instructors.class   -- join the instructors
ws2->                AND instructors.person   = person.id     -- join the person(s)
ws2->                AND location.region      = region.id   -- join the location to a region
ws2->                AND class_time > now();
 class_id | person_id
----------+-----------
      561 |        95
      614 |        95
      747 |       111
      762 |       111
      772 |       111
      883 |        13
      924 |        26
      935 |        26
      945 |        26
     1243 |        84
     1243 |        26
     1244 |        84
     1244 |        26
(13 rows)


You can see some classes are listed twice, so using distinct on gets
just my list of unique classes:


ws2=>         SELECT DISTINCT ON (class.id)
ws2->                 class.id AS class_id,
ws2->                 person.id AS person_id
ws2->
ws2->           FROM class, location, region, person, instructors
ws2->
ws2->          WHERE class.location           = location.id   -- join with location
ws2->                AND class.id             = instructors.class   -- join the instructors
ws2->                AND instructors.person   = person.id     -- join the person(s)
ws2->                AND location.region      = region.id   -- join the location to a region
ws2->                AND class_time > now();
 class_id | person_id
----------+-----------
      561 |        95
      614 |        95
      747 |       111
      762 |       111
      772 |       111
      883 |        13
      924 |        26
      935 |        26
      945 |        26
     1243 |        84
     1244 |        84
(11 rows)


All the view does is include more columns in the result set, and that
seems to change the number of results.  Here's the view doing the
previous query based on "class_time":

ws2=> select id, person_id from class_list where class_time > now();
  id  | person_id
------+-----------
  561 |        95
  614 |        95
  747 |       111
  762 |       111
  772 |       111
  883 |        13
  924 |        26
  935 |        26
  945 |        26
 1243 |        26 << PG selected the other instructor this time
 1244 |        84
(11 rows)




FYI - For those following along, none of those queries show the actual
problem I'm having with the view.  Which is the view of the above join
is returning the wrong number of rows -- 1243 is missing.

Again, selecting by person_id:

ws2=> select id from class_list where person_id = 84;
  id
------
  727
  739
  804
  813
  867
 1244
(6 rows)


I have a feeling this is going to be a "doh!" when I finally see it...



--
Bill Moseley
moseley@hank.org


pgsql-general by date:

Previous
From: Michael Glaesemann
Date:
Subject: Re: Does PG Support Unicode on Windows?
Next
From: Justin Hawkins
Date:
Subject: Re: Trouble with recursive trigger