Re: Wrong rows selected with view - Mailing list pgsql-general

From Bill Moseley
Subject Re: Wrong rows selected with view
Date
Msg-id 20051117140653.GA11663@hank.org
Whole thread Raw
In response to Re: Wrong rows selected with view  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Wed, Nov 16, 2005 at 11:34:21PM -0500, Tom Lane wrote:
> No, I'm saying that the underlying data (the join result before applying
> DISTINCT ON) looks like this:
>
> bill=# select  "class".id, person.id AS person_id
> bill-#    FROM "class", "location", region, person, instructors
> bill-#   WHERE "class"."location" = "location".id AND "class".id = instructors."class"
> bill-# AND instructors.person = person.id AND "location".region = region.id
> bill-#   ORDER BY "class".id;
>   id  | person_id
> ------+-----------
>     1 |        49
>     2 |        27
>     3 |        19
>     4 |        82
>     5 |        12
>  ...
>  1238 |        61
>  1238 |        60
>  1239 |        40
>  1240 |        67
>  1241 |        11
>  1243 |        26
>  1243 |        84
>  1244 |        26
>  1244 |        84
> (1311 rows)
>
> The DISTINCT ON will take just one of the two rows with id = 1243, and
> just one of the rows with id = 1244, and *it is effectively random which
> one gets picked*.  So when you then select rows with person_id = 84, you
> may or may not see these rows in the end result.

Yikes!  The problem is *when* DISTINCT ON happens, right?

And, Tom,  you actually explained this to me on the list back on Aug 25th,
but that's when I was using the view in a different way.  You noted
that the order was "unpredictable" but at that time it didn't matter
which row was selected to me.

    http://archives.postgresql.org/pgsql-general/2005-08/msg01291.php

This current problem was due to my assumption of how PG executes the
query:

My assumption was that the select would first do the joins (including
limit by class.id = 84) *then* weed out the duplicate class.ids.

But if PG is first doing the the joins on all the tables (before
limiting by class.id = 84) and then weeding out the duplicate
class.ids, and then finally limiting by class.id = 84 then I can see
where I might end up wit the missing row.

Frankly, I expected the first to happen because it would use an index
to select just the records of class.id = 84, then do the joins on
that small set of records.  Didn't seem likely that the database would
join all the records first and then limit by class.id.  Seems like the
hard way to do the query.  But the query planner works in strange and
mysterious ways. ;)


Does that also explain why PG was sometimes returning the "correct"
number of rows?  Depending on which of the two query plans above
were used?

> Exactly.  So your view is going to return the class id along with a
> randomly selected one of the instructor ids.  It seems to me that
> filtering this result on instructor id is perhaps a bit ill-advised,
> even if you fix the view so that the chosen instructor id isn't so
> random (eg, you could fix it to display the lowest-numbered instructor
> id for the particular class).  Even then, are you searching for the
> instructor id that the view happens to show for that class, or some
> other one?

Well, clearly, my "one-size-fits-all view" doesn't work in this case.
I just need another view without distinct when limiting by instructor.

It was that red-herring of removing a seemingly random column from the
view that made it hard to see what was really happening.

Thanks very much for all your time.


--
Bill Moseley
moseley@hank.org


pgsql-general by date:

Previous
From: "Magnus Hagander"
Date:
Subject: Re: Rebranding PostgreSQL
Next
From: Lincoln Yeoh
Date:
Subject: Re: PREPARE TRANSACTION and webapps