Re: Wrong rows selected with view - Mailing list pgsql-general
From | Bill Moseley |
---|---|
Subject | Re: Wrong rows selected with view |
Date | |
Msg-id | 20051117140653.GA11663@hank.org Whole thread Raw |
In response to | Re: Wrong rows selected with view (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
On Wed, Nov 16, 2005 at 11:34:21PM -0500, Tom Lane wrote: > No, I'm saying that the underlying data (the join result before applying > DISTINCT ON) looks like this: > > bill=# select "class".id, person.id AS person_id > bill-# FROM "class", "location", region, person, instructors > bill-# WHERE "class"."location" = "location".id AND "class".id = instructors."class" > bill-# AND instructors.person = person.id AND "location".region = region.id > bill-# ORDER BY "class".id; > id | person_id > ------+----------- > 1 | 49 > 2 | 27 > 3 | 19 > 4 | 82 > 5 | 12 > ... > 1238 | 61 > 1238 | 60 > 1239 | 40 > 1240 | 67 > 1241 | 11 > 1243 | 26 > 1243 | 84 > 1244 | 26 > 1244 | 84 > (1311 rows) > > The DISTINCT ON will take just one of the two rows with id = 1243, and > just one of the rows with id = 1244, and *it is effectively random which > one gets picked*. So when you then select rows with person_id = 84, you > may or may not see these rows in the end result. Yikes! The problem is *when* DISTINCT ON happens, right? And, Tom, you actually explained this to me on the list back on Aug 25th, but that's when I was using the view in a different way. You noted that the order was "unpredictable" but at that time it didn't matter which row was selected to me. http://archives.postgresql.org/pgsql-general/2005-08/msg01291.php This current problem was due to my assumption of how PG executes the query: My assumption was that the select would first do the joins (including limit by class.id = 84) *then* weed out the duplicate class.ids. But if PG is first doing the the joins on all the tables (before limiting by class.id = 84) and then weeding out the duplicate class.ids, and then finally limiting by class.id = 84 then I can see where I might end up wit the missing row. Frankly, I expected the first to happen because it would use an index to select just the records of class.id = 84, then do the joins on that small set of records. Didn't seem likely that the database would join all the records first and then limit by class.id. Seems like the hard way to do the query. But the query planner works in strange and mysterious ways. ;) Does that also explain why PG was sometimes returning the "correct" number of rows? Depending on which of the two query plans above were used? > Exactly. So your view is going to return the class id along with a > randomly selected one of the instructor ids. It seems to me that > filtering this result on instructor id is perhaps a bit ill-advised, > even if you fix the view so that the chosen instructor id isn't so > random (eg, you could fix it to display the lowest-numbered instructor > id for the particular class). Even then, are you searching for the > instructor id that the view happens to show for that class, or some > other one? Well, clearly, my "one-size-fits-all view" doesn't work in this case. I just need another view without distinct when limiting by instructor. It was that red-herring of removing a seemingly random column from the view that made it hard to see what was really happening. Thanks very much for all your time. -- Bill Moseley moseley@hank.org
pgsql-general by date: