Thread: Wrong rows selected with view
PostgreSQL 7.4.8 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.2 20050816 (prerelease) (Debian 4.0.1-5) Hopefully this is something simple -- I assume it's a problem with my SQL. But it looks really weird to me at this late hour. I have some tables for managing workshops, and I have a view (below). I noticed for a given select it was returning the wrong number of rows. I expected seven rows back, but only got six. I rewrote the view with just the joins and it returned the correct number of rows. So I started the brute force method of removing one thing at a time in the view to see what would make it start returning the correct number of rows. That just confused me more. Below if I comment out *any* single column that's marked "-- this" then I get the correct number of rows. If I comment out any rows marked "-- nope" then there's no change (get six rows returned). But, if I start commenting out more than one "-- nope" then I get seven rows. Can someone point out my error? create view bar AS SELECT DISTINCT ON (class.id) -- this class.id AS id, class.name AS name, -- this class.class_time AS class_time, -- this class.begin_reg_time AS begin_reg_time, -- this (CASE WHEN class.register_cutoff_time > class.class_time -- this THEN class.register_cutoff_time ELSE class.class_time END) AS register_stop_time, class.location AS location, -- nope class.workshop AS workshop, -- nope class.review_mode AS review_mode, -- nope class.workshop_group AS workshop_group, -- nope location.name AS location_name, -- this location.address AS address, -- this location.city AS city, -- nope location.state AS state, -- this location.zip AS zip, -- nope region.id AS region, -- nope region.name AS region_name, -- nope region.sort_order AS region_sort, -- nope person.id AS person_id, UPPER( person.last_name || person.first_name ) AS instructor, -- this class.cutoff_message AS cutoff_message, -- this class.full_message AS full_message, -- this class.wait_description AS wait_description -- this FROM class, location, region, person, instructors WHERE class.location = location.id -- join with location AND class.id = instructors.class -- join the instructors AND instructors.person = person.id -- join the person(s) AND location.region = region.id; -- join the location to a region select id from bar where person_id = 84; drop view bar; id ------ 727 739 804 813 867 1244 (6 rows Comment out one column: -- class.full_message AS full_message, -- this class.wait_description AS wait_description -- this FROM class, location, region, person, instructors WHERE class.location = location.id -- join with location AND class.id = instructors.class -- join the instructors AND instructors.person = person.id -- join the person(s) AND location.region = region.id; -- join the location to a region select id from bar where person_id = 84; drop view bar; id ------ 727 739 804 813 867 1243 1244 (7 rows) It's always class.id 1243 that doesn't show up. Not sure this helps, but: moseley@bumby:~$ echo '\x \\ select * from class where id = 1243' | psql ws2 > 1243 moseley@bumby:~$ echo '\x \\ select * from class where id = 1244' | psql ws2 > 1244 moseley@bumby:~$ diff -U 0 1243 1244 --- 1243 2005-11-15 20:16:26.619412721 -0800 +++ 1244 2005-11-15 20:16:30.438646443 -0800 @@ -3 +3 @@ -id | 1243 +id | 1244 @@ -8,4 +8,4 @@ -class_time | 2005-12-12 07:00:00-08 -class_end_time | 2005-12-12 14:00:00-08 -class_size | 55 -begin_reg_time | 2005-11-15 17:36:00-08 +class_time | 2005-12-25 07:15:00-08 +class_end_time | 2005-12-25 11:00:00-08 +class_size | 33 +begin_reg_time | @@ -27 +27 @@ -register_cutoff_time | 2005-12-11 19:00:00-08 +register_cutoff_time | 2005-12-24 19:15:00-08 Thanks, -- Bill "stabbing in the dark" Moseley moseley@hank.org
Bill Moseley writes: > PostgreSQL 7.4.8 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.2 20050816 (prerelease) (Debian 4.0.1-5) > > Hopefully this is something simple -- I assume it's a problem with my > SQL. But it looks really weird to me at this late hour. > > I have some tables for managing workshops, and I have a view (below). > I noticed for a given select it was returning the wrong number of > rows. I expected seven rows back, but only got six. > > I rewrote the view with just the joins and it returned the correct number > of rows. So I started the brute force method of removing one thing at > a time in the view to see what would make it start returning the > correct number of rows. That just confused me more. How does the query plan change when you make those changes? If it only occurs if a certain index is used, it might be corrupt (=> REINDEX). regards, Andreas
On Wed, Nov 16, 2005 at 03:42:19PM +0100, Andreas Seltenreich wrote: > How does the query plan change when you make those changes? If it only > occurs if a certain index is used, it might be corrupt (=> REINDEX). I did a "reindex database ws2;" and no change. I'm not very good at reading the query plans. For one thing, they always send me off on some tangent wondering why it's doing a Seq Scan instead of a index scan. ;) The first plan below returns the correct number of rows, the second plan does not. These are after I did the reindex, btw. ws2=> explain select id from bar where person_id = 84; Subquery Scan bar (cost=1225.81..1243.32 rows=6 width=4) Filter: (person_id = 84) -> Unique (cost=1225.81..1230.82 rows=1000 width=334) -> Sort (cost=1225.81..1228.31 rows=1003 width=334) Sort Key: "class".id -> Hash Join (cost=802.15..1175.81 rows=1003 width=334) Hash Cond: ("outer".person = "inner".id) -> Hash Join (cost=67.50..203.81 rows=1003 width=315) Hash Cond: ("outer".region = "inner".id) -> Hash Join (cost=45.00..163.77 rows=1002 width=279) Hash Cond: ("outer"."location" = "inner".id) -> Hash Join (cost=22.50..118.74 rows=1001 width=141) Hash Cond: ("outer"."class" = "inner".id) -> Seq Scan on instructors (cost=0.00..20.00 rows=1000 width=8) -> Hash (cost=20.00..20.00 rows=1000 width=137) -> Seq Scan on "class" (cost=0.00..20.00 rows=1000 width=137) -> Hash (cost=20.00..20.00 rows=1000 width=142) -> Seq Scan on "location" (cost=0.00..20.00 rows=1000 width=142) -> Hash (cost=20.00..20.00 rows=1000 width=40) -> Seq Scan on region (cost=0.00..20.00 rows=1000 width=40) -> Hash (cost=649.12..649.12 rows=10212 width=23) -> Seq Scan on person (cost=0.00..649.12 rows=10212 width=23) (22 rows) This returns one row less and the only change to the view is this commented out column: -- class.full_message AS full_message, -- this ws2=> explain select id from bar where person_id = 84; Subquery Scan bar (cost=1222.54..1240.05 rows=6 width=4) Filter: (person_id = 84) -> Unique (cost=1222.54..1227.55 rows=1000 width=366) -> Sort (cost=1222.54..1225.05 rows=1003 width=366) Sort Key: "class".id -> Hash Join (cost=779.65..1172.54 rows=1003 width=366) Hash Cond: ("outer".person = "inner".id) -> Hash Join (cost=45.00..204.14 rows=1003 width=347) Hash Cond: ("outer".region = "inner".id) -> Hash Join (cost=22.50..164.10 rows=1002 width=311) Hash Cond: ("outer"."location" = "inner".id) -> Merge Join (cost=0.00..119.06 rows=1001 width=173) Merge Cond: ("outer".id = "inner"."class") -> Index Scan using class_pkey on "class" (cost=0.00..52.00 rows=1000 width=169) -> Index Scan using instructors_class_index on instructors (cost=0.00..52.00 rows=1000width=8) -> Hash (cost=20.00..20.00 rows=1000 width=142) -> Seq Scan on "location" (cost=0.00..20.00 rows=1000 width=142) -> Hash (cost=20.00..20.00 rows=1000 width=40) -> Seq Scan on region (cost=0.00..20.00 rows=1000 width=40) -> Hash (cost=649.12..649.12 rows=10212 width=23) -> Seq Scan on person (cost=0.00..649.12 rows=10212 width=23) (21 rows) -- Bill Moseley moseley@hank.org
Bill Moseley <moseley@hank.org> writes: > The first plan below returns the correct number of rows, the second plan does > not. These are after I did the reindex, btw. Bizarre. What are the datatypes of the columns being joined on? If they're string types, what's your database locale and encoding? regards, tom lane
On Wed, Nov 16, 2005 at 10:53:21AM -0500, Tom Lane wrote: > Bill Moseley <moseley@hank.org> writes: > > The first plan below returns the correct number of rows, the second plan does > > not. These are after I did the reindex, btw. > > Bizarre. What are the datatypes of the columns being joined on? If > they're string types, what's your database locale and encoding? The primary keys are all SERIAL, and the FKs are integer. Nothing too odd. The odd thing is the row that is not returned is basically a clone of another row -- which is why I diff'ed them in my first posting. BTW, this might be obvious, but the reason I'm doing DISTINCT ON class.id is that the instructors table is a link table and a class can have more than one instructor. I only want a list of classes, not one per instructor (which could duplicate them). I'm still a novice with Pg, so I assume this is what you are asking (although none of my joins are on text fields). ws2=> SHOW LC_CTYPE; -[ RECORD 1 ]--- lc_ctype | en_US ws2=> SHOW SERVER_ENCODING; -[ RECORD 1 ]---+------- server_encoding | LATIN1 So my joins are: WHERE class.location = location.id -- join with location AND class.id = instructors.class -- join the instructors AND instructors.person = person.id -- join the person(s) AND location.region = region.id; -- join the location to a region And the .id are all SERIAL integer and the FKs are all integer. Trying to avoid sending too much unnecessary data to the list, but here's a sample of the tables: ws2=> \d region Table "public.region" Column | Type | Modifiers ------------+---------+-------------------------------------------------------- id | integer | not null default nextval('public.region_id_seq'::text) active | boolean | not null default true sort_order | integer | not null default 1 name | text | not null Indexes: "region_pkey" primary key, btree (id) "region_name_key" unique, btree (nam ws2=> \d instructors Table "public.instructors" Column | Type | Modifiers --------+---------+----------- person | integer | not null class | integer | not null Indexes: "instructors_pkey" primary key, btree (person, "class") "instructors_class_index" btree ("class") "instructors_person_index" btree (person) Foreign-key constraints: "$1" FOREIGN KEY (person) REFERENCES person(id) "$2" FOREIGN KEY ("class") REFERENCES "class"(id) ws2=> \d class Table "public.class" Column | Type | Modifiers -------------------------+-----------------------------+------------------------------------------------------- id | integer | not null default nextval('public.class_id_seq'::text) name | text | not null old_id | integer | location | integer | not null workshop | integer | not null class_time | timestamp(0) with time zone | not null class_end_time | timestamp(0) with time zone | not null class_size | integer | not null begin_reg_time | timestamp(0) with time zone | class_list_sent_time | timestamp(0) with time zone | class_list_sent_email | text | reminder_sent_time | timestamp(0) with time zone | ride_list_sent_time | timestamp(0) with time zone | html_description | text | not null short_description | text | special_instructions | text | on_hold_message | text | review_mode | boolean | not null default false workshop_group | integer | not null distance_ed | boolean | not null default false contract_class | boolean | not null default false online_evaluation | boolean | not null default true price_scheme | integer | not null duration | text | register_cutoff_time | timestamp(0) with time zone | not null cutoff_message | text | full_message | text | wait_list_size | integer | wait_description | text | wait_instructions | text | wait_email_instructions | text | cancel_late_hours | integer | cancel_cutoff_hours | integer | cancel_email | text | send_confirmation | boolean | not null default true confirmed_change_notify | text | send_class_list_email | text | send_class_hours | integer | ride_list_hours | integer | reminder_hours | integer | notify_email | text | Indexes: "class_pkey" primary key, btree (id) "class_old_id_key" unique, btree (old_id) "class_class_time_index" btree (class_time) "class_old_id_index" btree (old_id) "class_workshop_index" btree (workshop) Foreign-key constraints: "$1" FOREIGN KEY ("location") REFERENCES "location"(id) "$2" FOREIGN KEY (workshop) REFERENCES workshop(id) "$3" FOREIGN KEY (workshop_group) REFERENCES workshop_group(id) "$4" FOREIGN KEY (price_scheme) REFERENCES pricing(id) ws2=> \d location Table "public.location" Column | Type | Modifiers ------------------+--------------+---------------------------------------------------------- id | integer | not null default nextval('public.location_id_seq'::text) old_id | integer | active | boolean | not null default true name | text | not null class_size | integer | not null num_workstations | integer | time_zone | integer | not null directions | text | not null region | integer | not null phone | text | address | text | city | text | zip | text | state | character(2) | map_link | text | Indexes: "location_pkey" primary key, btree (id) "location_name_key" unique, btree (name) "location_old_id_key" unique, btree (old_id) "location_old_id" btree (old_id) Foreign-key constraints: "$1" FOREIGN KEY (time_zone) REFERENCES timezone(id) "$2" FOREIGN KEY (region) REFERENCES region(id) "$3" FOREIGN KEY (state) REFERENCES geo_state(state) -- Bill Moseley moseley@hank.org
Bill Moseley <moseley@hank.org> writes: > On Wed, Nov 16, 2005 at 10:53:21AM -0500, Tom Lane wrote: >> Bizarre. What are the datatypes of the columns being joined on? If >> they're string types, what's your database locale and encoding? > The primary keys are all SERIAL, and the FKs are integer. Nothing too > odd. Well, that eliminates the theory I had, which was that string comparison was messing up because of incompatible locale/encoding choices. I think you may have found a PG bug. Can you duplicate the misbehavior if you dump the data and reload it into a fresh database? (Note you'll probably need to re-ANALYZE to get back to the same query plans.) If so, would you be willing to send me the dump off-list? If the data is not too sensitive, this'd probably be easier than trying to find a smaller test case. regards, tom lane
Bill Moseley <moseley@hank.org> writes: > [ strange behavior ] Oh, duh, it's not a PG bug: the problem is that the view is underspecified. You have SELECT DISTINCT ON (class.id) ... a bunch of stuff ... FROM ... a bunch of tables ... ORDER BY class.id; The difficulty with this is that DISTINCT ON will take the first row in each group with the same class.id. And since you're only sorting by class.id, "the first row" is ill-defined. I'm not sure why qsort's behavior seems to depend on the width of the rows, but there's no doubt that it's sorting different rows to the front of each group depending on which view you use. To get stable results from this view, what you need to do is add enough ORDER BY conditions to make sure you are getting a consistent "first row" in each group. Adding the primary keys of each of the tables would be enough, though it might be overkill. It could also be that you don't want to be using DISTINCT ON at all; have you thought through exactly what this view ought to produce for each class.id? regards, tom lane
On Wed, Nov 16, 2005 at 07:48:06PM -0500, Tom Lane wrote: > Bill Moseley <moseley@hank.org> writes: > > [ strange behavior ] > > Oh, duh, it's not a PG bug: the problem is that the view is > underspecified. You have > > SELECT DISTINCT ON (class.id) > ... a bunch of stuff ... > FROM ... a bunch of tables ... > ORDER BY class.id; > > The difficulty with this is that DISTINCT ON will take the first row in > each group with the same class.id. And since you're only sorting by > class.id, "the first row" is ill-defined. Sorry, but I fear I'm missing something. That ORDER BY is added by PG -- it's not part of my view when I define it. I assume PG adds that so it can do the DISTINCT ON. Still, I don't have any duplicate class.id rows in this select that I can see. class.id 1243 and 1244 are not the same, yet PG is (sometimes) throwing out one of them. Are you saying that somehow PG thinks they are the same class.id and is thus removing one? I'm asking for a list of all classes taught by instructor 84. ws2=> select * from instructors where person = 84 order by class; person | class --------+------- 84 | 727 84 | 739 84 | 804 84 | 813 84 | 867 84 | 1243 84 | 1244 (7 rows) The reason I'm using DISTINCT ON is because the "class_list" view is suppose to just return a list of unique classes, and a class might have more than one instructor which would result in extra rows -- as shown here: ws2=> select * from instructors where class in (select class from instructors where person = 84); person | class --------+------- 84 | 727 84 | 739 84 | 804 84 | 813 84 | 867 84 | 1243 26 | 1243 84 | 1244 26 | 1244 (9 rows) So when I don't want duplicates: ws2=> select distinct on (class) * from instructors where class in (select class from instructors where person = 84); person | class --------+------- 84 | 727 84 | 739 84 | 804 84 | 813 84 | 867 84 | 1243 84 | 1244 (7 rows) I don't care in this case about the DISTINCT ON throwing out the duplicates -- I just care about distinct classes, not that all the instructors are included in this select. And even if I throw in all my other joins I get the same thing: ws2=> SELECT DISTINCT ON (class.id) ws2-> class.id AS id ws2-> ws2-> FROM class, location, region, person, instructors ws2-> ws2-> WHERE class.location = location.id -- join with location ws2-> AND class.id = instructors.class -- join the instructors ws2-> AND instructors.person = person.id -- join the person(s) ws2-> AND location.region = region.id -- join the location to a region ws2-> AND person.id = 84; id ------ 727 739 804 813 867 1243 1244 (7 rows) > I'm not sure why qsort's > behavior seems to depend on the width of the rows, but there's no doubt > that it's sorting different rows to the front of each group depending > on which view you use. I just don't see what groups there are, though in this case. > It could also be that you don't want to be using DISTINCT ON at all; > have you thought through exactly what this view ought to produce for > each class.id? Yes, I think so. A list of columns related to it, with the exception of when there's duplicate instructors I want one of those duplicates thrown out (and I don't care which one). When I do a query that generates duplicate class.id's such as when a class has more than one instructor: ws2=> select class.id AS class_id, ws2-> person.id AS person_id ws2-> ws2-> FROM class, location, region, person, instructors ws2-> ws2-> WHERE class.location = location.id -- join with location ws2-> AND class.id = instructors.class -- join the instructors ws2-> AND instructors.person = person.id -- join the person(s) ws2-> AND location.region = region.id -- join the location to a region ws2-> AND class_time > now(); class_id | person_id ----------+----------- 561 | 95 614 | 95 747 | 111 762 | 111 772 | 111 883 | 13 924 | 26 935 | 26 945 | 26 1243 | 84 1243 | 26 1244 | 84 1244 | 26 (13 rows) You can see some classes are listed twice, so using distinct on gets just my list of unique classes: ws2=> SELECT DISTINCT ON (class.id) ws2-> class.id AS class_id, ws2-> person.id AS person_id ws2-> ws2-> FROM class, location, region, person, instructors ws2-> ws2-> WHERE class.location = location.id -- join with location ws2-> AND class.id = instructors.class -- join the instructors ws2-> AND instructors.person = person.id -- join the person(s) ws2-> AND location.region = region.id -- join the location to a region ws2-> AND class_time > now(); class_id | person_id ----------+----------- 561 | 95 614 | 95 747 | 111 762 | 111 772 | 111 883 | 13 924 | 26 935 | 26 945 | 26 1243 | 84 1244 | 84 (11 rows) All the view does is include more columns in the result set, and that seems to change the number of results. Here's the view doing the previous query based on "class_time": ws2=> select id, person_id from class_list where class_time > now(); id | person_id ------+----------- 561 | 95 614 | 95 747 | 111 762 | 111 772 | 111 883 | 13 924 | 26 935 | 26 945 | 26 1243 | 26 << PG selected the other instructor this time 1244 | 84 (11 rows) FYI - For those following along, none of those queries show the actual problem I'm having with the view. Which is the view of the above join is returning the wrong number of rows -- 1243 is missing. Again, selecting by person_id: ws2=> select id from class_list where person_id = 84; id ------ 727 739 804 813 867 1244 (6 rows) I have a feeling this is going to be a "doh!" when I finally see it... -- Bill Moseley moseley@hank.org
Bill Moseley <moseley@hank.org> writes: > That ORDER BY is added by PG -- it's not part of my view when I define > it. I assume PG adds that so it can do the DISTINCT ON. Well, then you're even further from following the protocol for DISTINCT ON. You *must* provide an ORDER BY to get reliable results from it. > Still, I don't have any duplicate class.id rows in this select that I > can see. class.id 1243 and 1244 are not the same, yet PG is > (sometimes) throwing out one of them. Are you saying that somehow PG > thinks they are the same class.id and is thus removing one? No, I'm saying that the underlying data (the join result before applying DISTINCT ON) looks like this: bill=# select "class".id, person.id AS person_id bill-# FROM "class", "location", region, person, instructors bill-# WHERE "class"."location" = "location".id AND "class".id = instructors."class" bill-# AND instructors.person = person.id AND "location".region = region.id bill-# ORDER BY "class".id; id | person_id ------+----------- 1 | 49 2 | 27 3 | 19 4 | 82 5 | 12 ... 1238 | 61 1238 | 60 1239 | 40 1240 | 67 1241 | 11 1243 | 26 1243 | 84 1244 | 26 1244 | 84 (1311 rows) The DISTINCT ON will take just one of the two rows with id = 1243, and just one of the rows with id = 1244, and *it is effectively random which one gets picked*. So when you then select rows with person_id = 84, you may or may not see these rows in the end result. > The reason I'm using DISTINCT ON is because the "class_list" view is > suppose to just return a list of unique classes, and a class might have > more than one instructor which would result in extra rows -- as shown > here: Exactly. So your view is going to return the class id along with a randomly selected one of the instructor ids. It seems to me that filtering this result on instructor id is perhaps a bit ill-advised, even if you fix the view so that the chosen instructor id isn't so random (eg, you could fix it to display the lowest-numbered instructor id for the particular class). Even then, are you searching for the instructor id that the view happens to show for that class, or some other one? regards, tom lane
On Wed, Nov 16, 2005 at 11:34:21PM -0500, Tom Lane wrote: > No, I'm saying that the underlying data (the join result before applying > DISTINCT ON) looks like this: > > bill=# select "class".id, person.id AS person_id > bill-# FROM "class", "location", region, person, instructors > bill-# WHERE "class"."location" = "location".id AND "class".id = instructors."class" > bill-# AND instructors.person = person.id AND "location".region = region.id > bill-# ORDER BY "class".id; > id | person_id > ------+----------- > 1 | 49 > 2 | 27 > 3 | 19 > 4 | 82 > 5 | 12 > ... > 1238 | 61 > 1238 | 60 > 1239 | 40 > 1240 | 67 > 1241 | 11 > 1243 | 26 > 1243 | 84 > 1244 | 26 > 1244 | 84 > (1311 rows) > > The DISTINCT ON will take just one of the two rows with id = 1243, and > just one of the rows with id = 1244, and *it is effectively random which > one gets picked*. So when you then select rows with person_id = 84, you > may or may not see these rows in the end result. Yikes! The problem is *when* DISTINCT ON happens, right? And, Tom, you actually explained this to me on the list back on Aug 25th, but that's when I was using the view in a different way. You noted that the order was "unpredictable" but at that time it didn't matter which row was selected to me. http://archives.postgresql.org/pgsql-general/2005-08/msg01291.php This current problem was due to my assumption of how PG executes the query: My assumption was that the select would first do the joins (including limit by class.id = 84) *then* weed out the duplicate class.ids. But if PG is first doing the the joins on all the tables (before limiting by class.id = 84) and then weeding out the duplicate class.ids, and then finally limiting by class.id = 84 then I can see where I might end up wit the missing row. Frankly, I expected the first to happen because it would use an index to select just the records of class.id = 84, then do the joins on that small set of records. Didn't seem likely that the database would join all the records first and then limit by class.id. Seems like the hard way to do the query. But the query planner works in strange and mysterious ways. ;) Does that also explain why PG was sometimes returning the "correct" number of rows? Depending on which of the two query plans above were used? > Exactly. So your view is going to return the class id along with a > randomly selected one of the instructor ids. It seems to me that > filtering this result on instructor id is perhaps a bit ill-advised, > even if you fix the view so that the chosen instructor id isn't so > random (eg, you could fix it to display the lowest-numbered instructor > id for the particular class). Even then, are you searching for the > instructor id that the view happens to show for that class, or some > other one? Well, clearly, my "one-size-fits-all view" doesn't work in this case. I just need another view without distinct when limiting by instructor. It was that red-herring of removing a seemingly random column from the view that made it hard to see what was really happening. Thanks very much for all your time. -- Bill Moseley moseley@hank.org