Thread: Weird (?) problem with order of conditions in SELECT
Hello all, I have encountered a weird problem I can't seem to understand. It involves a correlated subquery, where the rows returned seem to depend upon the order I specify my conditions. I can't see why the order should be important (except maybe for performance) To demonstrate I created a toy data set and query (note, both are completely contrived so neither really 'makes sense' beyond this context). I present first two queries which return different results, but differ *only in the order I specify conditions*, along with the results of the queries. (The toy data set is at the very end of the email) SELECT c.score FROM c WHERE c.score >= (SELECT MAX(score) AS score FROM c WHERE a.name='nugget' AND a.job='programmer' AND a.a_id=b.a_id AND c.b_id = b.b_id) AND a.name='nugget' AND a.job='programmer' AND a.a_id=b.a_id AND c.b_id = b.b_id; score ------- 2500 (1 row) * NOTE, this query returns only one row. SELECT c.score FROM c WHERE a.name='nugget' AND a.job='programmer' AND a.a_id=b.a_id AND c.b_id = b.b_id AND c.score >= (SELECT MAX(score) AS score FROM c WHERE a.name='nugget' AND a.job='programmer' AND a.a_id=b.a_id AND c.b_id = b.b_id); score ------- 100 2500 (2 rows) * NOTE, now TWO rows are returned, even though all that has happened is I've changed the order of the conditions. What's going on here ? Why should the order of conditions be important ?? Any insight would be greatly appreciated. Thanks, terry CREATE TABLE people (name text not null PRIMARY KEY, age int); insert into people values ('nugget', 33); insert into people values ('lisa', 32); insert into people values ('larry', 28); CREATE TABLE a (a_id INT NOT NULL PRIMARY KEY, job TEXT NOT NULL, name TEXT NOT NULL, FOREIGN KEY (name) REFERENCES people(name)); INSERT INTO a VALUES (1, 'programmer', 'nugget'); INSERT INTO a VALUES (2, 'programmer', 'lisa'); INSERT INTO a VALUES (2, 'secretary', 'lisa'); INSERT INTO a VALUES (3, 'secretary', 'lisa'); INSERT INTO a VALUES (4, 'student', 'larry'); INSERT INTO a VALUES (5, 'programmer', 'nugget'); CREATE TABLE b (b_id INT NOT NULL PRIMARY KEY, a_id INT NOT NULL, FOREIGN KEY (a_id) REFERENCES a(a_id)); INSERT INTO b (a_id, b_id) VALUES (1,1); INSERT INTO b (a_id, b_id) VALUES (2,2); INSERT INTO b (a_id, b_id) VALUES (3,3); INSERT INTO b (a_id, b_id) VALUES (4,4); INSERT INTO b (a_id, b_id) VALUES (5,5); CREATE TABLE c (c_id INT NOT NULL PRIMARY KEY, b_id INT NOT NULL, score INT NOT NULL, FOREIGN KEY (b_id) REFERENCES b(b_id)); INSERT INTO c (b_id, c_id, score) VALUES (1,1,100); INSERT INTO c (b_id, c_id, score) VALUES (2,2,400); INSERT INTO c (b_id, c_id, score) VALUES (3,3,900); INSERT INTO c (b_id, c_id, score) VALUES (4,4,1600); INSERT INTO c (b_id, c_id,score) VALUES (5,5,2500);
"Mark, Terry" <tmark@amgen.com> writes: > I have encountered a weird problem I can't seem to understand. It involves > a correlated subquery, where the rows returned seem to depend upon the order > I specify my conditions. I can't see why the order should be important > (except maybe for performance) > SELECT c.score FROM c > WHERE c.score >= (SELECT MAX(score) AS score FROM c > WHERE a.name='nugget' > AND a.job='programmer' > AND a.a_id=b.a_id AND c.b_id = b.b_id) > AND a.name='nugget' > AND a.job='programmer' > AND a.a_id=b.a_id > AND c.b_id = b.b_id; > SELECT c.score FROM c > WHERE a.name='nugget' > AND a.job='programmer' > AND a.a_id=b.a_id > AND c.b_id = b.b_id > AND c.score >= (SELECT MAX(score) AS score FROM c > WHERE a.name='nugget' > AND a.job='programmer' > AND a.a_id=b.a_id > AND c.b_id = b.b_id); This is a little less mysterious if you run it under 7.1, because 7.1 emits some warning notices: NOTICE: Adding missing FROM-clause entry in subquery for table "a" NOTICE: Adding missing FROM-clause entry in subquery for table "b" NOTICE: Adding missing FROM-clause entry for table "a" NOTICE: Adding missing FROM-clause entry for table "b" score ------- 2500 (1 row) NOTICE: Adding missing FROM-clause entry for table "a" NOTICE: Adding missing FROM-clause entry for table "b" score ------- 100 2500 (2 rows) From this we can infer that Postgres is actually interpreting the first query as SELECT c.score FROM a,b,c WHERE c.score >= (SELECT MAX(score) AS score FROM a,b,c WHERE a.name='nugget' AND a.job='programmer' AND a.a_id=b.a_id AND c.b_id = b.b_id) AND a.name='nugget' AND a.job='programmer' AND a.a_id=b.a_id AND c.b_id = b.b_id; whereas the second one is being interpreted as SELECT c.score FROM a,b,c WHERE a.name='nugget' AND a.job='programmer' AND a.a_id=b.a_id AND c.b_id = b.b_id AND c.score >= (SELECT MAX(score) AS score FROM c WHERE a.name='nugget' AND a.job='programmer' AND a.a_id=b.a_id AND c.b_id = b.b_id); That is, in the second case the sub-select's references to A and B are being taken as outer references to the current A and B rows of the outer query, whereas in the first case the sub-select is interpreted as a completely independent query. I am not sure which interpretation you were actually intending. This example shows one reason why the "implicit FROM item" feature of Postgres is confusing and has come to be deprecated: it's not always clear which FROM list an implicit item should be added to. We've started to emit a warning about use of this feature in 7.1, and perhaps someday it will be removed entirely. regards, tom lane
Many thanks to Tom for his reply. My intention was to have the subquery treated as a completely independent query. Should I understand, then, that by explicitly naming all the involved table portions in the subequery, that the subquery is guaranteed to be treated independently ? terry -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Tuesday, March 20, 2001 11:42 AM To: Mark, Terry Cc: pgsql-novice@postgresql.org Subject: Re: [NOVICE] Weird (?) problem with order of conditions in SELECT "Mark, Terry" <tmark@amgen.com> writes: > I have encountered a weird problem I can't seem to understand. It involves > a correlated subquery, where the rows returned seem to depend upon the order > I specify my conditions. I can't see why the order should be important > (except maybe for performance) > SELECT c.score FROM c > WHERE c.score >= (SELECT MAX(score) AS score FROM c > WHERE a.name='nugget' > AND a.job='programmer' > AND a.a_id=b.a_id AND c.b_id = b.b_id) > AND a.name='nugget' > AND a.job='programmer' > AND a.a_id=b.a_id > AND c.b_id = b.b_id; > SELECT c.score FROM c > WHERE a.name='nugget' > AND a.job='programmer' > AND a.a_id=b.a_id > AND c.b_id = b.b_id > AND c.score >= (SELECT MAX(score) AS score FROM c > WHERE a.name='nugget' > AND a.job='programmer' > AND a.a_id=b.a_id > AND c.b_id = b.b_id); This is a little less mysterious if you run it under 7.1, because 7.1 emits some warning notices: NOTICE: Adding missing FROM-clause entry in subquery for table "a" NOTICE: Adding missing FROM-clause entry in subquery for table "b" NOTICE: Adding missing FROM-clause entry for table "a" NOTICE: Adding missing FROM-clause entry for table "b" score ------- 2500 (1 row) NOTICE: Adding missing FROM-clause entry for table "a" NOTICE: Adding missing FROM-clause entry for table "b" score ------- 100 2500 (2 rows) From this we can infer that Postgres is actually interpreting the first query as SELECT c.score FROM a,b,c WHERE c.score >= (SELECT MAX(score) AS score FROM a,b,c WHERE a.name='nugget' AND a.job='programmer' AND a.a_id=b.a_id AND c.b_id = b.b_id) AND a.name='nugget' AND a.job='programmer' AND a.a_id=b.a_id AND c.b_id = b.b_id; whereas the second one is being interpreted as SELECT c.score FROM a,b,c WHERE a.name='nugget' AND a.job='programmer' AND a.a_id=b.a_id AND c.b_id = b.b_id AND c.score >= (SELECT MAX(score) AS score FROM c WHERE a.name='nugget' AND a.job='programmer' AND a.a_id=b.a_id AND c.b_id = b.b_id); That is, in the second case the sub-select's references to A and B are being taken as outer references to the current A and B rows of the outer query, whereas in the first case the sub-select is interpreted as a completely independent query. I am not sure which interpretation you were actually intending. This example shows one reason why the "implicit FROM item" feature of Postgres is confusing and has come to be deprecated: it's not always clear which FROM list an implicit item should be added to. We've started to emit a warning about use of this feature in 7.1, and perhaps someday it will be removed entirely. regards, tom lane