Thread: BUG #5898: Nested "in" clauses hide bad column names
The following bug has been logged online: Bug reference: 5898 Logged by: Scott Dunbar Email address: scott@xigole.com PostgreSQL version: 9.0.3 Operating system: Ubuntu 10.10 Description: Nested "in" clauses hide bad column names Details: I have a nested in clause like: select respondent_id from respondent where respondent_id in (select respondent_id from chat_session where project_id in (select project_id from project where company_id = 4)); However, in this example, there is no column named respondent_id in the chat_session table. But the query runs and, indeed, returns all of the rows in respondent. Since this was then part of another nested in it deleted far more that it was supposed to.
"Scott Dunbar" <scott@xigole.com> writes: > I have a nested in clause like: > select respondent_id from respondent where respondent_id in (select > respondent_id from chat_session where project_id in (select project_id from > project where company_id = 4)); > However, in this example, there is no column named respondent_id in the > chat_session table. Probably there is one in respondent, though? This behavior is not a bug --- what you have there is an outer reference, and it is working exactly as specified by the SQL standard. Sub-selects would be a whole lot less useful if they couldn't refer to variables of the outer query. regards, tom lane
Yes, you're correct. I guess this makes sense but it does seem strange that I can enter garbage in a query but it still runs. And in my case the output from this (the entire table) was then used in a delete statement that toasted the entire table. Allowing bogus SQL just seems "wrong" but I do understand what's going on. Thanks for your help. On 02/22/2011 10:45 AM, Tom Lane wrote: > "Scott Dunbar"<scott@xigole.com> writes: >> I have a nested in clause like: >> select respondent_id from respondent where respondent_id in (select >> respondent_id from chat_session where project_id in (select project_id from >> project where company_id = 4)); >> However, in this example, there is no column named respondent_id in the >> chat_session table. > Probably there is one in respondent, though? This behavior is not a bug > --- what you have there is an outer reference, and it is working exactly > as specified by the SQL standard. Sub-selects would be a whole lot less > useful if they couldn't refer to variables of the outer query. > > regards, tom lane -- Scott Dunbar Xigole Systems, Inc. Enterprise software consulting, development, and hosting 303·667·6343
Scott Dunbar <scott@xigole.com> wrote: > I guess this makes sense but it does seem strange that I can enter > garbage in a query but it still runs. It wasn't garbage. > And in my case the output from this (the entire table) was then > used in a delete statement that toasted the entire table. I can suggest a few ways to protect yourself in such situations. (1) I generally run any DELETE statement against data I care about as a SELECT first. It's generally pretty easy to write it as a SELECT tbl.* FROM which can be converted to DELETE FROM tbl after reviewing what matches. (2) When in doubt, use BEGIN; before running the statement. You can review the count, run SELECTs to look at the results, etc., before running COMMIT; to make it "stick". (3) Less convenient, but sometimes useful, is to EXPLAIN your query before actually running it. If you do that with your delete, you'll see the criterion applied to a table other than what you were expecting, which might alert you to the problem. The estimated row count at the top level of the plan might be another red flag. > Allowing bogus SQL just seems "wrong" but I do understand what's > going on. If you did you wouldn't call a well formed, unambiguous, standard- conforming statement bogus. It did exactly what you said; just not what you meant. It pays to be a bit paranoid when running ad hoc DML in case you accidentally don't say what you mean. -Kevin
Using table aliases prevents such problems by disambiguating the column names. For example, if you had table aliases in just one part of this query as in the below, you would have gotten an error instead of deleting all those rows: select respondent_id from respondent where respondent_id in (select cs.respondent_id from chat_session cs where cs.project_id in (select project_id from project where company_id = 4)); Cheers, Eric On 2/22/2011 1:07 PM, Scott Dunbar wrote: > Yes, you're correct. I guess this makes sense but it does seem > strange that I can enter garbage in a query but it still runs. And in > my case the output from this (the entire table) was then used in a > delete statement that toasted the entire table. Allowing bogus SQL > just seems "wrong" but I do understand what's going on. > > Thanks for your help. > > > On 02/22/2011 10:45 AM, Tom Lane wrote: >> "Scott Dunbar"<scott@xigole.com> writes: >>> I have a nested in clause like: >>> select respondent_id from respondent where respondent_id in (select >>> respondent_id from chat_session where project_id in (select project_id from >>> project where company_id = 4)); >>> However, in this example, there is no column named respondent_id in the >>> chat_session table. >> Probably there is one in respondent, though? This behavior is not a bug >> --- what you have there is an outer reference, and it is working exactly >> as specified by the SQL standard. Sub-selects would be a whole lot less >> useful if they couldn't refer to variables of the outer query. >> >> regards, tom lane > > > -- > Scott Dunbar > Xigole Systems, Inc. > Enterprise software consulting, development, and hosting > 303·667·6343