Thread: No error when FROM is missing in subquery
Following query is considered as correct, no "missing from" error has been reported (so, entire table will be updated and "on update" triggers will be fired for every row): update item set obj_id = obj_id where obj_id in (select obj_id where item_point is null order by obj_modified limit 10) Is it a bug? If no, maybe to produce warning in such cases? -- Best regards, Nikolay
ok, sorry, I've realized that it's yet another example of "outer reference", Tom will say "read any SQL book" again :-) http://archives.postgresql.org/pgsql-bugs/2006-12/msg00115.php On 12/19/06, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote: > Following query is considered as correct, no "missing from" error has > been reported (so, entire table will be updated and "on update" > triggers will be fired for every row): > > update item set obj_id = obj_id > where obj_id in (select obj_id where item_point is null order by > obj_modified limit 10) > > Is it a bug? If no, maybe to produce warning in such cases? > > -- > Best regards, > Nikolay > -- Best regards, Nikolay
> On 12/19/06, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote: > > Following query is considered as correct, no "missing from" error has > > been reported (so, entire table will be updated and "on update" > > triggers will be fired for every row): > > > > update item set obj_id = obj_id > > where obj_id in (select obj_id where item_point is null order by > > obj_modified limit 10) > > > > Is it a bug? If no, maybe to produce warning in such cases? > On 12/18/06, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote: > ok, sorry, I've realized that it's yet another example of "outer > reference", Tom will say "read any SQL book" again :-) > > http://archives.postgresql.org/pgsql-bugs/2006-12/msg00115.php > not really... AFAIK, the FROM clause is mandatory per SQL... older releases of postgres fill the missing from clause if it was easy to determine, in recent releases it's mandatory unless you specify the opposite in postgresql.conf with the add_missing_from parameter -- regards, Jaime Casanova "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs and the universe trying to produce bigger and better idiots. So far, the universe is winning." Richard Cook
>> > Is it a bug? If no, maybe to produce warning in such cases? oups. just thumbled over this as well when i forgot a FROM in a WHERE ... IN (....) and damaged quite some data. the bad query went like this: SELECT * FROM movies.names WHERE mov_id IN (SELECT DISTINCT mov_id WHERE mov_name like '%, %' LIMIT 2) the subselect is missing a FROM <table>. in that case, pgsql seemed to also ignore the LIMIT 2 and returned 3706 records out of ~130000... no clue which ones :-/ - thomas
On 12/18/06, Thomas H. <me@alternize.com> wrote: > >> > Is it a bug? If no, maybe to produce warning in such cases? > > oups. just thumbled over this as well when i forgot a FROM in a WHERE ... IN > (....) and damaged quite some data. the bad query went like this: > > SELECT * FROM movies.names WHERE mov_id IN (SELECT DISTINCT mov_id WHERE > mov_name like '%, %' LIMIT 2) > > the subselect is missing a FROM <table>. in that case, pgsql seemed to also > ignore the LIMIT 2 and returned 3706 records out of ~130000... and the UPDATE was? also the limit applies only to the subselect, it has nothing to do with the upper query so the upper query can return more than number of rows specified in the subselect... > no clue which ones :-/ > LIMIT is often meaningfull only in conjuction with ORDER BY -- regards, Jaime Casanova "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs and the universe trying to produce bigger and better idiots. So far, the universe is winning." Richard Cook
>> oups. just thumbled over this as well when i forgot a FROM in a WHERE ... >> IN >> (....) and damaged quite some data. the bad query went like this: >> >> SELECT * FROM movies.names WHERE mov_id IN (SELECT DISTINCT mov_id WHERE >> mov_name like '%, %' LIMIT 2) >> >> the subselect is missing a FROM <table>. in that case, pgsql seemed to >> also >> ignore the LIMIT 2 and returned 3706 records out of ~130000... > > and the UPDATE was? that was done by the application with the returned recordset. > also the limit applies only to the subselect, it has nothing to do > with the upper query so the upper query can return more than number of > rows specified in the subselect... IF the subquery would only have returned 2 ids, then there would be at most like +/-10 records affected. each mov_id can hold one or more (usuals up to 5) names. but here, the subquery seemed to return ~3700 distinct mov_ids, thus around 37000 names where damaged by the following programmatical updates instead of only a hands full... > LIMIT is often meaningfull only in conjuction with ORDER BY yep but not here. all i wanted to do is to get names from 2 movies and run an *observed* edit on them. what did pgsql actually do with that subquery? did it return all records for which mov_name match '%, %'? - thomas
On 12/18/06, Thomas H. <me@alternize.com> wrote: > >> oups. just thumbled over this as well when i forgot a FROM in a WHERE ... > >> IN > >> (....) and damaged quite some data. the bad query went like this: > >> > >> SELECT * FROM movies.names WHERE mov_id IN (SELECT DISTINCT mov_id WHERE > >> mov_name like '%, %' LIMIT 2) > >> > >> the subselect is missing a FROM <table>. in that case, pgsql seemed to > >> also > >> ignore the LIMIT 2 and returned 3706 records out of ~130000... > > > > and the UPDATE was? > > that was done by the application with the returned recordset. > > > also the limit applies only to the subselect, it has nothing to do > > with the upper query so the upper query can return more than number of > > rows specified in the subselect... > > IF the subquery would only have returned 2 ids, then there would be at most > like +/-10 records affected. each mov_id can hold one or more (usuals up to > 5) names. but here, the subquery seemed to return ~3700 distinct mov_ids, > thus around 37000 names where damaged by the following programmatical > updates instead of only a hands full... > have you tested the query in psql? what results do you get? -- regards, Jaime Casanova "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs and the universe trying to produce bigger and better idiots. So far, the universe is winning." Richard Cook
>> >> SELECT * FROM movies.names WHERE mov_id IN (SELECT DISTINCT mov_id >> >> WHERE >> >> mov_name like '%, %' LIMIT 2) >> >> IF the subquery would only have returned 2 ids, then there would be at >> most >> like +/-10 records affected. each mov_id can hold one or more (usuals up >> to >> 5) names. but here, the subquery seemed to return ~3700 distinct mov_ids, >> thus around 37000 names where damaged by the following programmatical >> updates instead of only a hands full... >> > > have you tested the query in psql? > what results do you get? the data is damaged so the result isn't the same... regenearting it now from a backup. from first tests i would say it returned records with names that match the WHERE in the subselect. i guess what happened is: it took each record in movies.names, then run the subquery for that record which resulted in "WHERE mov_id IN (mov_id)" = true for records with a ', ' in the name and "WHERE mov_id IN ()" = false for all others. - thomas
"Thomas H." <me@alternize.com> writes: > SELECT * FROM movies.names WHERE mov_id IN (SELECT DISTINCT mov_id WHERE > mov_name like '%, %' LIMIT 2) > the subselect is missing a FROM <table>. in that case, pgsql seemed to also > ignore the LIMIT 2 It didn't "ignore" anything. Each execution of the sub-select returned 1 row, containing the current mov_id from the outer query. So basically this would've selected everything passing the LIKE condition. regards, tom lane
Also check that the mov_id column exists in the table/view that you are running the SELECT DISTINCT against. Pgsql does not throw an error (at least prior to 8.2) if the column referenced by the select statement for the IN clause does not exist. It will run only SELECT * FROM movies.names in this case. Mike On Tue, 2006-12-19 at 06:01 +0100, Thomas H. wrote: > >> >> SELECT * FROM movies.names WHERE mov_id IN (SELECT DISTINCT mov_id > >> >> WHERE > >> >> mov_name like '%, %' LIMIT 2) > >> > >> IF the subquery would only have returned 2 ids, then there would be at > >> most > >> like +/-10 records affected. each mov_id can hold one or more (usuals up > >> to > >> 5) names. but here, the subquery seemed to return ~3700 distinct mov_ids, > >> thus around 37000 names where damaged by the following programmatical > >> updates instead of only a hands full... > >> > > > > have you tested the query in psql? > > what results do you get? > > the data is damaged so the result isn't the same... regenearting it now from > a backup. > > from first tests i would say it returned records with names that match the > WHERE in the subselect. i guess what happened is: it took each record in > movies.names, then run the subquery for that record which resulted in "WHERE > mov_id IN (mov_id)" = true for records with a ', ' in the name and "WHERE > mov_id IN ()" = false for all others. > > - thomas > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend
mike <mike@thegodshalls.com> writes: > Pgsql does not throw an error (at least prior to 8.2) if the column > referenced by the select statement for the IN clause does not exist. My, there's a lot of misinformation in this thread. The reason there's no error thrown is that the reference to mov_id in the sub-SELECT is a perfectly legal outer reference to the mov_id column available from the upper SELECT. If the column truly did not exist anywhere in the tables used in the query, it would have thrown an error. regards, tom lane
You are right as usual.... My apologies yet again. I have wrongly assumed that the lower statement would run first since it is enclosed in parenthesis. On Tue, 2006-12-19 at 00:48 -0500, Tom Lane wrote: > mike <mike@thegodshalls.com> writes: > > Pgsql does not throw an error (at least prior to 8.2) if the column > > referenced by the select statement for the IN clause does not exist. > > My, there's a lot of misinformation in this thread. > > The reason there's no error thrown is that the reference to mov_id in > the sub-SELECT is a perfectly legal outer reference to the mov_id column > available from the upper SELECT. If the column truly did not exist > anywhere in the tables used in the query, it would have thrown an error. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings
>>> On Mon, Dec 18, 2006 at 11:48 PM, in message <15735.1166507302@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > The reason there's no error thrown is that the reference to mov_id in > the sub- SELECT is a perfectly legal outer reference to the mov_id column > available from the upper SELECT. If the column truly did not exist > anywhere in the tables used in the query, it would have thrown an error. It's easy to see how it resolves the column references; but the syntax is still not compliant with the SQL standard. The FROM clause is required. We could call it a PostgreSQL extension, but I'm curious if anyone actually finds it useful. So far it just seems to provide an opportunity for error. -Kevin *scalar_subquery : subquery ; *subquery : LEFT_PAREN query_expression RIGHT_PAREN ; *query_expression : non_join_query_expression | joined_table ; *non_join_query_expression : non_join_query_term | query_expression TK_union [ TK_all ] [ corresponding_spec ] query_term | query_expression TK_except [ TK_all ] [ corresponding_spec ] query_term ; *non_join_query_term : non_join_query_primary | query_term TK_intersect [ TK_all ] [ corresponding_spec ] query_primary ; *non_join_query_primary : simple_table | LEFT_PAREN non_join_query_expression RIGHT_PAREN ; *simple_table : query_specification | table_value_constructor | explicit_table ; *query_specification : TK_select [ set_quantifier ] select_list table_expression ; *select_list : ASTERISK | select_sublist [ { COMMA select_sublist }... ] ; *select_sublist : derived_column | qualifier PERIOD ASTERISK ; *derived_column : value_expression [ as_clause ] ; *as_clause : [ TK_as ] column_name ; *table_expression : from_clause [ where_clause ] [ group_by_clause ] [ having_clause ] ; *from_clause : TK_from table_reference [ { COMMA table_reference }... ] ;
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > It's easy to see how it resolves the column references; but the syntax > is still not compliant with the SQL standard. The FROM clause is > required. We could call it a PostgreSQL extension, but I'm curious if > anyone actually finds it useful. You've never done "SELECT some_expression" ? regards, tom lane
>>> On Tue, Dec 19, 2006 at 9:23 AM, in message <24047.1166541794@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> It's easy to see how it resolves the column references; but the syntax >> is still not compliant with the SQL standard. The FROM clause is >> required. We could call it a PostgreSQL extension, but I'm curious if >> anyone actually finds it useful. > > You've never done "SELECT some_expression" ? Not, as far as I can recall, in the ANSI SQL spec. Did I miss something? I'm having trouble seeing how it is a useful construct in the context of a scalar subquery. A non-standard extension should be useful in some way. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > I'm having trouble seeing how it is a useful construct in the context > of a scalar subquery. A non-standard extension should be useful in some > way. There is 0 chance that we'd disallow it at the top level after allowing it all these years. And probably not even just top-level; consider select 1 union all select 2 union all select 3; which has been the recommended workaround up to 8.2 for our lack of multi-row VALUES lists. We will certainly break a lot of code if we disallow that. So now you have to make a case why we should make a non-orthogonal distinction between certain subqueries and other subqueries. As for potential usefulness, consider a set-returning function invoked in the targetlist: it makes perfect sense to do WHERE foo IN (SELECT mysrf(...)) and maybe even add an ORDER BY/LIMIT to that. Yeah, no doubt this is easy to change to putting the SRF in FROM, but you can bet there are Postgres applications out there today using it; we have never officially deprecated this way of using an SRF. I also recall having advised people in years past to use dummy (SELECT expr) sub-selects to work around planning issues. I don't recall at the moment whether any of those issues are still live, but again you can bet the construct is still in some applications' SQL. So basically we get to introduce some arbitrary non-orthogonality, plus break an unknown number of applications, in order to block off a type of user error that AFAIR has never been complained of before in all the years I've been working with Postgres. Sorry, no sale ... regards, tom lane
>>> On Tue, Dec 19, 2006 at 9:58 AM, in message <26199.1166543928@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> I'm having trouble seeing how it is a useful construct in the context >> of a scalar subquery. A non- standard extension should be useful in some >> way. > > There is 0 chance that we'd disallow it at the top level after allowing > it all these years. I wouldn't want to eliminate it there -- it is clearly a useful extension to the standard at the top level. > And probably not even just top- level; consider > select 1 union all select 2 union all select 3; > which has been the recommended workaround up to 8.2 for our lack of > multi- row VALUES lists. We will certainly break a lot of code if we > disallow that. Point taken. > So now you have to make a case why we should make a > non- orthogonal distinction between certain subqueries and other > subqueries. Well, I don't think of the terms for set operations as subqueries, and there are other differences already in what is allowed for a query term and a subquery. Arguably there is more risk of error of the type recently reported where you are in a scalar subquery context. > As for potential usefulness, consider a set- returning function invoked > in the targetlist: it makes perfect sense to do > WHERE foo IN (SELECT mysrf(...)) > and maybe even add an ORDER BY/LIMIT to that. That is sufficient to answer my concerns. I tend to operate from the context of the standard, because we have our own ANSI based parser which generates portable Java query classes. ORDER BY and LIMIT are not allowed in the subqueries in the standard but are obviously useful extensions. The missing FROM then adds value to the other extensions. Case closed. Thanks. By the way, when I read my previous message it struck me that it could be taken with a tone I didn't intend. That was the result of whipping it out quickly without taking sufficient time to review it. Sorry; no offense was intended. I'll try to avoid doing that again. -Kevin