Thread: BUG #12276: Using old name of a renamed or dropped column in a sub-query does not throw error

BUG #12276: Using old name of a renamed or dropped column in a sub-query does not throw error

From
collin.peters@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      12276
Logged by:          Collin Peters
Email address:      collin.peters@gmail.com
PostgreSQL version: 9.1.14
Operating system:   Debian Squeeze 6.0.10
Description:

Please see SQL below. We just had this happen in our production system,
thankfully in a non-serious scenario.

Basically if you rename a column, and then use the original name in a
subquery, that sub-query does not complain that the column no longer exists,
and seems to complete ok. This will cause whatever SELECT or DELETE that is
using the sub-query to continue on its merry way and cause havoc.

---------------------------------------
SELECT version();

CREATE TABLE orders
(
  order_id integer NOT NULL,
  name text NOT NULL,
  CONSTRAINT orders_pk PRIMARY KEY (order_id)
);

CREATE TABLE order_lines
(
  order_lines_id integer NOT NULL,
  order_id integer NOT NULL,
  value text NOT NULL,
  CONSTRAINT order_lines_pk PRIMARY KEY (order_lines_id),
  CONSTRAINT orders_fk FOREIGN KEY (order_id)
      REFERENCES orders (order_id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE NO ACTION
);

CREATE TABLE users
(
  user_id integer NOT NULL,
  order_id integer NOT NULL,
  foo text NOT NULL,
  bar text NOT NULL,
  CONSTRAINT users_pk PRIMARY KEY (user_id),
  CONSTRAINT order_fk FOREIGN KEY (order_id)
      REFERENCES orders (order_id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE NO ACTION
);

INSERT INTO orders VALUES (100000, 'a');
INSERT INTO orders VALUES (100001, 'b');
INSERT INTO orders VALUES (100002, 'c');

INSERT INTO order_lines VALUES (1, 100000, 'foo');
INSERT INTO order_lines VALUES (2, 100000, 'bar');
INSERT INTO order_lines VALUES (3, 100000, 'baz');

INSERT INTO order_lines VALUES (4, 100001, 'foo');
INSERT INTO order_lines VALUES (5, 100001, 'bar');
INSERT INTO order_lines VALUES (6, 100001, 'baz');

INSERT INTO order_lines VALUES (7, 100002, 'foo');
INSERT INTO order_lines VALUES (8, 100002, 'bar');
INSERT INTO order_lines VALUES (9, 100002, 'baz');

INSERT INTO users VALUES (1, 100000, 'x', 'y');
INSERT INTO users VALUES (2, 100000, 'x', 'y');
INSERT INTO users VALUES (3, 100001, 'x', 'y');
INSERT INTO users VALUES (4, 100001, 'x', 'y');
INSERT INTO users VALUES (5, 100002, 'x', 'y');
INSERT INTO users VALUES (6, 100002, 'x', 'y');

-- delete all order lines with value 'foo' and matching order_id
DELETE
FROM    order_lines
WHERE   value = 'foo'
AND     order_id IN (
    SELECT  order_id
    FROM    users
    WHERE   user_id = 1
);

-- worked - two rows remain
SELECT     *
FROM     order_lines
WHERE     value = 'foo';

-- rename column in users table
ALTER TABLE users RENAME COLUMN order_id TO order_id_bak;
-- Or...
-- ALTER TABLE users DROP COLUMN order_id;

-- select on value of 'bar' and orders for user 1
-- FAIL - returns all three order lines of 'bar'
SELECT     *
FROM    order_lines
WHERE   value = 'bar'
AND     order_id IN (
    -- THIS SHOULD FAIL!! THIS COLUMN NAME DOES NOT EXIST ANYMORE
    SELECT  order_id
    FROM    users
    WHERE   user_id = 1
);

-- same delete as above, except value of 'bar'
DELETE
FROM    order_lines
WHERE   value = 'bar'
AND     order_id IN (
    -- THIS SHOULD FAIL!! THIS COLUMN NAME DOES NOT EXIST ANYMORE
    SELECT  order_id
    FROM    users
    WHERE   user_id = 1
);

-- Words cannot describe. ALL ORDER LINES FOR 'bar' HAVE BEEN DELETED FOR
ALL USERS!!
SELECT     *
FROM     order_lines
WHERE     value = 'bar';

---------------------------------------
The results. I would expect the second select to only return order_lines_id
of 2, and the third select to return order lines 5 and 8.

 PostgreSQL 9.1.14 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real
(Debian 4.4.5-8) 4.4.5, 64-bit
(1 row)
 order_lines_id | order_id | value
----------------+----------+-------
              4 |   100001 | foo
              7 |   100002 | foo
(2 rows)
 order_lines_id | order_id | value
----------------+----------+-------
              2 |   100000 | bar
              5 |   100001 | bar
              8 |   100002 | bar
(3 rows)
 order_lines_id | order_id | value
----------------+----------+-------
(0 rows)
collin.peters@gmail.com writes:
> Basically if you rename a column, and then use the original name in a
> subquery, that sub-query does not complain that the column no longer exists,
> and seems to complete ok.

This is not a bug, it's just a SQL-standard outer query reference.
You dropped order_id from the "users" table, but there's still a
column by that name in "order_lines", so what you've got in

> SELECT     *
> FROM    order_lines
> WHERE   value = 'bar'
> AND     order_id IN (
>     -- THIS SHOULD FAIL!! THIS COLUMN NAME DOES NOT EXIST ANYMORE
>     SELECT  order_id
>     FROM    users
>     WHERE   user_id = 1
> );

is effectively ... IN (SELECT order_lines.order_id FROM users ...
so the IN condition will succeed as long as there's at least one users
row satisfying user_id = 1.

This is a widely known SQL gotcha, which unfortunately we can't do
anything about without rejecting useful and standard-compliant queries.
The usual advice for protecting yourself against this type of mistake
is to always table-qualify every column reference in a sub-SELECT.

            regards, tom lane
collin.peters@gmail.com wrote:

> Basically if you rename a column, and then use the original name in a
> subquery, that sub-query does not complain that the column no longer exists,
> and seems to complete ok. This will cause whatever SELECT or DELETE that is
> using the sub-query to continue on its merry way and cause havoc.

It's not using the old name of the column --- it's using the name of the
OTHER column, that is, the column of the same name in the other table.
In essence, your subquery always returns true.

> -- select on value of 'bar' and orders for user 1
> -- FAIL - returns all three order lines of 'bar'
> SELECT     *
> FROM    order_lines
> WHERE   value = 'bar'
> AND     order_id IN (
>     -- THIS SHOULD FAIL!! THIS COLUMN NAME DOES NOT EXIST ANYMORE
>     SELECT  order_id
>     FROM    users
>     WHERE   user_id = 1
> );
>
> -- same delete as above, except value of 'bar'
> DELETE
> FROM    order_lines
> WHERE   value = 'bar'
> AND     order_id IN (
>     -- THIS SHOULD FAIL!! THIS COLUMN NAME DOES NOT EXIST ANYMORE
>     SELECT  order_id
>     FROM    users
>     WHERE   user_id = 1
> );


--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Wow, an amazing 'feature' for sure. Is there any use case where it actually
makes sense? I'm just wondering if this is a case where it is better to
stray from the spec? Would almost prefer a 'NOTICE' if you use an
unqualified column reference in a sub-query.

In any case... fun stuff!!  I'm glad it didn't delete something important
for us :)

On Thu Dec 18 2014 at 13:54:57 Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

> collin.peters@gmail.com wrote:
>
> > Basically if you rename a column, and then use the original name in a
> > subquery, that sub-query does not complain that the column no longer
> exists,
> > and seems to complete ok. This will cause whatever SELECT or DELETE tha=
t
> is
> > using the sub-query to continue on its merry way and cause havoc.
>
> It's not using the old name of the column --- it's using the name of the
> OTHER column, that is, the column of the same name in the other table.
> In essence, your subquery always returns true.
>
> > -- select on value of 'bar' and orders for user 1
> > -- FAIL - returns all three order lines of 'bar'
> > SELECT        *
> > FROM    order_lines
> > WHERE   value =3D 'bar'
> > AND     order_id IN (
> >       -- THIS SHOULD FAIL!! THIS COLUMN NAME DOES NOT EXIST ANYMORE
> >       SELECT  order_id
> >       FROM    users
> >       WHERE   user_id =3D 1
> > );
> >
> > -- same delete as above, except value of 'bar'
> > DELETE
> > FROM    order_lines
> > WHERE   value =3D 'bar'
> > AND     order_id IN (
> >       -- THIS SHOULD FAIL!! THIS COLUMN NAME DOES NOT EXIST ANYMORE
> >       SELECT  order_id
> >       FROM    users
> >       WHERE   user_id =3D 1
> > );
>
>
> --
> =C3=81lvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
collin.peters wrote
> Wow, an amazing 'feature' for sure. Is there any use case where it
> actually
> makes sense? I'm just wondering if this is a case where it is better to
> stray from the spec? Would almost prefer a 'NOTICE' if you use an
> unqualified column reference in a sub-query.

Correlated subqueries must be able to see values in the outer relation to
function.  Those are quite useful. Since not everything needs or wants to be
table qualified it becomes difficult to selectively enforce such a
requirement.

Two possible solutions...

1. All correlated subqueries need to have external-able columns explicitly
prefixed
2. All normal subquery columns with two possible name resolutions need to be
prefixed

The first option causes your post-rename to fail but the original query
works.
The second option causes the original query to fail but the post-rename one
works.

You can require both and both queries fail.

I'm not saying I support these...though the option would be nice to prevent
just this sort of thing.

David J.



--
View this message in context:
http://postgresql.nabble.com/BUG-12276-Using-old-name-of-a-renamed-or-dropped-column-in-a-sub-query-does-not-throw-error-tp5831401p5831501.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.