Thread: Is this a bug or a feature? Column visibility in subquery from outer query

postgres=# create table public.ps_test_x (x1 oid, x2 oid, x3 oid);
CREATE TABLE
postgres=# create table public.ps_test_y (y1 oid, y2 oid, y3 oid);
CREATE TABLE
postgres=# explain select * from public.ps_test_x where x1 in (select x1 from public.ps_test_y);
                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on ps_test_x  (cost=0.00..28462.75 rows=885 width=12)
   Filter: (subplan)
   SubPlan
     ->  Seq Scan on ps_test_y  (cost=0.00..27.70 rows=1770 width=0)
(4 rows)

postgres=# explain select * from public.ps_test_x where x1 in (select x2 from public.ps_test_y);
                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on ps_test_x  (cost=0.00..28462.75 rows=885 width=12)
   Filter: (subplan)
   SubPlan
     ->  Seq Scan on ps_test_y  (cost=0.00..27.70 rows=1770 width=0)
(4 rows)

postgres=# explain select * from public.ps_test_x where x1 in (select x3 from public.ps_test_y);
                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on ps_test_x  (cost=0.00..28462.75 rows=885 width=12)
   Filter: (subplan)
   SubPlan
     ->  Seq Scan on ps_test_y  (cost=0.00..27.70 rows=1770 width=0)
(4 rows)

postgres=# explain select * from public.ps_test_x where x1 in (select x4 from public.ps_test_y);
ERROR:  column "x4" does not exist
LINE 1: ...elect * from public.ps_test_x where x1 in (select x4 from pu...
                                                             ^
postgres=# explain select * from public.ps_test_x where x1 in (select y1 from public.ps_test_y);
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Hash Join  (cost=36.62..88.66 rows=1770 width=12)
   Hash Cond: (ps_test_x.x1 = ps_test_y.y1)
   ->  Seq Scan on ps_test_x  (cost=0.00..27.70 rows=1770 width=12)
   ->  Hash  (cost=34.12..34.12 rows=200 width=4)
         ->  HashAggregate  (cost=32.12..34.12 rows=200 width=4)
               ->  Seq Scan on ps_test_y  (cost=0.00..27.70 rows=1770 width=4)
(6 rows)

I just want to point out that the sub-query is using a column from the outer query (eg. x1)  without an alias from the table in the outer query. This can lead to a confusion when, for example:

delete from table1 where foreign_id in (select foreign_id from table2)

-- ! table2 does not have the foreign_id column !

This would do a table scan on table1 and delete all its rows.
Why isn't it like:

delete from table1 where foreign_id in (select table1.foreign_id from table2)

where you must specify the outer query's table reference inside the subquery. I suspect the original intent was to use the outer query columns in some processing inside the subquery, which is valid. I'm just wondering why an explicit reference isn't required to distinguish the column. For convenience, or part of the SQL spec?

Cheers!


--Paolo Saul
Paolo Saul <paolo.saul@telogis.com> writes:
> I just want to point out that the sub-query is using a column from the outer
> query (eg. x1)  without an alias from the table in the outer query.

This behavior is required by the SQL standard.  It's called an outer
reference.

            regards, tom lane

Thank you for clearing that up.

Regards,
Paolo Saul




This behavior is required by the SQL standard.  It's called an outer
reference.

                       regards, tom lane