Dangerous Naming Confusion - Mailing list pgsql-general

From Don Seiler
Subject Dangerous Naming Confusion
Date
Msg-id CAHJZqBBxQhaSFuHKhR-sp95ibxibad+oaBoJv=1wVO-1h366eg@mail.gmail.com
Responses Re: Dangerous Naming Confusion  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Good evening,

Please see my gist at https://gist.github.com/dtseiler/9ef0a5e2b1e0efc6a13d5661436d4056 for a complete test case.

I tested this on PG 12.6 and 13.2 and observed the same on both.

We expected the queries that use dts_temp to return only 3 rows. However, the query whose subquery starts at line 36 of the gist returns ALL 250,000 rows from dts_orders. Note that the "order_id" column doesn't exist in the dts_temp table, so I'm assuming PG is using the "order_id" column from the dts_orders table. If I use explicit table references, as in the query at line 48 of the gist, then I get the error I would expect: the "order_id" column doesn't exist in dts_temp.

When I use the actual column name "a" for dts_temp, then I get the 3 rows back as expected.

I'm wondering whether this is expected behavior: PG uses the dts_orders.order_id value in the subquery "select order_id from dts_temp" when dts_temp doesn't have its own order_id column. I would have expected an error saying that the column doesn't exist. It seems very counter-intuitive that PG would use a column from a different table.
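
For readers who cannot open the gist, here is a minimal sketch of the three cases described above. It is not the gist itself. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, 10 hypothetical order rows instead of 250,000, and a function name run_demo() invented for illustration. The table and column names are taken from the description above. SQLite also resolves an unqualified column that is missing from the inner table against the outer query, so it should show the same three outcomes.

```python
import sqlite3


def run_demo():
    """Return (unqualified_count, qualified_error, correct_count)."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite standing in for PG
    conn.execute("CREATE TABLE dts_orders (order_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE dts_temp (a INTEGER)")  # note: no order_id column
    conn.executemany(
        "INSERT INTO dts_orders (order_id) VALUES (?)",
        [(i,) for i in range(1, 11)],  # hypothetical: 10 rows, not 250,000
    )
    conn.executemany("INSERT INTO dts_temp (a) VALUES (?)", [(2,), (4,), (6,)])

    # Case 1: unqualified order_id. dts_temp has no such column, so the name
    # binds to the outer dts_orders.order_id and every outer row matches.
    unqualified_count = conn.execute(
        "SELECT count(*) FROM dts_orders "
        "WHERE order_id IN (SELECT order_id FROM dts_temp)"
    ).fetchone()[0]

    # Case 2: table-qualified reference. The name can no longer fall back to
    # the outer query, so the statement fails instead of matching everything.
    try:
        conn.execute(
            "SELECT count(*) FROM dts_orders o "
            "WHERE o.order_id IN (SELECT t.order_id FROM dts_temp t)"
        )
        qualified_error = None
    except sqlite3.OperationalError as exc:
        qualified_error = str(exc)

    # Case 3: the real column name "a" gives the 3 expected rows.
    correct_count = conn.execute(
        "SELECT count(*) FROM dts_orders o "
        "WHERE o.order_id IN (SELECT t.a FROM dts_temp t)"
    ).fetchone()[0]

    conn.close()
    return unqualified_count, qualified_error, correct_count


if __name__ == "__main__":
    print(run_demo())
```

Qualifying every column with its table alias, as in the second and third cases, turns the silent match-everything result into an immediate error.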

This issue was discovered today when this logic was used in an UPDATE that ended up locking all rows in a 5M-row table and brought many apps to a grinding halt. Thankfully, it was caught and killed before it actually updated anything.

Thanks,
Don.
--
Don Seiler
www.seiler.us
