Thread: Dangerous Naming Confusion

Dangerous Naming Confusion

From
Don Seiler
Date:
Good evening,

Please see my gist at https://gist.github.com/dtseiler/9ef0a5e2b1e0efc6a13d5661436d4056 for a complete test case.

I tested this on PG 12.6 and 13.2 and observed the same on both.

We were expecting the queries that use dts_temp to only return 3 rows. However the subquery starting at line 36 returns ALL 250,000 rows from dts_orders. Note that the "order_id" field doesn't exist in the dts_temp table, so I'm assuming PG is using the "order_id" field from the dts_orders table. If I use explicit table references like in the query at line 48, then I get the error I would expect that the "order_id" column doesn't exist in dts_temp.

When I use the actual column name "a" for dts_temp, then I get the 3 rows back as expected.

I'm wondering if this is expected behavior that PG uses the dts_orders.order_id value in the subquery "select order_id from dts_temp" when dts_temp doesn't have its own order_id column. I would have expected an error that the column doesn't exist. Seems very counter-intuitive to think PG would use a column from a different table.

This issue was discovered today when this logic was used in an UPDATE and ended up locking all rows in a 5M row table and brought many apps to a grinding halt. Thankfully it was caught and killed before it actually updated anything.

Thanks,
Don.
--
Don Seiler
www.seiler.us

Re: Dangerous Naming Confusion

From
Adrian Klaver
Date:
On 3/29/21 3:00 PM, Don Seiler wrote:
> Good evening,
> 
> Please see my gist at 
> https://gist.github.com/dtseiler/9ef0a5e2b1e0efc6a13d5661436d4056 
> <https://gist.github.com/dtseiler/9ef0a5e2b1e0efc6a13d5661436d4056> for 
> a complete test case.
> 
> I tested this on PG 12.6 and 13.2 and observed the same on both.
> 
> We were expecting the queries that use dts_temp to only return 3 rows. 
> However the subquery starting at line 36 returns ALL 250,000 rows from 
> dts_orders. Note that the "order_id" field doesn't exist in the dts_temp 
> table, so I'm assuming PG is using the "order_id" field from the 
> dts_orders table. If I use explicit table references like in the query 
> at line 48, then I get the error I would expect that the "order_id" 
> column doesn't exist in dts_temp.
> 
> When I use the actual column name "a" for dts_temp, then I get the 3 
> rows back as expected.
> 
> I'm wondering if this is expected behavior that PG uses the 
> dts_orders.order_id value in the subquery "select order_id from 
> dts_temp" when dts_temp doesn't have its own order_id column. I would 
> have expected an error that the column doesn't exist. Seems very 
> counter-intuitive to think PG would use a column from a different table.

See:

https://www.postgresql.org/message-id/Pine.LNX.4.56.0308011345320.881@krusty.credativ.de

> 
> This issue was discovered today when this logic was used in an UPDATE 
> and ended up locking all rows in a 5M row table and brought many apps to 
> a grinding halt. Thankfully it was caught and killed before it actually 
> updated anything.
> 
> Thanks,
> Don.
> -- 
> Don Seiler
> www.seiler.us <http://www.seiler.us>


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Dangerous Naming Confusion

From
"David G. Johnston"
Date:
On Mon, Mar 29, 2021 at 3:20 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 3/29/21 3:00 PM, Don Seiler wrote:
>
> I'm wondering if this is expected behavior that PG uses the
> dts_orders.order_id value in the subquery "select order_id from
> dts_temp" when dts_temp doesn't have its own order_id column. I would
> have expected an error that the column doesn't exist. Seems very
> counter-intuitive to think PG would use a column from a different table.

See:

https://www.postgresql.org/message-id/Pine.LNX.4.56.0308011345320.881@krusty.credativ.de


There is also an FAQ entry:


David J.

Re: Dangerous Naming Confusion

From
Don Seiler
Date:
On Mon, Mar 29, 2021 at 5:22 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Mar 29, 2021 at 3:20 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 3/29/21 3:00 PM, Don Seiler wrote:
>
> I'm wondering if this is expected behavior that PG uses the
> dts_orders.order_id value in the subquery "select order_id from
> dts_temp" when dts_temp doesn't have its own order_id column. I would
> have expected an error that the column doesn't exist. Seems very
> counter-intuitive to think PG would use a column from a different table.

See:

https://www.postgresql.org/message-id/Pine.LNX.4.56.0308011345320.881@krusty.credativ.de


There is also an FAQ entry:


David J.


This is good to know. I figured it might "working as expected". Still seems annoying/counter-intuitive to me but at least we know to look for it.

Thanks,
Don.

--
Don Seiler
www.seiler.us