Thread: [BUGS] BUG #14573: lateral joins, ambuiguity
The following bug has been logged on the website: Bug reference: 14573 Logged by: Denise Wiedl Email address: dlw405@gmail.com PostgreSQL version: 9.5.3 Operating system: osx 10.11.5 Description: PostgreSQL 9.5.3 on x86_64-apple-darwin15.4.0, compiled by Apple LLVM version 7.3.0 (clang-703.0.31), 64-bit The LATERAL JOIN has access to all previous columns in the join, but, it doesn't give an error when there are two columns of the same name. Instead, it silently selects the first column. Example: The related_to_everyone table has FK to both the aaardvark table, and the banana table. In turn, each aardvark entry and each banana entry has a FK to it's owner. I'm trying to figure out for a given row in related_to_everyone, what is the aardvark_owner, and the banana_owner. In a dynamically generated SQL query, we had created 2 columns named "owner". The subselect statement within the 2nd lateral join, ends up grabbing the 1st owner column. ``` SELECT (related_to_everybody_aardy.owner).last_name, (related_to_everybody_banana.owner).last_name FROM related_to_everybody "0" -- get ardy owner LEFT JOIN LATERAL ( SELECT "1".*, owner FROM aaardvark "1" LEFT JOIN LATERAL ( SELECT "2".* FROM users "2" WHERE "1".owner_id = "2".id ) owner ON true WHERE "0".aardy = "1".id ) related_to_everybody_aardy ON true -- get banana owner LEFT JOIN LATERAL ( SELECT "1".*, owner FROM banana "1" LEFT JOIN LATERAL ( SELECT "2".* FROM users "2" WHERE "1".owner_id = "2".id ) owner ON true WHERE "0".banana = "1".id ) related_to_everybody_banana ON true WHERE <select 1 row in related_to_everybody> ; ``` ^^ The `SELECT owner` in the 2nd lateral join grabs the value from the 1st owner column. Such that (related_to_everybody_banana.owner).last_name now refers to the aardvark_owner. We could solve this by aliasing each column within the subquery: ``` LEFT JOIN LATERAL ( SELECT "1".*, banana_owner as owner FROM banana "1" LEFT JOIN LATERAL ( SELECT "2".* FROM users "2" WHERE "1".owner_id = "2".id ) banana_owner ON true WHERE "0".banana = "1".id ) related_to_everybody_banana ON true ``` Then the (related_to_everybody_banana.owner).last_name will correctly refer to the banana owner. We are confused on why there was not an ambiguity error thrown on the property 'owner' during the 2nd lateral join's SELECT statement. Should there be? -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On Wednesday, March 1, 2017, <dlw405@gmail.com> wrote:
The following bug has been logged on the website:
Bug reference: 14573
Logged by: Denise Wiedl
Email address: dlw405@gmail.com
PostgreSQL version: 9.5.3
Operating system: osx 10.11.5
Description:
PostgreSQL 9.5.3 on x86_64-apple-darwin15.4.0, compiled by Apple LLVM
version 7.3.0 (clang-703.0.31), 64-bit
The LATERAL JOIN has access to all previous columns in the join, but, it
doesn't give an error when there are two columns of the same name. Instead,
it silently selects the first column.
LEFT JOIN LATERAL (
SELECT "1".*, owner
FROM banana "1"
LEFT JOIN LATERAL (
SELECT "2".*
FROM users "2"
WHERE "1".owner_id = "2".id
) owner
IIUC the preference exhibited is an explicit column present on the left side of the join over the implicit relation named column within its own query.
Simpler self-contained example:
select *
from (values (1)) vals (v)
left join lateral (
select v
from (values (2)) v (val)
) src on (true)
Returns (1,1) instead of the desired (1,(2))
Beyond my pay grade for diagnostics. I don't recall this being documented and I haven't looked for it yet.
If I come up with a non-lateral involved example before this is answered I'll come back and post it.
David J.
On Wednesday, March 1, 2017, <dlw405@gmail.com> wrote:The following bug has been logged on the website:
Bug reference: 14573
Logged by: Denise Wiedl
Email address: dlw405@gmail.com
PostgreSQL version: 9.5.3
Operating system: osx 10.11.5
Description:
PostgreSQL 9.5.3 on x86_64-apple-darwin15.4.0, compiled by Apple LLVM
version 7.3.0 (clang-703.0.31), 64-bit
The LATERAL JOIN has access to all previous columns in the join, but, it
doesn't give an error when there are two columns of the same name. Instead,
it silently selects the first column.
LEFT JOIN LATERAL (
SELECT "1".*, owner
FROM banana "1"
LEFT JOIN LATERAL (
SELECT "2".*
FROM users "2"
WHERE "1".owner_id = "2".id
) ownerIIUC the preference exhibited is an explicit column present on the left side of the join over the implicit relation named column within its own query.Simpler self-contained example:select *from (values (1)) vals (v)left join lateral (select vfrom (values (2)) v (val)) src on (true)Returns (1,1) instead of the desired (1,(2))Beyond my pay grade for diagnostics. I don't recall this being documented and I haven't looked for it yet.If I come up with a non-lateral involved example before this is answered I'll come back and post it.
I'd say its working as designed (or, at least, its not unique to LATERAL) - though no joy on finding where its end-user documented.
select v --ambigious
from (values (1)) vals (v)
cross join (
select valt as v
from (values (2)) valt (val)
) v;
select v --picks the column 1
from (values (1)) vals (v)
cross join (
select valt
from (values (2)) valt (val)
) v;
David J.
Also, I would note that the preference of inner vs. outer for lateral joins are the opposite for tables vs columns. Rather confusing.
On Wed, Mar 1, 2017 at 7:47 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wednesday, March 1, 2017, <dlw405@gmail.com> wrote:The following bug has been logged on the website:
Bug reference: 14573
Logged by: Denise Wiedl
Email address: dlw405@gmail.com
PostgreSQL version: 9.5.3
Operating system: osx 10.11.5
Description:
PostgreSQL 9.5.3 on x86_64-apple-darwin15.4.0, compiled by Apple LLVM
version 7.3.0 (clang-703.0.31), 64-bit
The LATERAL JOIN has access to all previous columns in the join, but, it
doesn't give an error when there are two columns of the same name. Instead,
it silently selects the first column.
LEFT JOIN LATERAL (
SELECT "1".*, owner
FROM banana "1"
LEFT JOIN LATERAL (
SELECT "2".*
FROM users "2"
WHERE "1".owner_id = "2".id
) ownerIIUC the preference exhibited is an explicit column present on the left side of the join over the implicit relation named column within its own query.Simpler self-contained example:select *from (values (1)) vals (v)left join lateral (select vfrom (values (2)) v (val)) src on (true)Returns (1,1) instead of the desired (1,(2))Beyond my pay grade for diagnostics. I don't recall this being documented and I haven't looked for it yet.If I come up with a non-lateral involved example before this is answered I'll come back and post it.I'd say its working as designed (or, at least, its not unique to LATERAL) - though no joy on finding where its end-user documented.select v --ambigiousfrom (values (1)) vals (v)cross join (select valt as vfrom (values (2)) valt (val)) v;select v --picks the column 1from (values (1)) vals (v)cross join (select valtfrom (values (2)) valt (val)) v;David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Wed, Mar 1, 2017 at 8:22 PM, David G. Johnston <david.g.johnston@gmail.com> wrote: >> On Wednesday, March 1, 2017, <dlw405@gmail.com> wrote: >>> The LATERAL JOIN has access to all previous columns in the join, but, it >>> doesn't give an error when there are two columns of the same name. >>> Instead, it silently selects the first column. The above statement is demonstrably false, for example regression=# create table t1 (f1 int, f2 int); CREATE TABLE regression=# select * from t1 a cross join t1 b cross join lateral (select f1) ss; ERROR: column reference "f1" is ambiguous LINE 1: ...from t1 a cross join t1 b cross join lateral (select f1) ss; ^ David has the correct analysis: >> IIUC the preference exhibited is an explicit column present on the left >> side of the join over the implicit relation named column within its own >> query. An unqualified name is first sought as a column reference, and only if that fails altogether do we check whether it could be interpreted as a whole-row reference to some table. > I'd say its working as designed (or, at least, its not unique to LATERAL) > - though no joy on finding where its end-user documented. It's mentioned here: https://www.postgresql.org/docs/9.5/static/rowtypes.html#ROWTYPES-USAGE Note however that simple names are matched to column names before table names, so this example works only because there is no column named c in the query's tables. and a bit further down Even though .* does nothing in such cases, using it is good style, since it makes clear that a composite value is intended. In particular, the parser will consider c in c.* to refer to a table name or alias, not to a column name, so that there is no ambiguity; whereas without .*, it is not clear whether c means a table name or a column name, and in fact the column-name interpretation will be preferred if there is a column named c. (Admittedly, that whole section is of pretty recent vintage; but the behavior it describes is old.) >>> We are confused on why there was not an ambiguity error thrown on the >>> property 'owner' during the 2nd lateral join's SELECT statement. Should >>> there be? We can't do that because interpreting "foo" as a table reference is not per SQL standard. If there's a single possible interpretation as a column, whether it be plain or LATERAL or outer-query, we have to resolve it that way without complaint, or we will fail to accept standard-compliant queries. The whole business of allowing a table name without ".*" decoration is a PostQUEL-ism that we inherited from Berkeley and never removed; but it's nonstandard and somewhat deprecated because of the ambiguity. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Thank you for the clarification.
Denise
On Thu, Mar 2, 2017 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Wed, Mar 1, 2017 at 8:22 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
>> On Wednesday, March 1, 2017, <dlw405@gmail.com> wrote:
>>> The LATERAL JOIN has access to all previous columns in the join, but, it
>>> doesn't give an error when there are two columns of the same name.
>>> Instead, it silently selects the first column.
The above statement is demonstrably false, for example
regression=# create table t1 (f1 int, f2 int);
CREATE TABLE
regression=# select * from t1 a cross join t1 b cross join lateral (select f1) ss;
ERROR: column reference "f1" is ambiguous
LINE 1: ...from t1 a cross join t1 b cross join lateral (select f1) ss;
^
David has the correct analysis:
>> IIUC the preference exhibited is an explicit column present on the left
>> side of the join over the implicit relation named column within its own
>> query.
An unqualified name is first sought as a column reference, and only if
that fails altogether do we check whether it could be interpreted as a
whole-row reference to some table.
> I'd say its working as designed (or, at least, its not unique to LATERAL)
> - though no joy on finding where its end-user documented.
It's mentioned here:
https://www.postgresql.org/docs/9.5/static/rowtypes.html# ROWTYPES-USAGE
Note however that simple names are matched to column names before
table names, so this example works only because there is no column
named c in the query's tables.
and a bit further down
Even though .* does nothing in such cases, using it is good style,
since it makes clear that a composite value is intended. In
particular, the parser will consider c in c.* to refer to a table
name or alias, not to a column name, so that there is no
ambiguity; whereas without .*, it is not clear whether c means a
table name or a column name, and in fact the column-name
interpretation will be preferred if there is a column named c.
(Admittedly, that whole section is of pretty recent vintage; but the
behavior it describes is old.)
>>> We are confused on why there was not an ambiguity error thrown on the
>>> property 'owner' during the 2nd lateral join's SELECT statement. Should
>>> there be?
We can't do that because interpreting "foo" as a table reference is not
per SQL standard. If there's a single possible interpretation as a
column, whether it be plain or LATERAL or outer-query, we have to
resolve it that way without complaint, or we will fail to accept
standard-compliant queries.
The whole business of allowing a table name without ".*" decoration is
a PostQUEL-ism that we inherited from Berkeley and never removed; but
it's nonstandard and somewhat deprecated because of the ambiguity.
regards, tom lane