Thread: [BUGS] BUG #14573: lateral joins, ambuiguity

[BUGS] BUG #14573: lateral joins, ambuiguity

From
dlw405@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      14573
Logged by:          Denise Wiedl
Email address:      dlw405@gmail.com
PostgreSQL version: 9.5.3
Operating system:   osx 10.11.5
Description:

PostgreSQL 9.5.3 on x86_64-apple-darwin15.4.0, compiled by Apple LLVM
version 7.3.0 (clang-703.0.31), 64-bit

The LATERAL JOIN has access to all previous columns in the join, but, it
doesn't give an error when there are two columns of the same name. Instead,
it silently selects the first column.

Example:
The related_to_everyone table has FK to both the aaardvark table, and the
banana table.
In turn, each aardvark entry and each banana entry has a FK to it's owner.

I'm trying to figure out for a given row in related_to_everyone, what is the
aardvark_owner, and the banana_owner.

In a dynamically generated SQL query, we had created 2 columns named
"owner". The subselect statement within the 2nd lateral join, ends up
grabbing the 1st owner column.

```
SELECT
    (related_to_everybody_aardy.owner).last_name,
    (related_to_everybody_banana.owner).last_name
FROM related_to_everybody "0"
-- get ardy owner
LEFT JOIN LATERAL (
    SELECT "1".*, owner
    FROM aaardvark "1"
    LEFT JOIN LATERAL (
        SELECT "2".*
        FROM users "2"
        WHERE "1".owner_id = "2".id
    ) owner ON true
    WHERE "0".aardy = "1".id
) related_to_everybody_aardy ON true
-- get banana owner
LEFT JOIN LATERAL (
    SELECT "1".*, owner
    FROM banana "1"
    LEFT JOIN LATERAL (
        SELECT "2".*
        FROM users "2"
        WHERE "1".owner_id = "2".id
    ) owner ON true
    WHERE "0".banana = "1".id
) related_to_everybody_banana ON true
WHERE <select 1 row in related_to_everybody>
;
```

^^ The `SELECT owner` in the 2nd lateral join grabs the value from the 1st
owner column.
Such that (related_to_everybody_banana.owner).last_name now refers to the
aardvark_owner.

We could solve this by aliasing each column within the subquery:
```
LEFT JOIN LATERAL (
    SELECT "1".*, banana_owner as owner
    FROM banana "1"
    LEFT JOIN LATERAL (
        SELECT "2".*
        FROM users "2"
        WHERE "1".owner_id = "2".id
    ) banana_owner ON true
    WHERE "0".banana = "1".id
) related_to_everybody_banana ON true
```
Then the (related_to_everybody_banana.owner).last_name will correctly refer
to the banana owner.


We are confused on why there was not an ambiguity error thrown on the
property 'owner' during the 2nd lateral join's SELECT statement. Should
there be?


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14573: lateral joins, ambuiguity

From
"David G. Johnston"
Date:
On Wednesday, March 1, 2017, <dlw405@gmail.com> wrote:
The following bug has been logged on the website:

Bug reference:      14573
Logged by:          Denise Wiedl
Email address:      dlw405@gmail.com
PostgreSQL version: 9.5.3
Operating system:   osx 10.11.5
Description:

PostgreSQL 9.5.3 on x86_64-apple-darwin15.4.0, compiled by Apple LLVM
version 7.3.0 (clang-703.0.31), 64-bit

The LATERAL JOIN has access to all previous columns in the join, but, it
doesn't give an error when there are two columns of the same name. Instead,
it silently selects the first column.

LEFT JOIN LATERAL (
        SELECT "1".*, owner
        FROM banana "1"
        LEFT JOIN LATERAL (
                SELECT "2".*
                FROM users "2"
                WHERE "1".owner_id = "2".id
        ) owner 

IIUC the preference exhibited is an explicit column present on the left side of the join over the implicit relation named column within its own query.

​Simpler self-contained example:

select *
from  (values (1)) vals (v)
left join lateral (
select v
from (values (2)) v (val)
) src on (true)

Returns (1,1) instead of the desired (1,(2))

​Beyond my pay grade for diagnostics.  I don't recall this being documented and I haven't looked for it yet.

If I come up with a non-lateral involved example before this is answered I'll come back and post it.

David J.

Re: [BUGS] BUG #14573: lateral joins, ambuiguity

From
"David G. Johnston"
Date:
On Wed, Mar 1, 2017 at 8:22 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wednesday, March 1, 2017, <dlw405@gmail.com> wrote:
The following bug has been logged on the website:

Bug reference:      14573
Logged by:          Denise Wiedl
Email address:      dlw405@gmail.com
PostgreSQL version: 9.5.3
Operating system:   osx 10.11.5
Description:

PostgreSQL 9.5.3 on x86_64-apple-darwin15.4.0, compiled by Apple LLVM
version 7.3.0 (clang-703.0.31), 64-bit

The LATERAL JOIN has access to all previous columns in the join, but, it
doesn't give an error when there are two columns of the same name. Instead,
it silently selects the first column.

LEFT JOIN LATERAL (
        SELECT "1".*, owner
        FROM banana "1"
        LEFT JOIN LATERAL (
                SELECT "2".*
                FROM users "2"
                WHERE "1".owner_id = "2".id
        ) owner 

IIUC the preference exhibited is an explicit column present on the left side of the join over the implicit relation named column within its own query.

​Simpler self-contained example:

select *
from  (values (1)) vals (v)
left join lateral (
select v
from (values (2)) v (val)
) src on (true)

Returns (1,1) instead of the desired (1,(2))

​Beyond my pay grade for diagnostics.  I don't recall this being documented and I haven't looked for it yet.

If I come up with a non-lateral involved example before this is answered I'll come back and post it.

​I'd say its working as designed (or, at least, its not unique to LATERAL) - though no joy on finding where its end-user documented.​

​select v --ambigious
from  (values (1)) vals (v)
cross join (
select valt as v
from (values (2)) valt (val)
) v;

select v --picks the column 1
from  (values (1)) vals (v)
cross join (
select valt
from (values (2)) valt (val)
) v;

David J.

Re: [BUGS] BUG #14573: lateral joins, ambuiguity

From
Denise Wiedl
Date:
Also, I would note that the preference of inner vs. outer for lateral joins are the opposite for tables vs columns. Rather confusing.
 

On Wed, Mar 1, 2017 at 7:47 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wed, Mar 1, 2017 at 8:22 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wednesday, March 1, 2017, <dlw405@gmail.com> wrote:
The following bug has been logged on the website:

Bug reference:      14573
Logged by:          Denise Wiedl
Email address:      dlw405@gmail.com
PostgreSQL version: 9.5.3
Operating system:   osx 10.11.5
Description:

PostgreSQL 9.5.3 on x86_64-apple-darwin15.4.0, compiled by Apple LLVM
version 7.3.0 (clang-703.0.31), 64-bit

The LATERAL JOIN has access to all previous columns in the join, but, it
doesn't give an error when there are two columns of the same name. Instead,
it silently selects the first column.

LEFT JOIN LATERAL (
        SELECT "1".*, owner
        FROM banana "1"
        LEFT JOIN LATERAL (
                SELECT "2".*
                FROM users "2"
                WHERE "1".owner_id = "2".id
        ) owner 

IIUC the preference exhibited is an explicit column present on the left side of the join over the implicit relation named column within its own query.

​Simpler self-contained example:

select *
from  (values (1)) vals (v)
left join lateral (
select v
from (values (2)) v (val)
) src on (true)

Returns (1,1) instead of the desired (1,(2))

​Beyond my pay grade for diagnostics.  I don't recall this being documented and I haven't looked for it yet.

If I come up with a non-lateral involved example before this is answered I'll come back and post it.

​I'd say its working as designed (or, at least, its not unique to LATERAL) - though no joy on finding where its end-user documented.​

​select v --ambigious
from  (values (1)) vals (v)
cross join (
select valt as v
from (values (2)) valt (val)
) v;

select v --picks the column 1
from  (values (1)) vals (v)
cross join (
select valt
from (values (2)) valt (val)
) v;

David J.

Re: [BUGS] BUG #14573: lateral joins, ambuiguity

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Wed, Mar 1, 2017 at 8:22 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
>> On Wednesday, March 1, 2017, <dlw405@gmail.com> wrote:
>>> The LATERAL JOIN has access to all previous columns in the join, but, it
>>> doesn't give an error when there are two columns of the same name.
>>> Instead, it silently selects the first column.

The above statement is demonstrably false, for example

regression=# create table t1 (f1 int, f2 int);
CREATE TABLE
regression=# select * from t1 a cross join t1 b cross join lateral (select f1) ss;
ERROR:  column reference "f1" is ambiguous
LINE 1: ...from t1 a cross join t1 b cross join lateral (select f1) ss;
                                                                ^

David has the correct analysis:

>> IIUC the preference exhibited is an explicit column present on the left
>> side of the join over the implicit relation named column within its own
>> query.

An unqualified name is first sought as a column reference, and only if
that fails altogether do we check whether it could be interpreted as a
whole-row reference to some table.

> ​I'd say its working as designed (or, at least, its not unique to LATERAL)
> - though no joy on finding where its end-user documented.​

It's mentioned here:
https://www.postgresql.org/docs/9.5/static/rowtypes.html#ROWTYPES-USAGE

    Note however that simple names are matched to column names before
    table names, so this example works only because there is no column
    named c in the query's tables.

and a bit further down

    Even though .* does nothing in such cases, using it is good style,
    since it makes clear that a composite value is intended. In
    particular, the parser will consider c in c.* to refer to a table
    name or alias, not to a column name, so that there is no
    ambiguity; whereas without .*, it is not clear whether c means a
    table name or a column name, and in fact the column-name
    interpretation will be preferred if there is a column named c.

(Admittedly, that whole section is of pretty recent vintage; but the
behavior it describes is old.)

>>> We are confused on why there was not an ambiguity error thrown on the
>>> property 'owner' during the 2nd lateral join's SELECT statement. Should
>>> there be?

We can't do that because interpreting "foo" as a table reference is not
per SQL standard.  If there's a single possible interpretation as a
column, whether it be plain or LATERAL or outer-query, we have to
resolve it that way without complaint, or we will fail to accept
standard-compliant queries.

The whole business of allowing a table name without ".*" decoration is
a PostQUEL-ism that we inherited from Berkeley and never removed; but
it's nonstandard and somewhat deprecated because of the ambiguity.

            regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14573: lateral joins, ambuiguity

From
Denise Wiedl
Date:
Thank you for the clarification.

Denise


On Thu, Mar 2, 2017 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Wed, Mar 1, 2017 at 8:22 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
>> On Wednesday, March 1, 2017, <dlw405@gmail.com> wrote:
>>> The LATERAL JOIN has access to all previous columns in the join, but, it
>>> doesn't give an error when there are two columns of the same name.
>>> Instead, it silently selects the first column.

The above statement is demonstrably false, for example

regression=# create table t1 (f1 int, f2 int);
CREATE TABLE
regression=# select * from t1 a cross join t1 b cross join lateral (select f1) ss;
ERROR:  column reference "f1" is ambiguous
LINE 1: ...from t1 a cross join t1 b cross join lateral (select f1) ss;
                                                                ^

David has the correct analysis:

>> IIUC the preference exhibited is an explicit column present on the left
>> side of the join over the implicit relation named column within its own
>> query.

An unqualified name is first sought as a column reference, and only if
that fails altogether do we check whether it could be interpreted as a
whole-row reference to some table.

> ​I'd say its working as designed (or, at least, its not unique to LATERAL)
> - though no joy on finding where its end-user documented.​

It's mentioned here:
https://www.postgresql.org/docs/9.5/static/rowtypes.html#ROWTYPES-USAGE

        Note however that simple names are matched to column names before
        table names, so this example works only because there is no column
        named c in the query's tables.

and a bit further down

        Even though .* does nothing in such cases, using it is good style,
        since it makes clear that a composite value is intended. In
        particular, the parser will consider c in c.* to refer to a table
        name or alias, not to a column name, so that there is no
        ambiguity; whereas without .*, it is not clear whether c means a
        table name or a column name, and in fact the column-name
        interpretation will be preferred if there is a column named c.

(Admittedly, that whole section is of pretty recent vintage; but the
behavior it describes is old.)

>>> We are confused on why there was not an ambiguity error thrown on the
>>> property 'owner' during the 2nd lateral join's SELECT statement. Should
>>> there be?

We can't do that because interpreting "foo" as a table reference is not
per SQL standard.  If there's a single possible interpretation as a
column, whether it be plain or LATERAL or outer-query, we have to
resolve it that way without complaint, or we will fail to accept
standard-compliant queries.

The whole business of allowing a table name without ".*" decoration is
a PostQUEL-ism that we inherited from Berkeley and never removed; but
it's nonstandard and somewhat deprecated because of the ambiguity.

                        regards, tom lane