Thread: Targetlist lost when CTE join

Targetlist lost when CTE join

From
Zhang Mingli
Date:
Hi,


Mini repo

create table t1(c1 int, c2 int);
CREATE TABLE
create table t2(c1 int, c2 int);
CREATE TABLE
explain with cte1 as (insert into t2 values (1, 2) returning *) select * from cte1 join t1 using(c1);
 QUERY PLAN
----------------------------------------------------------------
 Hash Join (cost=0.04..41.23 rows=11 width=12)
 Hash Cond: (t1.c1 = cte1.c1)
 CTE cte1
 -> Insert on t2 (cost=0.00..0.01 rows=1 width=8)
 -> Result (cost=0.00..0.01 rows=1 width=8)
 -> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8)
 -> Hash (cost=0.02..0.02 rows=1 width=8)
 -> CTE Scan on cte1 (cost=0.00..0.02 rows=1 width=8)
(8 rows)

with cte1 as (insert into t2 values (1, 2) returning *) select * from cte1 join t1 using(c1);
 c1 | c2 | c2
----+----+----
(0 rows)

truncate t2;
TRUNCATE TABLE
with cte1 as (insert into t2 values (1, 2) returning *) select cte1.*, t1.* from cte1 join t1 using(c1);
 c1 | c2 | c1 | c2
----+----+----+----
(0 rows)

Table t1 and  t2 both has 2 columns: c1, c2, when CTE join select *, the result target list seems to lost one’s column c1.
But it looks good when select cte1.* and t1.* explicitly .

Is it a bug?

Regards,
Zhang Mingli

Re: Targetlist lost when CTE join

From
Zhang Mingli
Date:
Hi,

Explain verbose, seems HashJoin node drop that column.


gpadmin=# explain(verbose) with cte1 as (insert into t2 values (1, 2) returning *) select * from cte1 join t1 using(c1);
 QUERY PLAN
-------------------------------------------------------------------
 Hash Join (cost=0.04..41.23 rows=11 width=12)
 Output: cte1.c1, cte1.c2, t1.c2
 Hash Cond: (t1.c1 = cte1.c1)
 CTE cte1
 -> Insert on public.t2 (cost=0.00..0.01 rows=1 width=8)
 Output: t2.c1, t2.c2
 -> Result (cost=0.00..0.01 rows=1 width=8)
 Output: 1, 2
 -> Seq Scan on public.t1 (cost=0.00..32.60 rows=2260 width=8)
 Output: t1.c1, t1.c2
 -> Hash (cost=0.02..0.02 rows=1 width=8)
 Output: cte1.c1, cte1.c2
 -> CTE Scan on cte1 (cost=0.00..0.02 rows=1 width=8)
 Output: cte1.c1, cte1.c2
(14 rows)


Regards,
Zhang Mingli

Re: Targetlist lost when CTE join

From
Julien Rouhaud
Date:
Hi,

On Wed, Jun 28, 2023 at 04:52:34PM +0800, Zhang Mingli wrote:
>
> Mini repo
>
> create table t1(c1 int, c2 int);
> CREATE TABLE
> create table t2(c1 int, c2 int);
> CREATE TABLE
> explain with cte1 as (insert into t2 values (1, 2) returning *) select * from cte1 join t1 using(c1);
>  QUERY PLAN
> ----------------------------------------------------------------
>  Hash Join (cost=0.04..41.23 rows=11 width=12)
>  Hash Cond: (t1.c1 = cte1.c1)
>  CTE cte1
>  -> Insert on t2 (cost=0.00..0.01 rows=1 width=8)
>  -> Result (cost=0.00..0.01 rows=1 width=8)
>  -> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8)
>  -> Hash (cost=0.02..0.02 rows=1 width=8)
>  -> CTE Scan on cte1 (cost=0.00..0.02 rows=1 width=8)
> (8 rows)
>
> with cte1 as (insert into t2 values (1, 2) returning *) select * from cte1 join t1 using(c1);
>  c1 | c2 | c2
> ----+----+----
> (0 rows)
>
> truncate t2;
> TRUNCATE TABLE
> with cte1 as (insert into t2 values (1, 2) returning *) select cte1.*, t1.* from cte1 join t1 using(c1);
>  c1 | c2 | c1 | c2
> ----+----+----+----
> (0 rows)
>
> Table t1 and  t2 both has 2 columns: c1, c2, when CTE join select *, the result target list seems to lost one’s
columnc1.
 
> But it looks good when select cte1.* and t1.* explicitly .
>
> Is it a bug?

This is working as intended.  When using a USING clause you "merge" both
columns so the final target list only contain one version of the merged
columns, which doesn't happen if you use e.g. ON instead.  I'm assuming that
what the SQL standard says, but I don't have a copy to confirm.



Re: Targetlist lost when CTE join

From
Zhang Mingli
Date:
Hi

Regards,
Zhang Mingli
On Jun 28, 2023, 17:17 +0800, Julien Rouhaud <rjuju123@gmail.com>, wrote:
This is working as intended. When using a USING clause you "merge" both
columns so the final target list only contain one version of the merged
columns, which doesn't happen if you use e.g. ON instead. I'm assuming that
what the SQL standard says, but I don't have a copy to confirm.

Thanks. You’r right.
 
Have a test:

gpadmin=# with cte1 as (insert into t2 values (1, 2) returning *) select * from cte1 join t1 on t1.c1 = cte1.c1;
 c1 | c2 | c1 | c2
----+----+----+----
(0 rows)

Re: Targetlist lost when CTE join

From
Julien Rouhaud
Date:
On Wed, Jun 28, 2023 at 05:17:14PM +0800, Julien Rouhaud wrote:
> >
> > Table t1 and  t2 both has 2 columns: c1, c2, when CTE join select *, the result target list seems to lost one’s
columnc1.
 
> > But it looks good when select cte1.* and t1.* explicitly .
> >
> > Is it a bug?
>
> This is working as intended.  When using a USING clause you "merge" both
> columns so the final target list only contain one version of the merged
> columns, which doesn't happen if you use e.g. ON instead.  I'm assuming that
> what the SQL standard says, but I don't have a copy to confirm.

I forgot to mention that this is actually documented:

https://www.postgresql.org/docs/current/queries-table-expressions.html

Furthermore, the output of JOIN USING suppresses redundant columns: there is no
need to print both of the matched columns, since they must have equal values.
While JOIN ON produces all columns from T1 followed by all columns from T2,
JOIN USING produces one output column for each of the listed column pairs (in
the listed order), followed by any remaining columns from T1, followed by any
remaining columns from T2.



Re: Targetlist lost when CTE join

From
Zhang Mingli
Date:
HI, 

On Jun 28, 2023, 17:26 +0800, Julien Rouhaud <rjuju123@gmail.com>, wrote:
On Wed, Jun 28, 2023 at 05:17:14PM +0800, Julien Rouhaud wrote:

Table t1 and  t2 both has 2 columns: c1, c2, when CTE join select *, the result target list seems to lost one’s column c1.
But it looks good when select cte1.* and t1.* explicitly .

Is it a bug?

This is working as intended. When using a USING clause you "merge" both
columns so the final target list only contain one version of the merged
columns, which doesn't happen if you use e.g. ON instead. I'm assuming that
what the SQL standard says, but I don't have a copy to confirm.

I forgot to mention that this is actually documented:

https://www.postgresql.org/docs/current/queries-table-expressions.html

Furthermore, the output of JOIN USING suppresses redundant columns: there is no
need to print both of the matched columns, since they must have equal values.
While JOIN ON produces all columns from T1 followed by all columns from T2,
JOIN USING produces one output column for each of the listed column pairs (in
the listed order), followed by any remaining columns from T1, followed by any
remaining columns from T2.

Thanks for your help.

Regards,
Zhang Mingli