Re: Question about pull_up_sublinks_qual_recurse - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Question about pull_up_sublinks_qual_recurse
Date
Msg-id CAKU4AWoBQRw1_m524tLDnQyi8Rj2uzdy+Dsf32mC8YgsxkqABw@mail.gmail.com
Whole thread Raw
In response to Re: Question about pull_up_sublinks_qual_recurse  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


On Sat, Oct 15, 2022 at 3:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> After some more self review,  I find my proposal has the following side
> effects.

Yeah, I do not think this works at all.  .... 
The discussion of outer join
reordering in optimizer/README says that that doesn't work, and while
I'm too lazy to construct an example right now, I believe it's true.

I came to this topic again recently and have finally figured out the
reason.  It looks to me that semi join is slightly different with outer
join in this case.

The following test case can show why we have to
pull_up_sublinks_qual_recurse to either LHS or RHS rather than the
JoinExpr.

create table t1(a int, b int, c int);
create table t2(a int, b int, c int);
create table t3(a int, b int, c int);

insert into t1 select 1, 1, 2;
insert into t2 select 1, 2, 1;
insert into t2 select 1, 1, 2;
insert into t3 select 1, 1, 2;

select * from t1
where exists (select 1 from t2
-- below references to t1 and t2 at the same time
where exists (select 1 from t3
   where t1.c = t2.c and t2.b = t3.b)  
and t1.a = t2.a);

which can be transformed to

SELECT * FROM t1 SEMI JOIN t2
ON t1.a = t2.a
AND exists (select 1 from t3
where t1.c = t2.c
    and t2.b = t3.b)

Here the semantics of the query is return the rows in T1 iff there is a
row in t2 matches the whole clause (t1.a = t2.a AND exists..);

But If we transform it to

SELECT * FROM (t1 SEMI JOIN t2
ON t1.a = t2.a) SEMI JOIN t3
on t1.c = t2.c and t2.b = t3.b;

The scan on T2 would stop if ONLY (t1.a = t2.a) matches and the following
rows will be ignored. However the matched rows may doesn't match the
exists clause! So in the above example, the correct result set will be 1 
row. If we pull up the sublink above the JoinExpr, no row would be found.

The attached is just a comment and a test case to help understand why we
have to do things like this.
 
--
Best Regards
Andy Fan
Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Save a few bytes in pg_attribute
Next
From: Richard Guo
Date:
Subject: Re: Fix typo plgsql to plpgsql.