Re: A new strategy for pull-up correlated ANY_SUBLINK - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Re: A new strategy for pull-up correlated ANY_SUBLINK
Date
Msg-id 780533de-c9af-4823-94bf-dbbdb03c1bfa@gmail.com
Whole thread Raw
In response to Re: A new strategy for pull-up correlated ANY_SUBLINK  (Andrei Lepikhov <lepihov@gmail.com>)
List pgsql-hackers
On 7/1/24 16:17, Andrei Lepikhov wrote:
> On 10/12/23 14:52, Andy Fan wrote:
>> Here the sublink can't be pulled up because of its reference to
>> the  LHS of left join, the original logic is that no matter the 'b.t 
>> in ..'
>> returns the true or false,  the rows in LHS will be returned.  If we
>> pull it up to LHS, some rows in LHS will be filtered out, which
>> breaks its original semantics.
> Hi,
> I spent some time trying to understand your sentence.
> I mean the following case:
> 
> SELECT * FROM t1 LEFT JOIN t2
>    ON t2.x IN (SELECT y FROM t3 WHERE t1.x=t3.x);
> 
> I read [1,2,3], but I am still unsure why it is impossible in the case 
> of OUTER JOIN. By setting the LATERAL clause, we forbid any clauses from 
> the RTE subquery to bubble up as a top-level clause and filter tuples 
> from LHS, am I wrong? Does it need more research or you can show some 
> case to support your opinion - why this type of transformation must be 
> disallowed?
> 
> [1] https://www.postgresql.org/message-id/6531.1218473967%40sss.pgh.pa.us
> [2] 
> https://www.postgresql.org/message-id/BANLkTikGFtGnAaXVh5%3DntRdN%2B4w%2Br%3DNPuw%40mail.gmail.com
> [3] https://www.vldb.org/conf/1992/P091.PDF
> 

I delved into it a bit more. After reading [4,5] I invented query that 
is analogue of the query above, but with manually pulled-up sublink:

EXPLAIN (COSTS OFF)
SELECT * FROM t1 LEFT JOIN t2 JOIN LATERAL
(SELECT t1.x AS x1, y,x FROM t3) q1 ON (t2.x=q1.y AND q1.x1=q1.x) ON true;

And you can see the plan:

  Nested Loop Left Join
    ->  Seq Scan on t1
    ->  Hash Join
          Hash Cond: (t2.x = t3.y)
          ->  Seq Scan on t2
          ->  Hash
                ->  Seq Scan on t3
                      Filter: (t1.x = x)

Just for fun, I played with MSSQL Server and if I read its explain 
correctly, it also allows pulls-up sublink which mentions LHS:

-------------------------------------
Nested Loops(Left Outer Join, OUTER REFERENCES:(t1.x))
   Table Scan(OBJECT:(t1))
     Hash Match(Right Semi Join, HASH:(t3.y)=(t2.x),
                RESIDUAL:(t2.x=t3.y))
       Table Scan(OBJECT:(t3), WHERE:(t1.x=t3.x))
     Table Scan(OBJECT:(t2))
-------------------------------------

(I cleaned MSSQL explain a little bit for clarity).
So, may we allow references to LHS in such sublink?

[4] 
https://www.postgresql.org/message-id/flat/15523.1372190410%40sss.pgh.pa.us
[5] 
https://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local

-- 
regards, Andrei Lepikhov




pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: Eager aggregation, take 3
Next
From: Dilip Kumar
Date:
Subject: Re: Conflict Detection and Resolution