Re: Question about pull_up_sublinks_qual_recurse - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Question about pull_up_sublinks_qual_recurse
Date
Msg-id CAKU4AWp2TtVh+CnUFTMFQ-zHvttouXGfcvJovaKH4Q6dwjmPKA@mail.gmail.com
Whole thread Raw
In response to Re: Question about pull_up_sublinks_qual_recurse  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi Tom:

Thanks for your reply! I have self reviewed the below message at 3 different
time periods to prevent from too inaccurate replies. It may be more detailed
than it really needed, but it probably can show where I am lost.

On Sat, Oct 15, 2022 at 3:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

If the pulled-up join doesn't go into the nullable side of the upper
join then you've changed semantics.  In this case, it'd amount to
reassociating a semijoin that was within the righthand side of another
semijoin to above that other semijoin. 

I understand your reply as:

select * from t1 left join t2 on exists (select 1 from t3 where t3.a = t2.a);
= select * from t1 left join (t2 semi join t3 on t3.a = t2.a) on true;  -- go to nullable side
!=  select * from (t1 left join t2 on true) semi join t3 on (t3.a = t2.a);  -- go to above the JoinExpr

I CAN follow the above.  And for this case it is controlled by below code:

pull_up_sublinks_qual_recurse

switch (j->jointype)
{
          case JOIN_INNER:
                       ...
          case JOIN_LEFT:
                j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
                                                                                           &j->rarg,
                                                                                           rightrelids,
                                                                                           NULL, NULL);
              break;
            ...
}

and I didn't change this.   My question is could we assume

A semijoin B ON EXISTS (SELECT 1 FROM C on (Pbc))
=  (A semijoin (B semijoin C on (Pbc))) on TRUE.  (current master did)
=  (A semijoin B ON true) semijoin C on (Pbc)        (my current thinking)

Note that there is no 'left outer join' at this place.   Since there are too
many places called pull_up_sublinks_qual_recurse, to make things 
less confused, I prepared a patch for this one line change to show where
exactly I changed (see patch 2);  I think this is the first place I lost.

The discussion of outer join
reordering in optimizer/README says that that doesn't work,

I think you are talking about the graph "Valid OUTER JOIN Optimizations".
I can follow until below.

"
SEMI joins work a little bit differently.  A semijoin can be reassociated
into or out of the lefthand side of another semijoin, left join, or
antijoin, but not into or out of the righthand side.  ..
"
I am unclear why 
 (A semijoin B on (Pab)) semijoin C on (Pbc)
!= A semijoin (B semijoin C on (Pbc)) on (Pab); 

Seems both return rows from A which match both semijoin (Pab) and
(Pbc).  or I misunderstand the above words in the first place?

At last, when I checked optimizer/README, it looks like we used
a 'nullable side'  while it should be 'nonnullable side'? see patch 1
for details.

--
Best Regards
Andy Fan
Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: thinko in basic_archive.c
Next
From: Matthias van de Meent
Date:
Subject: Re: PATCH: Using BRIN indexes for sorted output