Hi Tom:
Thanks for your reply! I have self reviewed the below message at 3 different
time periods to prevent from too inaccurate replies. It may be more detailed
than it really needed, but it probably can show where I am lost.
If the pulled-up join doesn't go into the nullable side of the upper
join then you've changed semantics. In this case, it'd amount to
reassociating a semijoin that was within the righthand side of another
semijoin to above that other semijoin.
I understand your reply as:
select * from t1 left join t2 on exists (select 1 from t3 where t3.a = t2.a);
= select * from t1 left join (t2 semi join t3 on t3.a = t2.a) on true; -- go to nullable side
!= select * from (t1 left join t2 on true) semi join t3 on (t3.a = t2.a); -- go to above the JoinExpr
I CAN follow the above. And for this case it is controlled by below code:
pull_up_sublinks_qual_recurse
switch (j->jointype)
{
case JOIN_INNER:
...
case JOIN_LEFT:
j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
&j->rarg,
rightrelids,
NULL, NULL);
break;
...
}
and I didn't change this. My question is could we assume
A semijoin B ON EXISTS (SELECT 1 FROM C on (Pbc))
= (A semijoin (B semijoin C on (Pbc))) on TRUE. (current master did)
= (A semijoin B ON true) semijoin C on (Pbc) (my current thinking)
Note that there is no 'left outer join' at this place. Since there are too
many places called pull_up_sublinks_qual_recurse, to make things
less confused, I prepared a patch for this one line change to show where
exactly I changed (see patch 2); I think this is the first place I lost.
The discussion of outer join
reordering in optimizer/README says that that doesn't work,
I think you are talking about the graph "Valid OUTER JOIN Optimizations".
I can follow until below.
"
SEMI joins work a little bit differently. A semijoin can be reassociated
into or out of the lefthand side of another semijoin, left join, or
antijoin, but not into or out of the righthand side. ..
"
I am unclear why
(A semijoin B on (Pab)) semijoin C on (Pbc)
!= A semijoin (B semijoin C on (Pbc)) on (Pab);
Seems both return rows from A which match both semijoin (Pab) and
(Pbc). or I misunderstand the above words in the first place?
At last, when I checked optimizer/README, it looks like we used
a 'nullable side' while it should be 'nonnullable side'? see patch 1
for details.
--