Re: Making Vars outer-join aware - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Making Vars outer-join aware |
Date | |
Msg-id | 3839865.1661035931@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Making Vars outer-join aware (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Making Vars outer-join aware
|
List | pgsql-hackers |
Progress report on this ... I've been trying to remove some of the cruftier aspects of EquivalenceClasses (which really is the main point of this whole effort), and gotten repeatedly blocked by the fact that the semantics are still a bit crufty thanks to the hacking associated with outer join identity 3. I think I see a path forward though. To recap, the thing about identity 3 is that it says you can get equivalent results from (A leftjoin B on (Pab)) leftjoin C on (Pb*c) A leftjoin (B leftjoin C on (Pbc)) on (Pab) if Pbc is strict for B. Unlike what it says in optimizer/README, I've written the first form as "Pb*c" to indicate that any B Vars appearing in that clause will be marked as possibly nulled by the A/B join. This makes the problem apparent: we cannot use the same representation of Pbc for both join orders, because in the second variant B's Vars are not nulled by anything. We've been trying to get away with writing Pbc just one way, and that leads directly to the semantic squishiness I've been fighting. What I'm thinking we should do about this, once we detect that this identity is applicable, is to generate *both* forms of Pbc, either adding or removing the varnullingrels bits depending on which form we got from the parser. Then, when we come to forming join paths, use the appropriate variant depending on which join order we're considering. build_join_rel() already has the concept that the join restriction list varies depending on which input relations we're trying to join, so this doesn't require any fundamental restructuring -- only a way to identify which RestrictInfos to use or ignore for a particular join. That will probably require some new field in RestrictInfo, but I'm not fussed about that because there are other fields we'll be able to remove as this work progresses. Similarly, generate_join_implied_equalities() will need to generate EquivalenceClass-derived join clauses with or without varnullingrels marks, as appropriate. I'm not quite sure how to do that, but it feels like just a small matter of programming, not a fundamental problem with the model which is where things are right now. We'll only need this for ECs that include source clauses coming from a movable outer join clause (i.e., Pbc in identity 3). An interesting point is that I think we want to force movable outer joins into the second format for the purpose of generating ECs, that is we want to use Pbc not Pb*c as the EC source form. The reason for that is to allow generation of relation-scan-level clauses from an EC, particularly an EC that includes a constant. As an example, given A leftjoin (B leftjoin C on (B.b = C.c)) on (A.a = B.b) where A.a = constant we can decide unconditionally that A.a, B.b, C.c, and the constant all belong to the same equivalence class, and thereby generate relation scan restrictions A.a = constant, B.b = constant, and C.c = constant. If we start with the other join order, which will include "B.b* = C.c" (ie Pb*c) then we'd have two separate ECs: {A.a, B.b, constant} and {B.b*, C.c}. So we'll fail to produce any scan restriction for C, or at least we can't do so in any principled way. Furthermore, if the joins are done in the second order then we don't need any additional join clauses -- both joins can act like "LEFT JOIN ON TRUE". (Right now, we'll emit redundant B.b = C.c and A.a = B.b join clauses in addition to the scan-level clauses, which is inefficient.) However, if we make use of identity 3 to do the joins in the other order, then we do need an extra join clause, like (A leftjoin B on (true)) leftjoin C on (B.b* = C.c) (or maybe we could just emit "B.b* IS NOT NULL" for Pb*c?) Without any Pb*c join constraint we get wrong answers because nulling of B fails to propagate to C. So while there are lots of details to work out, it feels like this line of thought can lead to something where setrefs.c doesn't have to ignore varnullingrels mismatches (yay) and there is no squishiness in EquivalenceClass semantics. regards, tom lane
pgsql-hackers by date: