Thread: Improving planning of outer joins
I've spent some time looking into how we can improve our planning of outer joins. The current planner code slavishly follows the syntactic join order, which can lead to quite bad plans. The reason it does this is that in some cases altering the join order of outer joins can change the results. However, there are many cases where the results would not be changed, and we really need to start taking advantage of those cases. After some review of the literature, I've found out that these identities always hold:(A leftjoin B on (Pab)) innerjoin C on (Pac)= (A innerjoin C on (Pac)) leftjoin B on (Pab) (where Pac is a predicate not referencing B, else of course the transformation is nonsensical)(A leftjoin B on (Pab)) leftjoin C on (Pac)= (A leftjoin C on (Pac)) leftjoin B on (Pab) Also, (A leftjoin B on (Pab)) leftjoin C on (Pbc)= A leftjoin (B leftjoin C on (Pbc)) on (Pab) if predicate Pbc must fail for all-null B rows. The case that does *not* work is moving an innerjoin into or out of the nullable side of an outer join:A leftjoin (B join C on (Pbc)) on (Pab)!= (A leftjoin B on (Pab)) join C on (Pbc) There is some stuff in the literature about how to make transformations of the last kind, but it requires additional executor smarts to do strange sorts of "generalized outer join" operations. I'm disinclined to tackle that for now, partly because it seems like a lot of work for uncertain payback (the first three cases cover most of the practical examples I've looked at), and partly because I'm not too sure about the patent situation for these things. The three identities shown above have been known since the late '80s and should be safe enough to work with. What I'm thinking of doing is changing the planner so that a left or right join node is broken apart into two separate relations (effectively FROM-list items) that participate in the normal join-order search process, subject to these limitations: * A regular join within the nullable side (the right side of a left join, etc) remains indivisible and must be planned separately. In other cases we recursively break down sub-joins into separate relations. * The nullable side is marked to show that its first join must be to a set of relations including at least those relationson the outer side that are referenced in the outer-join condition. (This compares to the current state of affairs,which is that its first join is always to exactly the relation(s) on the outer side.) If the outer-join conditionisn't strict for these outer-side relations, however, we have to punt and mark the nullable side as requiring allof the outer-side relations for its first join (this covers the case where the third identity above doesn't hold). Full joins are not covered by this and will continue to be planned as now. I'm not sure whether we'd need any additional planner knobs to control this. I think that the existing join_collapse_limit GUC variable should continue to exist, but its effect on left/right joins will be the same as for inner joins. If anyone wants to force join order for outer joins more than for inner joins, we'd need some other control setting, but I don't currently see why that would be very useful. Does this seem like a reasonable agenda, or am I thinking too small? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > There is some stuff in the literature about how to make transformations > of the last kind, but it requires additional executor smarts to do strange > sorts of "generalized outer join" operations. Would these "generalized outer join" operations be general enough to handle IN semantics? Or other subqueries? -- greg
Greg Stark <gsstark@mit.edu> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> There is some stuff in the literature about how to make transformations >> of the last kind, but it requires additional executor smarts to do strange >> sorts of "generalized outer join" operations. > Would these "generalized outer join" operations be general enough to handle IN > semantics? Or other subqueries? No, AFAICT it's just a weird way of defining a join operation. I did find some papers that talked about ways to push joins up and down past aggregations and GROUP BY, but that's a problem for another day. regards, tom lane
On Thu, Dec 15, 2005 at 09:25:42AM -0500, Tom Lane wrote: > Greg Stark <gsstark@mit.edu> writes: > > Tom Lane <tgl@sss.pgh.pa.us> writes: > >> There is some stuff in the literature about how to make transformations > >> of the last kind, but it requires additional executor smarts to do strange > >> sorts of "generalized outer join" operations. > > > Would these "generalized outer join" operations be general enough to handle IN > > semantics? Or other subqueries? > > No, AFAICT it's just a weird way of defining a join operation. > > I did find some papers that talked about ways to push joins up and down > past aggregations and GROUP BY, but that's a problem for another day. Might be worth adding a TODO for that and including links to the papers. There's enough people that seem to drop in with PhD thesis and what-not pulled from the TODO that someone could end up doing this work for us. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
> I'm not sure whether we'd need any additional planner knobs to control > this. I think that the existing join_collapse_limit GUC variable should > continue to exist, but its effect on left/right joins will be the same as > for inner joins. If anyone wants to force join order for outer joins more > than for inner joins, we'd need some other control setting, but I don't > currently see why that would be very useful. > > Does this seem like a reasonable agenda, or am I thinking too small? I think it's fantastic - it's a big complaint about planning we see often in the IRC channel. Chris
> I'm not sure whether we'd need any additional planner knobs to control > this. I think that the existing join_collapse_limit GUC variable should > continue to exist, but its effect on left/right joins will be the same as > for inner joins. If anyone wants to force join order for outer joins more > than for inner joins, we'd need some other control setting, but I don't > currently see why that would be very useful. > > Does this seem like a reasonable agenda, or am I thinking too small? Oh, and if you wanted to go bigger - the next planning issue people seem to have is with our union planning... Chris
On Wed, 2005-12-14 at 21:27 -0500, Tom Lane wrote: > I've spent some time looking into how we can improve our planning of outer > joins. Sounds good. > I'm not sure whether we'd need any additional planner knobs to control > this. I think that the existing join_collapse_limit GUC variable should > continue to exist, but its effect on left/right joins will be the same as > for inner joins. If anyone wants to force join order for outer joins more > than for inner joins, we'd need some other control setting, but I don't > currently see why that would be very useful. Agreed. No additional GUCs required. > Does this seem like a reasonable agenda Yup - tis good. Best Regards, Simon Riggs
On Thu, 2005-12-15 at 09:25 -0500, Tom Lane wrote: > I did find some papers that talked about ways to push joins up and down > past aggregations and GROUP BY, but that's a problem for another day. Yes, they seem like a good thing to get round to... the papers seem to present them as fairly clear transformations. I'd be interested if you get the chance to look at that. Best Regards, Simon Riggs
Tom Lane wrote: > I've spent some time looking into how we can improve our planning of outer > joins. The current planner code slavishly follows the syntactic join > order, which can lead to quite bad plans. The reason it does this is that > in some cases altering the join order of outer joins can change the > results. However, there are many cases where the results would not be > changed, and we really need to start taking advantage of those cases. I wonder if the code is already able to transform right joins to left joins, like(A rightjoin B on (Pab)) = (B leftjoin A on (Pab)) I haven't looked at the code but I vaguely remember it is possible with some strings attached, like not being able to use not-mergejoinable conditions or something. I imagine it shows up as a leftjoin node with some flag set. How does this affect this optimization? Does this hold: (A rightjoin B on (Pab)) innerjoin C on (Pbc)= (B leftjoin A on (Pab)) innerjoin C on (Pbc)= (B innerjoin C on (Pbc)) leftjoinA on (Pab) ? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > I wonder if the code is already able to transform right joins to left > joins, like > (A rightjoin B on (Pab)) = (B leftjoin A on (Pab)) Yeah, we already know that part. It's a freebie --- I didn't even bother mentioning rightjoin in my post, since it's equivalent to leftjoin after swapping the inputs. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > I wonder if the code is already able to transform right joins to left > > joins, like > > (A rightjoin B on (Pab)) = (B leftjoin A on (Pab)) > > Yeah, we already know that part. It's a freebie --- I didn't even > bother mentioning rightjoin in my post, since it's equivalent to > leftjoin after swapping the inputs. Why the thing about the mergejoinable conditions then? Is that even true? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Why the thing about the mergejoinable conditions then? Is that even > true? There are some things that work off mergejoinable conditions, but the equivalence of right and left join isn't one of them ... regards, tom lane
> I'm not sure whether we'd need any additional planner knobs to control > this. I think that the existing join_collapse_limit GUC variable should > continue to exist, but its effect on left/right joins will be the same as > for inner joins. If anyone wants to force join order for outer joins more > than for inner joins, we'd need some other control setting, but I don't > currently see why that would be very useful. > > Does this seem like a reasonable agenda, or am I thinking too small? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend