Re: [HACKERS] OUTER JOIN performance regression remains in 8.3beta4 - Mailing list pgsql-patches
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] OUTER JOIN performance regression remains in 8.3beta4 |
Date | |
Msg-id | 14115.1199826676@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [HACKERS] OUTER JOIN performance regression remains in 8.3beta4 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [HACKERS] OUTER JOIN performance regression remains in 8.3beta4
|
List | pgsql-patches |
I wrote: > Haven't looked closely at how to fix 8.2, yet. After some study it seems that the simplest, most reliable fix for 8.2 is to dike out the code that removes "redundant" outer join conditions after propagating a constant across them. This gives the right answer in the cases of concern (where we actually need the join condition) and doesn't really add much overhead in the cases where we don't need it. One small problem is that the join condition is redundant with the generated constant-equality constraints (mostly so, even if not entirely so) which will cause the planner to underestimate the size of the join, since clausesel.c is not very bright at all about redundant conditions. However, we already have a hack we can use for that: we can force the cached selectivity estimate for the join clause to 1.0, so that it's not considered to reduce the join size any more than the constant conditions already did. (This is also a problem in my earlier patch for 8.3, with the same fix possible.) That leads to the attached very simple patch. There is some dead code left behind, but it doesn't seem worth removing it. I'm rather tempted to patch 8.1 similarly, even though it doesn't fail on the known test case --- I'm far from convinced that there are no related cases that will make it fail, and in any case it's getting the selectivity wrong. 8.0 and before don't try to propagate constants like this, so they're not at risk. Comparing the behavior of this to my patch for HEAD, I am coming to the conclusion that this is actually a *better* planning method than removing the redundant join conditions, even when they're truly rendundant! The reason emerges as soon as you look at cases involving more than a single join. If we strip the join condition from just one of the joins, then we find that the planner insists on doing that join last, whether it's a good idea or not, because clauseful joins are always preferred to clauseless joins in the join search logic. What's worse, knowing that this is an outer join, is that the only available plan type for a clauseless outer join is a NestLoop with the inner side on the right, which again may be a highly nonoptimal way to do it. None of this matters a whole lot if the pushed-down constant conditions select single rows, but it does if they select multiple rows. I'm trying this in the regression database: select * from tenk1 a left join tenk1 b on (a.hundred = b.hundred) left join tenk1 c on (b.hundred = c.hundred) where a.hundred = 42; and finding patched 8.2 about 2X faster than 8.3 because it selects a better plan that avoids multiple rescans of subplans. So I'm coming around to the idea that getting rid of the "redundant" join conditions is foolish micro-optimization, and we should leave them in place even when we know they're redundant. The extra execution cycles paid to test the condition don't amount to much in any case, and the risk of getting a bad plan is too high. This is a reasonably simple adjustment to my prior patch for 8.3, which I will go ahead and make if there are no objections... regards, tom lane
Attachment
pgsql-patches by date: