Re: nested loop semijoin estimates - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: nested loop semijoin estimates |
Date | |
Msg-id | 4398.1433193515@sss.pgh.pa.us Whole thread Raw |
In response to | Re: nested loop semijoin estimates (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
List | pgsql-hackers |
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On 06/01/15 00:08, Tom Lane wrote: >> Attached is an incremental patch (on top of the previous one) to >> allow startup cost of parameterized paths to be considered when the >> relation is the RHS of a semi or anti join. It seems reasonably clean >> except for one thing: logically, we perhaps should remove the checks >> on path->param_info from the last half of >> compare_path_costs_fuzzily(), so as to increase the symmetry between >> parameterized paths and unparameterized ones. However, when I did >> that, I got changes in some of the regression test plans, and they >> didn't seem to be for the better. So I left that alone. As-is, this >> patch doesn't seem to affect the results for any existing regression >> tests. > Regarding the remaining checks in compare_path_costs_fuzzily(), isn't > that simply caused by very small data sets? No, I don't think the size of the data set is the issue. The point is that previously, if we had two parameterized paths of fuzzily the same total cost, we'd choose which to keep based on factors other than startup cost. If we change that part of compare_path_costs_fuzzily(), we'll be changing the behavior in cases that are unrelated to the problem we are trying to solve, because same-total-cost isn't going to apply to the bitmap indexscan vs. plain indexscan cases we're worried about here. (The places where it tends to happen are things like hash left join vs hash right join with the inputs swapped.) I don't want to change such cases in a patch that's meant to be back-patched; people don't like plan instability in stable branches. It's also not quite clear what the new rule ought to be. Rather than treating parameterized paths exactly like regular ones, maybe we should consider their startup cost only if consider_param_startup is true. It remains the case that in most scenarios for parameterized paths, startup cost really isn't that interesting and shouldn't drive choices. Another practical problem is that the regression tests that are being affected are specifically meant to test particular scenarios in construction of nested-loop join plans. If the planner stops selecting a nestloop plan there, the test is effectively broken. We could probably rejigger the test queries to restore the intended test scenario, but that wasn't an exercise I was in a hurry to undertake. Anyway, bottom line is that if we want to change that, it ought to be a separate HEAD-only patch. >>> Do you plan to push that into 9.5, or 9.6? I assume it's a >>> behavior change so that no back-patching, right? >> Mumble. It's definitely a planner bug fix, and I believe that the >> effects are narrowly constrained to cases where significantly better >> plans are possible. So personally I'd be willing to back-patch it. >> But others might see that differently, especially since it's been >> like this for a long time and we only have one field complaint. > +1 to back-patching from me. It's true we only have one field complaint, > but I believe there are more users impacted by this. They just didn't > notice - we only really got this complaint because of the plan > discrepancy between queries that are almost exactly the same. Perhaps. It seems like in many cases the planner would accidentally choose the "right" plan anyway, because even though it was overestimating the cost, the other alternatives would usually look even worse. So it's hard to tell how many users will be benefited. But I'm fairly sure the patch as posted is narrow enough that very few people would be negatively affected. Anybody else want to speak for or against back-patching the patch as posted? I intentionally didn't push it in before today's releases, but I will push it later this week if there are not objections. regards, tom lane
pgsql-hackers by date: