Re: nested loop semijoin estimates - Mailing list pgsql-hackers
From | Josh Berkus |
---|---|
Subject | Re: nested loop semijoin estimates |
Date | |
Msg-id | 556CD303.5080507@agliodbs.com Whole thread Raw |
In response to | nested loop semijoin estimates (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: nested loop semijoin estimates
|
List | pgsql-hackers |
On 06/01/2015 02:18 PM, Tom Lane wrote: > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> On 06/01/15 00:08, Tom Lane wrote: >>> Attached is an incremental patch (on top of the previous one) to >>> allow startup cost of parameterized paths to be considered when the >>> relation is the RHS of a semi or anti join. It seems reasonably clean >>> except for one thing: logically, we perhaps should remove the checks >>> on path->param_info from the last half of >>> compare_path_costs_fuzzily(), so as to increase the symmetry between >>> parameterized paths and unparameterized ones. However, when I did >>> that, I got changes in some of the regression test plans, and they >>> didn't seem to be for the better. So I left that alone. As-is, this >>> patch doesn't seem to affect the results for any existing regression >>> tests. > >> Regarding the remaining checks in compare_path_costs_fuzzily(), isn't >> that simply caused by very small data sets? > > No, I don't think the size of the data set is the issue. The point is > that previously, if we had two parameterized paths of fuzzily the same > total cost, we'd choose which to keep based on factors other than startup > cost. If we change that part of compare_path_costs_fuzzily(), we'll be > changing the behavior in cases that are unrelated to the problem we are > trying to solve, because same-total-cost isn't going to apply to the > bitmap indexscan vs. plain indexscan cases we're worried about here. > (The places where it tends to happen are things like hash left join > vs hash right join with the inputs swapped.) I don't want to change such > cases in a patch that's meant to be back-patched; people don't like plan > instability in stable branches. > > It's also not quite clear what the new rule ought to be. Rather than > treating parameterized paths exactly like regular ones, maybe we should > consider their startup cost only if consider_param_startup is true. > It remains the case that in most scenarios for parameterized paths, > startup cost really isn't that interesting and shouldn't drive choices. > > Another practical problem is that the regression tests that are being > affected are specifically meant to test particular scenarios in > construction of nested-loop join plans. If the planner stops selecting > a nestloop plan there, the test is effectively broken. We could probably > rejigger the test queries to restore the intended test scenario, but that > wasn't an exercise I was in a hurry to undertake. > > Anyway, bottom line is that if we want to change that, it ought to be > a separate HEAD-only patch. > >>>> Do you plan to push that into 9.5, or 9.6? I assume it's a >>>> behavior change so that no back-patching, right? > >>> Mumble. It's definitely a planner bug fix, and I believe that the >>> effects are narrowly constrained to cases where significantly better >>> plans are possible. So personally I'd be willing to back-patch it. >>> But others might see that differently, especially since it's been >>> like this for a long time and we only have one field complaint. > >> +1 to back-patching from me. It's true we only have one field complaint, >> but I believe there are more users impacted by this. They just didn't >> notice - we only really got this complaint because of the plan >> discrepancy between queries that are almost exactly the same. > > Perhaps. It seems like in many cases the planner would accidentally > choose the "right" plan anyway, because even though it was overestimating > the cost, the other alternatives would usually look even worse. So it's > hard to tell how many users will be benefited. But I'm fairly sure the > patch as posted is narrow enough that very few people would be negatively > affected. > > Anybody else want to speak for or against back-patching the patch as > posted? I intentionally didn't push it in before today's releases, > but I will push it later this week if there are not objections. I would like Mark Wong to test this on DBT3, if that's possible for him.I'm very worried about an unanticipated regression. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
pgsql-hackers by date: