Re: nested loop semijoin estimates - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: nested loop semijoin estimates
Date
Msg-id 556CD303.5080507@agliodbs.com
Whole thread Raw
In response to nested loop semijoin estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: nested loop semijoin estimates
List pgsql-hackers
On 06/01/2015 02:18 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 06/01/15 00:08, Tom Lane wrote:
>>> Attached is an incremental patch (on top of the previous one) to
>>> allow startup cost of parameterized paths to be considered when the
>>> relation is the RHS of a semi or anti join. It seems reasonably clean
>>> except for one thing: logically, we perhaps should remove the checks
>>> on path->param_info from the last half of
>>> compare_path_costs_fuzzily(), so as to increase the symmetry between
>>> parameterized paths and unparameterized ones. However, when I did
>>> that, I got changes in some of the regression test plans, and they
>>> didn't seem to be for the better. So I left that alone. As-is, this
>>> patch doesn't seem to affect the results for any existing regression
>>> tests.
> 
>> Regarding the remaining checks in compare_path_costs_fuzzily(), isn't 
>> that simply caused by very small data sets?
> 
> No, I don't think the size of the data set is the issue.  The point is
> that previously, if we had two parameterized paths of fuzzily the same
> total cost, we'd choose which to keep based on factors other than startup
> cost.  If we change that part of compare_path_costs_fuzzily(), we'll be
> changing the behavior in cases that are unrelated to the problem we are
> trying to solve, because same-total-cost isn't going to apply to the
> bitmap indexscan vs. plain indexscan cases we're worried about here.
> (The places where it tends to happen are things like hash left join
> vs hash right join with the inputs swapped.)  I don't want to change such
> cases in a patch that's meant to be back-patched; people don't like plan
> instability in stable branches.
> 
> It's also not quite clear what the new rule ought to be.  Rather than
> treating parameterized paths exactly like regular ones, maybe we should
> consider their startup cost only if consider_param_startup is true.
> It remains the case that in most scenarios for parameterized paths,
> startup cost really isn't that interesting and shouldn't drive choices.
> 
> Another practical problem is that the regression tests that are being
> affected are specifically meant to test particular scenarios in
> construction of nested-loop join plans.  If the planner stops selecting
> a nestloop plan there, the test is effectively broken.  We could probably
> rejigger the test queries to restore the intended test scenario, but that
> wasn't an exercise I was in a hurry to undertake.
> 
> Anyway, bottom line is that if we want to change that, it ought to be
> a separate HEAD-only patch.
> 
>>>> Do you plan to push that into 9.5, or 9.6? I assume it's a
>>>> behavior change so that no back-patching, right?
> 
>>> Mumble. It's definitely a planner bug fix, and I believe that the
>>> effects are narrowly constrained to cases where significantly better
>>> plans are possible. So personally I'd be willing to back-patch it.
>>> But others might see that differently, especially since it's been
>>> like this for a long time and we only have one field complaint.
> 
>> +1 to back-patching from me. It's true we only have one field complaint, 
>> but I believe there are more users impacted by this. They just didn't 
>> notice - we only really got this complaint because of the plan 
>> discrepancy between queries that are almost exactly the same.
> 
> Perhaps.  It seems like in many cases the planner would accidentally
> choose the "right" plan anyway, because even though it was overestimating
> the cost, the other alternatives would usually look even worse.  So it's
> hard to tell how many users will be benefited.  But I'm fairly sure the
> patch as posted is narrow enough that very few people would be negatively
> affected.
> 
> Anybody else want to speak for or against back-patching the patch as
> posted?  I intentionally didn't push it in before today's releases,
> but I will push it later this week if there are not objections.

I would like Mark Wong to test this on DBT3, if that's possible for him.I'm very worried about an unanticipated
regression.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Next
From: David Steele
Date:
Subject: Re: pg_xlog -> pg_xjournal?