Re: nested loop semijoin estimates - Mailing list pgsql-hackers

From Tom Lane
Subject Re: nested loop semijoin estimates
Date
Msg-id 4398.1433193515@sss.pgh.pa.us
Whole thread Raw
In response to Re: nested loop semijoin estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 06/01/15 00:08, Tom Lane wrote:
>> Attached is an incremental patch (on top of the previous one) to
>> allow startup cost of parameterized paths to be considered when the
>> relation is the RHS of a semi or anti join. It seems reasonably clean
>> except for one thing: logically, we perhaps should remove the checks
>> on path->param_info from the last half of
>> compare_path_costs_fuzzily(), so as to increase the symmetry between
>> parameterized paths and unparameterized ones. However, when I did
>> that, I got changes in some of the regression test plans, and they
>> didn't seem to be for the better. So I left that alone. As-is, this
>> patch doesn't seem to affect the results for any existing regression
>> tests.

> Regarding the remaining checks in compare_path_costs_fuzzily(), isn't 
> that simply caused by very small data sets?

No, I don't think the size of the data set is the issue.  The point is
that previously, if we had two parameterized paths of fuzzily the same
total cost, we'd choose which to keep based on factors other than startup
cost.  If we change that part of compare_path_costs_fuzzily(), we'll be
changing the behavior in cases that are unrelated to the problem we are
trying to solve, because same-total-cost isn't going to apply to the
bitmap indexscan vs. plain indexscan cases we're worried about here.
(The places where it tends to happen are things like hash left join
vs hash right join with the inputs swapped.)  I don't want to change such
cases in a patch that's meant to be back-patched; people don't like plan
instability in stable branches.

It's also not quite clear what the new rule ought to be.  Rather than
treating parameterized paths exactly like regular ones, maybe we should
consider their startup cost only if consider_param_startup is true.
It remains the case that in most scenarios for parameterized paths,
startup cost really isn't that interesting and shouldn't drive choices.

Another practical problem is that the regression tests that are being
affected are specifically meant to test particular scenarios in
construction of nested-loop join plans.  If the planner stops selecting
a nestloop plan there, the test is effectively broken.  We could probably
rejigger the test queries to restore the intended test scenario, but that
wasn't an exercise I was in a hurry to undertake.

Anyway, bottom line is that if we want to change that, it ought to be
a separate HEAD-only patch.

>>> Do you plan to push that into 9.5, or 9.6? I assume it's a
>>> behavior change so that no back-patching, right?

>> Mumble. It's definitely a planner bug fix, and I believe that the
>> effects are narrowly constrained to cases where significantly better
>> plans are possible. So personally I'd be willing to back-patch it.
>> But others might see that differently, especially since it's been
>> like this for a long time and we only have one field complaint.

> +1 to back-patching from me. It's true we only have one field complaint, 
> but I believe there are more users impacted by this. They just didn't 
> notice - we only really got this complaint because of the plan 
> discrepancy between queries that are almost exactly the same.

Perhaps.  It seems like in many cases the planner would accidentally
choose the "right" plan anyway, because even though it was overestimating
the cost, the other alternatives would usually look even worse.  So it's
hard to tell how many users will be benefited.  But I'm fairly sure the
patch as posted is narrow enough that very few people would be negatively
affected.

Anybody else want to speak for or against back-patching the patch as
posted?  I intentionally didn't push it in before today's releases,
but I will push it later this week if there are not objections.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Next
From: Alvaro Herrera
Date:
Subject: Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1