On Mon, Nov 17, 2014 at 4:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> One thing that occurs to me is that if the generic plan estimate comes
>>> out much cheaper than the custom one, maybe we should assume that the
>>> generic's cost estimate is bogus. Right offhand I can't think of a reason
>>> for a custom plan to look worse than a generic one, unless there's a
>>> statistical quirk like this one.
>
>> That's an interesting idea, but what do we do after deciding that it's
>> bogus?
>
> Keep using custom plans. It's possible that the estimate that's in error
> is the custom one, but that's not the way to bet IMO, since the custom
> plan estimate is based on better information.
>
>> The generic plan really can't be cheaper than the custom plan,
>> but it could be the same price, or as close as makes no difference.
>
> Right, and what we want to do is use the generic plan as long as it's
> close to the same cost (close enough to not justify replanning effort).
> The trick here is to not be fooled by estimation errors. Can we assume
> that generic cost < custom cost is always an estimation error?

Maybe. It seems like kind of a fragile bet to me. There's going to
be some qual selectivity below which an index scan on a particular
table outperforms a sequential scan, but the selectivity estimated for
a generic plan can be either higher or lower than the selectivity we'd
estimate for some particular value. And once one of the two plans
decides on an index scan while the other one chooses a sequential
scan, it can cascade through and change the whole plan - e.g. because
it affects whether the tuples emerge with usable pathkeys. I don't
feel very confident about assuming that applying < to the result of
all that is going to tell us anything useful.
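
To illustrate why I'm nervous (entirely made-up numbers, nothing to do
with the real cost model), here's the sort of break-even behavior I'm
talking about -- whichever side of the crossover the estimate lands on
determines the scan type, and everything downstream of that:

/*
 * Illustration only, not PostgreSQL code: made-up cost formulas showing
 * how the scan choice flips once the estimated selectivity crosses the
 * break-even point.  A generic estimate and a per-value estimate can
 * easily land on opposite sides of that point.
 */
#include <stdio.h>

static double
seqscan_cost(double ntuples)
{
    return 1.0 * ntuples;       /* read the whole table */
}

static double
indexscan_cost(double ntuples, double selectivity)
{
    return 50.0 + 25.0 * ntuples * selectivity;    /* startup + per-match fetch */
}

int
main(void)
{
    double      ntuples = 100000;
    double      sels[] = {0.001, 0.02, 0.10};   /* possible selectivity estimates */

    for (int i = 0; i < 3; i++)
    {
        double      seq = seqscan_cost(ntuples);
        double      idx = indexscan_cost(ntuples, sels[i]);

        printf("sel = %.3f: seqscan %.0f, indexscan %.0f -> %s\n",
               sels[i], seq, idx, idx < seq ? "index scan" : "seq scan");
    }
    return 0;
}
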
I think what's really going on here is that the generic plan will be
optimal for some range of possible qual selectivities. Typically, the
qual is of the form col = val, so the plan will be optimal for all
values where the estimated frequency is between some values A and B.
What would be nice is to notice when we see a value that is outside
that range, and switch to a custom plan in that case. I realize that
the planner isn't really well set up to deliver the information we'd
need to test that for each new supplied value, but that's really what
we need to do. The current system wastes CPU cycles by replanning up
to 5 times even when there is no benefit to be gained by it, but can
also cause big performance problems when it settles into a generic
plan and then a value with different characteristics shows up later
on.
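
Just to sketch the shape of what I have in mind -- hypothetical names,
not actual plancache code; estimate_param_selectivity() in particular
doesn't exist today and is exactly the piece the planner isn't set up
to give us cheaply:

/*
 * Hypothetical sketch: remember, along with the generic plan, the
 * selectivity range [A, B] over which we believe it to be optimal, and
 * only fall back to a custom plan when the estimate for the supplied
 * value lands outside that range.  Both helper functions are assumed,
 * not existing APIs.
 */
typedef struct CachedGenericPlan
{
    void       *plan;              /* the cached generic plan */
    double      min_optimal_sel;   /* A: low edge of the optimal range */
    double      max_optimal_sel;   /* B: high edge of the optimal range */
} CachedGenericPlan;

extern double estimate_param_selectivity(const char *paramval); /* assumed */
extern void *build_custom_plan(const char *paramval);           /* assumed */

static void *
choose_plan(CachedGenericPlan *gp, const char *paramval)
{
    double      sel = estimate_param_selectivity(paramval);

    if (sel >= gp->min_optimal_sel && sel <= gp->max_optimal_sel)
        return gp->plan;        /* value looks typical: keep the generic plan */

    return build_custom_plan(paramval);    /* atypical value: plan it specially */
}

Getting A and B out of the planner is of course the hard part; the
sketch just shows where the decision would plug in.
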
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company