On Wed, Apr 3, 2013 at 9:40 PM, Greg Stark <stark@mit.edu> wrote:
> I used to advocate a similar idea. But when questioned on list I tried to
> work out the details and ran into some problems coming up with a concrete
> plan.
>
> How do you compare a plan that you think has a 99% chance of running in 1ms
> but a 1% chance of taking 1s against a plan that has a 90% chance of 1ms and
> a 10% chance of taking 100ms? Which one is actually riskier? They might even
> both have the same 95% percentile run-time.
>
> And additionally there are different types of unknowns. Do you want to treat
> plans where we have a statistical sample that gives us a probabilistic
> answer the same as plans where we think our model just has a 10% chance of
> being wrong? The model is going to either be consistently right or
> consistently wrong for a given query but the sample will vary from run to
> run. (Or vice versa depending on the situation).
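(For what it's worth, the expected run-times of those two example plans
are nearly identical - 0.99 * 1ms + 0.01 * 1000ms = 10.99ms versus
0.90 * 1ms + 0.10 * 100ms = 10.9ms - so a plain expected-cost comparison
can barely separate them; the whole question is how much weight the tail
risk ought to get.)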
One idea that someone threw up against a wall recently during an
EnterpriseDB development meeting - I think it was Kevin Grittner but
it might have been Noah Misch, or some combination - was to have a GUC
for estimate_worstcase_fraction. So, when computing the cost of a
path, we'd compute our current expected-case estimate, and also a
worst-case estimate, and then compute the final cost as:
    estimated_cost = estimate_worstcase_fraction * worst_case_estimate
                     + (1 - estimate_worstcase_fraction) * expected_case_estimate
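To make the arithmetic concrete, here's a minimal standalone C sketch of
that blending (not planner code; the function name, the 0.001 default,
and passing the worst-case estimate in as a parameter are just for
illustration - in a real patch both estimates would come out of the
costing routines):

    #include <stdio.h>

    /* Proposed GUC; 0.001 is one of the small values suggested below. */
    static double estimate_worstcase_fraction = 0.001;

    /* Blend the expected-case and worst-case cost estimates for a path. */
    static double
    blended_cost(double expected_case_estimate, double worst_case_estimate)
    {
        return estimate_worstcase_fraction * worst_case_estimate +
               (1.0 - estimate_worstcase_fraction) * expected_case_estimate;
    }

    int
    main(void)
    {
        /* e.g. a path we expect to cost 100 units but that could blow up to 1000000 */
        printf("%g\n", blended_cost(100.0, 1000000.0));  /* prints ~1099.9 */
        return 0;
    }

Even with a fraction that small, a path whose worst case is catastrophic
picks up a noticeable penalty, which is the point.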
I think Kevin and I both have the intuition that even a rather small
value for estimate_worstcase_fraction - like 0.01 or 0.001 or even
smaller - would prevent a pretty significant fraction of the problems
people encounter in this area. But users could change it, globally or
for particular queries, if the default turned out to be wrong for their
environment.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company