Re: The science of optimization in practical terms? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: The science of optimization in practical terms?
Date
Msg-id 603c8f070902180908j3ae46774g535d96ece2c90e74@mail.gmail.com
In response to Re: The science of optimization in practical terms?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Feb 18, 2009 at 11:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Yeah, I thought about this too, but it seems like overkill for the
>> problem at hand, and as you say it's not clear you'd get any benefit
>> out of the upper bound anyway.  I was thinking of something simpler:
>> instead of directly multiplying 0.005 into the selectivity every time
>> you find something incomprehensible, keep a count of the number of
>> incomprehensible things you saw and at the end multiply by 0.005/N.
>> That way more unknown quals look more restrictive than fewer, but
>> things only get linearly wacky instead of exponentially wacky.
>
> clauselist_selectivity could perhaps apply such a heuristic, although
> I'm not sure how it could recognize "default" estimates from the various
> specific estimators, since they're mostly all different.

Presumably the estimators would need to be modified to provide some
information on their level of confidence in their estimate (possibly
this could be more general than whether the number is a default or
not, though I'm not sure what we'd do with that information).  But it
may not be necessary to go through that pain if we implement your idea
below.
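
For what it's worth, here's a toy sketch of the 0.005/N arithmetic
plus the sort of flag the estimators would need to expose (standalone
C, not a patch against clauselist_selectivity; the struct, function,
and constant names are placeholders I made up):

#include <stdbool.h>
#include <stdio.h>

#define DEFAULT_SEL 0.005       /* stand-in for the hard-wired default */

typedef struct
{
    double  sel;            /* estimated selectivity */
    bool    is_default;     /* true if the estimator punted */
} ClauseEst;

static double
combine_selectivity(const ClauseEst *clauses, int nclauses)
{
    double  s = 1.0;
    int     n_unknown = 0;

    for (int i = 0; i < nclauses; i++)
    {
        if (clauses[i].is_default)
            n_unknown++;            /* defer the default factor */
        else
            s *= clauses[i].sel;    /* trust the specific estimate */
    }

    /* one linear penalty instead of n multiplicative ones */
    if (n_unknown > 0)
        s *= DEFAULT_SEL / n_unknown;

    return s;
}

int
main(void)
{
    ClauseEst   quals[] = {{0.1, false}, {0.0, true}, {0.0, true}, {0.0, true}};

    /* 0.1 * (0.005 / 3) ~= 0.000167, vs. 0.1 * 0.005^3 = 1.25e-8 today */
    printf("%g\n", combine_selectivity(quals, 4));
    return 0;
}

With three unknown quals the estimate degrades by 0.005/3 rather than
0.005^3 -- linearly wacky instead of exponentially wacky.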

> Personally I've not seen all that many practical cases where the
> estimator simply hasn't got a clue at all.  What's far more commonly
> complained of IME is failure to handle *correlated* conditions in
> an accurate fashion.  Maybe we should just discount the product
> selectivity all the time, not only when we think the components are
> default estimates.

That has something going for it, although off the top of my head I'm
not sure exactly what formula would make sense.  Presumably we want
the overall selectivity estimate to be less than the estimate for any
individual clause taken by itself, but how much less?  It doesn't
seem right to estimate the combined selectivity of clauses with
selectivities S_1 ... S_n as MIN(S_1 ... S_n) / n, because that will
give you weird results with things like a = 1 AND a != 2.  You might
need to divide the estimates into two buckets: those that reduce
selectivity by a lot and those that reduce it only slightly, then
multiply the selectivities in the latter bucket together and, say,
divide through by the number of clauses in the former bucket.  But
the exact
details of the math are not obvious to me.
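
To put numbers on the a = 1 AND a != 2 case (selectivities invented
for illustration -- say 0.01 for the equality and 0.99 for the
inequality):

#include <stdio.h>

/* invented selectivities for "a = 1 AND a != 2" */
#define SEL_EQ  0.01            /* a = 1 */
#define SEL_NEQ 0.99            /* a != 2 */

int
main(void)
{
    /* assuming independence and multiplying, as we do today */
    double  product = SEL_EQ * SEL_NEQ;     /* 0.0099 */

    /*
     * MIN(S_1..S_n) / n halves an estimate that the second clause
     * barely restricts at all; since a = 1 already implies a != 2,
     * the right answer is close to 0.01.
     */
    double  min_over_n = (SEL_EQ < SEL_NEQ ? SEL_EQ : SEL_NEQ) / 2.0;

    printf("product = %g, min/n = %g\n", product, min_over_n);
    return 0;
}

The plain product is already about right there, so dividing by n just
makes things worse -- that's the weirdness I'm worried about.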

I'm talking off the top of my head here; maybe you have a clearer
idea of how this would work?

...Robert

