Re: benchmarking the query planner - Mailing list pgsql-hackers

From Tom Lane
Subject Re: benchmarking the query planner
Date
Msg-id 5616.1229093642@sss.pgh.pa.us
In response to Re: benchmarking the query planner  ("Robert Haas" <robertmhaas@gmail.com>)
List pgsql-hackers
"Robert Haas" <robertmhaas@gmail.com> writes:
> On Thu, Dec 11, 2008 at 10:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Maybe so.  If we stick to the other design (end both lists at a preset
>> frequency threshold) then the math clearly goes through the same as
>> before, just with num_mcvs that are determined differently.  But can
>> we prove anything about the maximum error added from that?

> I don't think so, because in that design, it's entirely possible that
> you'll throw away the entire MCV list if all of the entries are below
> the threshold (as in the example we were just benchmarking, supposing
> a threshold of 0.001).

Right, but the question is how much that really hurts.  It's not like
we are going to pick a completely clueless number for the ignored MCVs;
rather, we are going to assume that they have the same stats as the
remainder of the population.  If the threshold frequency isn't very
large then the error involved should be bounded.  As an example, in the
perfectly flat distribution set up by the speed tests we were just
doing, there actually wouldn't be any error at all (assuming we got
ndistinct right, which of course is a pretty big assumption).  I haven't
consumed enough caffeine yet to try to do the math, but I think that if
you set the threshold as something a bit more than the assumed frequency
of a non-MCV value then it could work.
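
To make the argument above concrete, here is a rough sketch, not the planner's actual code. It assumes the usual estimate for a value outside the MCV list, (1 - sum of kept MCV frequencies) / (ndistinct - number of kept MCVs), and it assumes ndistinct and the listed frequencies are exact. The function names and the sample distributions are made up for illustration.

```python
def non_mcv_selectivity(kept_freqs, ndistinct, null_frac=0.0):
    """Assumed frequency of any one value that is not in the kept MCV list."""
    rest_mass = max(1.0 - null_frac - sum(kept_freqs), 0.0)
    rest_values = ndistinct - len(kept_freqs)
    return rest_mass / rest_values if rest_values > 0 else 0.0


def worst_error_for_dropped(true_freqs, threshold):
    """Largest absolute estimation error among values below the threshold.

    true_freqs holds the true frequency of every distinct value (hypothetical
    input; a real ANALYZE only sees a sample). Values at or above the
    threshold stay in the MCV list; the rest are lumped into the remainder.
    """
    kept = [f for f in true_freqs if f >= threshold]
    dropped = [f for f in true_freqs if f < threshold]
    est = non_mcv_selectivity(kept, len(true_freqs))
    return max((abs(f - est) for f in dropped), default=0.0)


# Perfectly flat: 1000 values at 0.001 each, threshold a bit above that.
flat = [1.0 / 1000] * 1000
flat_err = worst_error_for_dropped(flat, 0.002)

# Mixed: one hot value, 100 values just under the threshold, and a long tail.
mixed = [0.4] + [0.0009] * 100 + [0.51 / 10000] * 10000
mixed_err = worst_error_for_dropped(mixed, 0.001)

print(flat_err, mixed_err)
```

In the flat case the whole list is thrown away and the error is zero, as described above. In the mixed case the dropped values are misestimated, but both the true and the assumed frequency lie below the threshold, so under these assumptions the absolute error stays below it too.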

> An alternative is to pick a threshold T for the maximum number of
> equality probes that you're willing to suffer through.

I'd like to get there from the other direction, i.e., figure out what
T has to be to get a known maximum error.
        regards, tom lane

