Re: benchmarking the query planner - Mailing list pgsql-hackers

From Tom Lane
Subject Re: benchmarking the query planner
Date
Msg-id 15327.1229047970@sss.pgh.pa.us
Whole thread Raw
In response to Re: benchmarking the query planner  ("Robert Haas" <robertmhaas@gmail.com>)
Responses Re: benchmarking the query planner
List pgsql-hackers
"Robert Haas" <robertmhaas@gmail.com> writes:
>> It might be best to stop when the frequency drops below some threshold,
>> rather than taking a fixed number of entries.

> OK, I'll bite.  How do we decide where to put the cutoff?  If we make
> it too high, it will have a negative effect on join selectivity
> estimates; if it's too low, it won't really address the problem we're
> trying to fix.  I randomly propose p = 0.001, which should limit
> eqjoinsel() to about a million equality tests in the worst case.  In
> the synthetic example we were just benchmarking, that causes the
> entire MCV array to be tossed out the window (which feels about
> right).

Yeah.  One idle thought I had was that maybe the cutoff needs to
consider both probabilities: if the high-frequency MCVs on one side
chance to match to not-so-high-frequency MCVs on the other, you
would like to know about that.  As long as we keep the lists in
frequency order, it seems easy to implement this: for each MCV
examined by the outer loop, you run the inner loop until the product of
the outer and current inner frequency drops below whatever your
threshold is.  This doesn't immediately suggest what the threshold
ought to be, but it seems like it ought to be possible to determine
that given a desired maximum error in the overall estimate.  I'm also
not very clear on what the "total frequency" computations (matchfreq2
and unmatchfreq2 in the current code) ought to look like if we are using
a variable subset of the inner list.
        regards, tom lane


pgsql-hackers by date:

Previous
From: KaiGai Kohei
Date:
Subject: Re: Updates of SE-PostgreSQL 8.4devel patches (r1268)
Next
From: "Greg Stark"
Date:
Subject: Re: benchmarking the query planner