Re: benchmarking the query planner - Mailing list pgsql-hackers

From Tom Lane
Subject Re: benchmarking the query planner
Msg-id 12862.1229036217@sss.pgh.pa.us
In response to Re: benchmarking the query planner  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: benchmarking the query planner  ("Robert Haas" <robertmhaas@gmail.com>)
List pgsql-hackers
Gregory Stark <stark@enterprisedb.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> BTW, does anyone have an opinion about changing the upper limit for
>> default_stats_target to, say, 10000?  These tests suggest that you
>> wouldn't want such a value for a column used as a join key, but
>> I can see a possible argument for high values in text search and
>> similar applications.

> I don't like the existing arbitrary limit which it sounds like people are
> really bumping into. But that curve looks like it might be getting awfully
> steep. I wonder just how long 10,000 would take?

Presumably, right about 100X longer than 1000 ... if we don't do
anything about limiting the number of values eqjoinsel looks at.
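
The ~100X figure follows from the shape of the matching loop: eqjoinsel compares every MCV entry on one side of the join against every entry on the other, so the work is quadratic in the list lengths, and a 10X larger stats target means ~100X the comparisons. A toy sketch of that inner loop (plain Python for illustration, not the actual C code):

```python
def eqjoinsel_mcv_overlap(mcv1, mcv2):
    """Toy model of eqjoinsel's MCV matching: for each pair of MCV
    entries whose values are equal, add the product of their
    frequencies to the matched-selectivity total.  The nested loop
    makes this O(len(mcv1) * len(mcv2)) equality comparisons."""
    matched = 0.0
    for v1, f1 in mcv1:
        for v2, f2 in mcv2:
            if v1 == v2:
                matched += f1 * f2
    return matched

# Two small MCV lists as (value, frequency) pairs; only "b" matches.
mcv1 = [("a", 0.3), ("b", 0.2)]
mcv2 = [("b", 0.4), ("c", 0.1)]
print(eqjoinsel_mcv_overlap(mcv1, mcv2))  # 0.2 * 0.4 = 0.08
```
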

I think though that the case for doing so is pretty good.  "MCVs" that
are beyond the K'th entry can't possibly have frequencies greater than
1/K, and in most cases it'll be a lot less.  So the incremental
contribution to the accuracy of the join selectivity estimate drops off
pretty quickly, I should think.  And it's not like we're ignoring the
existence of those values entirely --- we'd just be treating them as if
they are part of the undifferentiated collection of non-MCV values.
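
The 1/K bound is simple arithmetic: the MCV list is sorted by descending frequency and the frequencies sum to at most 1, so if the K'th entry exceeded 1/K, the first K entries alone would sum to more than 1. A quick sanity check of that reasoning:

```python
def tail_bound_holds(freqs):
    """Check that in a descending frequency list summing to <= 1,
    the k'th entry (1-based) never exceeds 1/k."""
    freqs = sorted(freqs, reverse=True)
    assert sum(freqs) <= 1.0 + 1e-9, "frequencies must sum to at most 1"
    return all(f <= 1.0 / k + 1e-9 for k, f in enumerate(freqs, start=1))

# E.g. the 3rd entry here is 0.1 <= 1/3, the 4th is 0.05 <= 1/4.
print(tail_bound_holds([0.5, 0.25, 0.1, 0.05]))  # True
```
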

It might be best to stop when the frequency drops below some threshold,
rather than taking a fixed number of entries.
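
A threshold-based cutoff might look roughly like this (a hypothetical sketch, not a patch; `min_freq` is an invented parameter, and the entries dropped here would fall back into the undifferentiated non-MCV population):

```python
def truncate_mcvs(mcvs, min_freq):
    """Keep only the (value, frequency) MCV entries whose frequency
    meets the threshold; the remainder are treated as part of the
    undifferentiated collection of non-MCV values."""
    return [(v, f) for v, f in mcvs if f >= min_freq]

mcvs = [("x", 0.3), ("y", 0.02), ("z", 0.001)]
print(truncate_mcvs(mcvs, 0.01))  # [('x', 0.3), ('y', 0.02)]
```

Since the list is stored in descending frequency order, the same effect could be had by scanning until the first entry below the threshold and discarding the rest.
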
        regards, tom lane

