Gregory Stark <stark@enterprisedb.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> BTW, does anyone have an opinion about changing the upper limit for
>> default_stats_target to, say, 10000? These tests suggest that you
>> wouldn't want such a value for a column used as a join key, but
>> I can see a possible argument for high values in text search and
>> similar applications.
> I don't like the existing arbitrary limit which it sounds like people are
> really bumping into. But that curve looks like it might be getting awfully
> steep. I wonder just how long 10,000 would take?

Presumably, right about 100X longer than 1000 takes ... if we don't do
anything about limiting the number of values eqjoinsel looks at.
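
For reference, a rough sketch of why the cost goes up with the square of
the stats target.  This is simplified, made-up code, not the actual
eqjoinsel logic --- it just shows that matching one column's MCV list
against the other's is an N-by-N affair when both lists hold N entries:

/*
 * Simplified sketch (not the real eqjoinsel): estimate the fraction of
 * join rows accounted for by MCV-to-MCV matches.  With both columns at
 * statistics target N, each list can hold up to N entries, so the inner
 * equality test runs on the order of N*N times.
 */
double
mcv_join_fraction(const int *val1, const double *freq1, int n1,
				  const int *val2, const double *freq2, int n2)
{
	double		matchfreq = 0.0;
	int			i,
				j;

	for (i = 0; i < n1; i++)
	{
		for (j = 0; j < n2; j++)
		{
			if (val1[i] == val2[j])
			{
				matchfreq += freq1[i] * freq2[j];
				break;
			}
		}
	}
	return matchfreq;
}
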
I think though that the case for doing so is pretty good. "MCVs" that
are beyond the K'th entry can't possibly have frequencies greater than
1/K, and in most cases it'll be a lot less. So the incremental
contribution to the accuracy of the join selectivity estimate drops off
pretty quickly, I should think. And it's not like we're ignoring the
existence of those values entirely --- we'd just be treating them as if
they are part of the undifferentiated collection of non-MCV values.
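
To spell the bound out: the MCV frequencies f_i are stored in decreasing
order and sum to at most 1, so

    1 \;\ge\; \sum_{i=1}^{K} f_i \;\ge\; K \, f_K
    \quad\Longrightarrow\quad
    f_K \;\le\; \frac{1}{K}

and every entry past the K'th is no larger than f_K.
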
It might be best to stop when the frequency drops below some threshold,
rather than taking a fixed number of entries.
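
Roughly what I have in mind (a hypothetical helper, names made up, just
to illustrate the cutoff idea):

/*
 * Hypothetical helper: given MCV frequencies sorted in decreasing order,
 * return how many leading entries are worth feeding to the join
 * selectivity computation --- stop once the frequency falls below
 * min_freq instead of always using the full list.
 */
static int
clamp_mcvs_for_join(const double *freqs, int nvalues, double min_freq)
{
	int			n = 0;

	while (n < nvalues && freqs[n] >= min_freq)
		n++;
	return n;
}
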
regards, tom lane