markw <markw@mohawksoft.com> writes:
> Just a question, however: what is the feeling about the way
> statistics are currently being calculated?
They suck, no question about it ;-)
> My feeling is that some sort of windowing algorithm should be used
> to normalize the statistics to the majority of the entries in a
> table. It could be as simple as discarding the upper and lower 10%
> of the record stats and using the remaining 80% for statistics.
I think what most of the discussion has focused on is building
histograms. The current upper-and-lower-bounds-only approach just
plain isn't enough data, even if you discard outliers so that the
data isn't actively misleading.
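By a histogram I mean something roughly like this: keep nbuckets+1
boundary values taken from a sorted sample, so that each bucket covers
about the same number of sampled rows, and interpolate within a bucket
to estimate range selectivities.  Again, just an illustrative sketch
with invented names:

/*
 * Rough sketch of an equi-depth histogram: keep nbuckets+1 boundary
 * values chosen from a sorted sample so that each bucket covers about
 * the same number of sampled rows.
 */
static void
build_histogram(const double *sorted_vals, int nvals,
                double *bounds, int nbuckets)
{
    int     i;

    for (i = 0; i <= nbuckets; i++)
    {
        /* evenly spaced positions within the sorted sample */
        int     pos = (int) ((double) i * (nvals - 1) / nbuckets);

        bounds[i] = sorted_vals[pos];
    }
}

/*
 * Estimate the selectivity of "col < x": find the bucket that x falls
 * into and interpolate linearly within it.
 */
static double
histogram_selectivity_lt(const double *bounds, int nbuckets, double x)
{
    int     i;

    if (x <= bounds[0])
        return 0.0;
    if (x >= bounds[nbuckets])
        return 1.0;

    for (i = 1; i <= nbuckets; i++)
    {
        if (x < bounds[i])
        {
            double  frac = (x - bounds[i - 1]) /
                           (bounds[i] - bounds[i - 1]);

            return ((i - 1) + frac) / nbuckets;
        }
    }
    return 1.0;                 /* not reached */
}

Even a modest number of buckets would give the planner far more to go
on than the two endpoints it keeps now.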
As far as the most-common-value issue goes, if you have one value that
is vastly more common than any other, I think it would be a real mistake
to throw away that information --- that would mean the planner would do
the wrong thing for queries that do involve that value. What we need
is to save info about several top-frequency values, maybe three or so,
not just one. Also, the way we find those values needs to be much
more robust than it is currently.
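For instance, once you have a sorted sample in hand, pulling out the
top few values could look something like this (again purely
illustrative, invented names, not actual backend code):

/*
 * Illustrative sketch: given a sorted sample, scan runs of equal
 * values and remember the few with the largest counts.
 */
typedef struct
{
    double      value;
    int         count;
} TopValue;

static void
find_top_values(const double *sorted_vals, int nvals,
                TopValue *top, int ntop)
{
    int     i;
    int     start;

    for (i = 0; i < ntop; i++)
        top[i].count = 0;

    for (start = 0; start < nvals;)
    {
        int     run = start;
        int     count;
        int     j;

        while (run < nvals && sorted_vals[run] == sorted_vals[start])
            run++;
        count = run - start;

        /* insert into the (descending) top list if frequent enough */
        for (j = 0; j < ntop; j++)
        {
            if (count > top[j].count)
            {
                int     k;

                for (k = ntop - 1; k > j; k--)
                    top[k] = top[k - 1];
                top[j].value = sorted_vals[start];
                top[j].count = count;
                break;
            }
        }
        start = run;
    }
}

The saved counts could then drive equality-clause selectivity for
those specific values, with the remaining rows assumed to be spread
across the other values.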
See past discussions in pghackers --- there have been plenty...
regards, tom lane