Hi!
On Wed, 23 Aug 2000, Tom Lane wrote:
> Yes, we know about that one. We have stats about the most common value
> in a column, but no information about how the less-common values are
> distributed. We definitely need stats about several top values not just
> one, because this phenomenon of a badly skewed distribution is pretty
> common.
An end-biased histogram has stats on top values and also on the least
frequent values. So if a there is a selection on a value that is well
bellow average, the selectivity estimation will be more acurate. On some
research papers I've read, it's refered that this is a better approach
than equi-width histograms (which are said to be the "industry" standard).
I not sure whether to use a table or a array attribute on pg_stat for
the histogram, the problem is what could be expected from the size of the
attribute (being a text). I'm very affraid of the cost of going through
several tuples on a table (pg_histogram?) during the optimization phase.
One other idea would be to only have better statistics for special
attributes requested by the user... something like "analyze special
table(column)".
Best Regards,
Tiago