Re: Odd statistics behaviour in 7.2 - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Odd statistics behaviour in 7.2
Date	February 16, 2002 13:03:08
Msg-id	20368.1013882239@sss.pgh.pa.us
In response to	Re: Odd statistics behaviour in 7.2 ("Gordon A. Runkle" <gar@integrated-dynamics.com>)
List	pgsql-hackers

BTW, while we're thinking about this, there's another aspect of the
number-of-distinct-values estimator that could use some peer review.
That's the decision whether to assume that the number of distinct
values in a column is fixed, or will vary with the size of the
table.  (For example, in a boolean column, ndistinct should clearly
be 2 no matter how large the table gets; but in any unique column
ndistinct should equal the table size.)  This is important since there
are times when we update the table size estimate (pg_class.reltuples)
without recomputing the statistics in pg_statistic.  The "negative
stadistinct" convention in pg_statistic is used to signal which case
ANALYZE thinks applies.
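To make the convention concrete, here is a minimal Python sketch (not PostgreSQL's actual C code) of how a stored stadistinct could be turned back into a distinct-value estimate when only reltuples has been updated. The function name `estimate_ndistinct` is hypothetical. It assumes a positive value is a fixed count, a negative value is minus the fraction of rows that are distinct, and zero means "unknown".

```python
from typing import Optional


def estimate_ndistinct(stadistinct: float, reltuples: float) -> Optional[float]:
    """Hypothetical helper: decode a stored stadistinct value.

    Assumed convention:
      > 0  : fixed number of distinct values (e.g. a boolean column -> 2)
      < 0  : -(fraction of rows that are distinct), so the estimate
             scales with the current row-count estimate (reltuples)
      == 0 : number of distinct values is unknown
    """
    if stadistinct == 0:
        # Unknown: leave it to the caller to pick a default.
        return None
    if stadistinct < 0:
        # Scale the stored fraction by the *current* table size, which
        # may have been updated without re-running ANALYZE.
        return -stadistinct * reltuples
    # Positive: a fixed count, independent of table size.
    return stadistinct
```

A boolean column stored as 2 stays at 2 however large reltuples becomes, while a unique column stored as -1.0 tracks reltuples exactly.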

Presently the decision is pretty simplistic: if the estimated number
of distinct values is more than 10% of the number of rows, then assume
the number of distinct values scales with the number of rows.
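The 10% rule can be sketched the same way. This hypothetical `encode_stadistinct` helper (Python, for illustration only, not the ANALYZE source) takes the estimated number of distinct values and the number of rows and returns the value that would be stored under the negative-stadistinct convention. The `threshold` parameter is the 10% figure under discussion, exposed as an argument so alternative values are easy to try.

```python
def encode_stadistinct(ndistinct: float, totalrows: float,
                       threshold: float = 0.1) -> float:
    """Hypothetical helper: choose between a fixed count and a scaling fraction.

    If the distinct-value estimate exceeds `threshold` * totalrows, assume
    ndistinct grows with the table and store it as a negative fraction of
    the row count; otherwise store it as a fixed positive count.
    """
    if ndistinct > threshold * totalrows:
        # Scales with table size: store -(distinct values / rows).
        return -(ndistinct / totalrows)
    # Treated as a fixed number of distinct values.
    return ndistinct
```

For example, a boolean column with 2 distinct values in a million rows is stored as 2, while a column whose 600 distinct values are spread over 1000 rows is stored as -0.6 and would therefore grow if reltuples later grows.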

I believe that some rule of this form is reasonable, but the 10%
threshold was just picked out of the air.  Can anyone suggest an
argument in favor of some other value, or a better way to look at it?
        regards, tom lane