Home > mailing lists

Re: Odd statistics behaviour in 7.2 - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Odd statistics behaviour in 7.2
Date	February 13, 2002 19:43:30
Msg-id	782.1013636067@sss.pgh.pa.us Whole thread Raw
In response to	Re: Odd statistics behaviour in 7.2 ("Gordon A. Runkle" <gar@integrated-dynamics.com>)
List	pgsql-hackers

Tree view

"Gordon A. Runkle" <gar@integrated-dynamics.com> writes:
> Would it be fair to say that the correct workaround for now would
> be to use ALTER TABLE SET STATISTICS on columns of interest which have
> this near-unique characteristic?

Yeah, that's probably the best we can do until we can think of a better
estimation equation.

> Does ALTER TABLE SET STATISTICS only increase the histogram size, or
> does it also cause more rows to be sampled?

Both.  The Chaudhuri paper I referred to has some math purporting to
prove that the required sample size is directly proportional to the
histogram size, for fixed relative error in the histogram boundaries.
So I made the same parameter control both.

Actually the sample size is driven by the largest SET STATISTICS value
for any column of the table.  So you can pick which one you think a
larger histogram would be most useful for; it doesn't have to be the
same column that's got the bad-number-of-distinct-values problem.
Which columns, if any, do you do range queries on?  Those would be the
ones where a bigger histogram would be useful.
        regards, tom lane

pgsql-hackers by date:

From: "Gordon A. Runkle"
Date: 13 February 2002, 19:21:47
Subject: Re: Odd statistics behaviour in 7.2

From: "Rod Taylor"
Date: 13 February 2002, 20:12:31
Subject: Re: NAMEDATALEN Changes

Re: Odd statistics behaviour in 7.2 - Mailing list pgsql-hackers

Previous

Next