Re: Odd statistics behaviour in 7.2 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Odd statistics behaviour in 7.2
Date
Msg-id 782.1013636067@sss.pgh.pa.us
Whole thread Raw
In response to Re: Odd statistics behaviour in 7.2  ("Gordon A. Runkle" <gar@integrated-dynamics.com>)
List pgsql-hackers
"Gordon A. Runkle" <gar@integrated-dynamics.com> writes:
> Would it be fair to say that the correct workaround for now would
> be to use ALTER TABLE SET STATISTICS on columns of interest which have
> this near-unique characteristic?

Yeah, that's probably the best we can do until we can think of a better
estimation equation.

> Does ALTER TABLE SET STATISTICS only increase the histogram size, or
> does it also cause more rows to be sampled?

Both.  The Chaudhuri paper I referred to has some math purporting to
prove that the required sample size is directly proportional to the
histogram size, for fixed relative error in the histogram boundaries.
So I made the same parameter control both.

Actually the sample size is driven by the largest SET STATISTICS value
for any column of the table.  So you can pick which one you think a
larger histogram would be most useful for; it doesn't have to be the
same column that's got the bad-number-of-distinct-values problem.
Which columns, if any, do you do range queries on?  Those would be the
ones where a bigger histogram would be useful.
        regards, tom lane


pgsql-hackers by date:

Previous
From: "Gordon A. Runkle"
Date:
Subject: Re: Odd statistics behaviour in 7.2
Next
From: "Rod Taylor"
Date:
Subject: Re: NAMEDATALEN Changes