On Wed, 2005-02-02 at 11:02 +0100, Roman Neuhauser wrote:
> the SET STATISTICS description under ALTER TABLE doesn't explain how
> to choose the right target (is there a formula?)
There's no simple formula; in particular, it's not merely a matter of
raising the target for the columns of a large table. The more irregular
the distribution of the data in a particular column, the higher the
target should be (as the docs state, and as I've quoted below). I would
guess most people figure out how to set per-column stats by trial and
error, by looking at how accurate the planner's estimates are for
queries involving the column in question.
> it says "For more information on the use of statistics by the PostgreSQL
> query planner, refer to Section 13.2.", but that section *doesn't* talk
> about "the use of statistics by the query planner"; it doesn't hint
> a method to choose a good target either
I disagree:
The amount of information stored in pg_statistic, in particular
the maximum number of entries in the most_common_vals and
histogram_bounds arrays for each column, can be set on a
column-by-column basis using the ALTER TABLE SET STATISTICS
command, or globally by setting the default_statistics_target
configuration variable. The default limit is presently 10
entries. Raising the limit may allow more accurate planner
estimates to be made, particularly for columns with irregular
data distributions, at the price of consuming more space in
pg_statistic and slightly more time to compute the estimates.
Conversely, a lower limit may be appropriate for columns with
simple data distributions.
Suggestions for specific improvements are welcome (a patch against the
SGML is easiest).
-Neil