Thread: planner stats

planner stats

From
Roman Neuhauser
Date:
Hello,

trying to figure out the right statistics target for a large table,
I found the reference manual lacking:

the SET STATISTICS description under ALTER TABLE doesn't explain how
to choose the right target (is there a formula?), I'm not insisting it
should, but:

it says "For more information on the use of statistics by the PostgreSQL
query planner, refer to Section 13.2.", but that section *doesn't* talk
about "the use of statistics by the query planner"; it doesn't hint
a method to choose a good target either (the page can be summed up to:
"it's there, it influences the planner, you can change it through this
mechanism, access it that way").

runtime-config.html#GUC-DEFAULT-STATISTICS-TARGET also misdirects the
reader to 13.2.

--
FreeBSD 4.10-STABLE
10:48AM up 3:53, 4 users, load averages: 0.00, 0.00, 0.00

Re: planner stats

From
Neil Conway
Date:
On Wed, 2005-02-02 at 11:02 +0100, Roman Neuhauser wrote:
> the SET STATISTICS description under ALTER TABLE doesn't explain how
> to choose the right target (is there a formula?)

There's no simple formula; in particular, it's not merely a matter of
raising the target for the columns of a large table. The more irregular
the distribution of the data in a particular column, the higher the
target should be (as the docs state, and as I've quoted below). I would
guess most people figure out how to set per-column stats by trial and
error, by looking at how accurate the planner's estimates are for
queries involving the column in question.

> it says "For more information on the use of statistics by the PostgreSQL
> query planner, refer to Section 13.2.", but that section *doesn't* talk
> about "the use of statistics by the query planner"; it doesn't hint
> a method to choose a good target either

I disagree:

        The amount of information stored in pg_statistic, in particular
        the maximum number of entries in the most_common_vals and
        histogram_bounds arrays for each column, can be set on a
        column-by-column basis using the ALTER TABLE SET STATISTICS
        command, or globally by setting the default_statistics_target
        configuration variable. The default limit is presently 10
        entries. Raising the limit may allow more accurate planner
        estimates to be made, particularly for columns with irregular
        data distributions, at the price of consuming more space in
        pg_statistic and slightly more time to compute the estimates.
        Conversely, a lower limit may be appropriate for columns with
        simple data distributions.

Suggestions for specific improvements are welcome (a patch against the
SGML is easiest).

-Neil