Home > mailing lists

Re: RFC: planner statistics in 7.2 - Mailing list pgsql-hackers

From	Peter Eisentraut
Subject	Re: RFC: planner statistics in 7.2
Date	April 20, 2001 11:21:35
Msg-id	Pine.LNX.4.30.0104201649080.758-100000@peter.localdomain Whole thread Raw
In response to	RFC: planner statistics in 7.2 (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: RFC: planner statistics in 7.2
List	pgsql-hackers

Tree view

Tom Lane writes:

> 2. The M boundary values of an equi-depth histogram, ie, the values that
> divide the data distribution into equal-population sets.  For example, if
> M is 3 this would consist of the min, median, and max values.

Why not model that data in a normal distribution (or some other such
model)?  This should give you fairly good estimates for <, > and between
type queries.  That way the user wouldn't have to tweak M.  (I couldn't
even imagine a way to explain to the user how to pick good M's.)

> Issue: for datatypes that have no '<' operator, we of course cannot
> compute a histogram --- but if there is an '=' then the most-common-value
> and number-of-distinct-values stats still make sense.

I think the statistics calculation should be data type (or operator, or
opclass) specific, like the selectivity estimation functions.  For
geometric data types you need completely different kinds of statistics,
and this would allow users to plug in different methods.  The
pg_statistics table could just have an array of floats where each data
type can store statistics as it chooses and the selectivity estimation
routines can interpret the values in a different way per data type.  That
way you can also make some common sense optimization without hacks, e.g.,
for boolean columns you possibly only need to calculate 1 value (number of
trues).

Of course, how to calculate up to N different sets of statistics for N
different columns that require up to N different numbers of passes over
the table is left as a challenge.  ;-)

This brings up a question I have:  Are statistics calculated for every
column?  Should they be?  Is there a possibility to speed up ANALYZE by
controlling this?

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter

pgsql-hackers by date:

From: Tom Lane
Date: 20 April 2001, 10:17:45
Subject: Re: AW: RFC: planner statistics in 7.2

From: Peter Eisentraut
Date: 20 April 2001, 11:21:38
Subject: Re: System catalog representation of access privileges

Re: RFC: planner statistics in 7.2 - Mailing list pgsql-hackers

Previous

Next