Thread: RFC: planner statistics in 7.2

RFC: planner statistics in 7.2

From: Tom Lane
Date:
Request for comments:

Overview
--------

The planner is currently badly handicapped by inadequate statistical
information.  The stats that VACUUM ANALYZE currently computes are:

per-table:
        number of disk pages
        number of tuples

per-column:
        dispersion
        minimum and maximum values (if datatype has a suitable '<' operator)
        most common value
        fraction of entries that are the most common value
        fraction of entries that are NULL

Not only is this an impoverished dataset, but the dispersion and
most-common-value are calculated in a very unreliable way, and cannot
be trusted.

Even though the stats are meager and untrustworthy, they are expensive
to get: the per-column data is updated only by VACUUM ANALYZE, which
is quite unpleasant to do regularly on large tables.

It is possible to get better statistics with less work.  Here is a
proposal for revising the stats mechanisms for 7.2.

I believe that it's still a good idea for the stats to be updated by an
explicit housekeeping command.  Although we could think of updating them
on-the-fly, that seems expensive and slow.  Another objection is that
planning results will become difficult to reproduce if the underlying
statistics change constantly.  But if we stick to the explicit-command
approach then we need to make the command much faster than VACUUM ANALYZE.
We can address that in two ways:

(1) The statistics-gathering process should be available as a standalone
command, ANALYZE [ tablename ], not only as part of VACUUM.  (This was
already discussed and agreed to for 7.1, but it never got done.)  Note
that a pure ANALYZE command needs only a read lock on the target table,
not an exclusive lock as VACUUM needs, so it's much more friendly to
concurrent transactions.

(2) Statistics should be computed on the basis of a random sample of the
target table, rather than a complete scan.  According to the literature
I've looked at, sampling a few thousand tuples is sufficient to give good
statistics even for extremely large tables; so it should be possible to
run ANALYZE in a short amount of time regardless of the table size.

If we do both of these then I think that ANALYZE will be painless enough
to do frequently, so there's no need to try to fold stats-updating into
regular operations.


More extensive statistics
-------------------------

The min, max, and most common value are not enough data, particularly
not for tables with heavily skewed data distributions.  Instead,
pg_statistic should store three small arrays for each column:

1. The K most common values and their frequencies, for some (adjustable)
parameter K.  This will be omitted if the column appears unique (no
duplicate values found).

2. The M boundary values of an equi-depth histogram, ie, the values that
divide the data distribution into equal-population sets.  For example, if
M is 3 this would consist of the min, median, and max values.

K and M might both be about 10 for a typical column.  In principle at
least, these numbers could be user-settable parameters, to allow trading
off estimation accuracy against amount of space used in pg_statistic.

A further refinement is that the histogram should describe the
distribution of the data *after excluding the given most-common values*
(that is, it's a "compressed histogram").  This allows more accurate
representation of the data distribution when there are just a few
very-common values.  For a column with not many distinct values (consider
a boolean column), the most-common-value list might describe the complete
dataset, in which case the histogram collapses to nothing.  (We'd still
have it store the min and max, just so that it's not necessary to scan the
most-common-value array to determine those values.)

I am also inclined to remove the attdispersion parameter in favor of
storing an estimate of the number of distinct values in the column.
We can compute more accurate selectivities using the most-common-value
frequencies and the distinct-values estimate than we could get from
dispersion.
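
To illustrate why, consider estimating the selectivity of "col = const":
if const appears in the most-common-value list, its frequency is known
directly; otherwise the remaining (non-null, non-MCV) probability mass
can be spread over the remaining distinct values.  A minimal sketch in C
(invented names, not the planner's actual code):

    /* Estimate the fraction of rows satisfying "col = const".
     * mcv_vals/mcv_freqs hold the K most common values and their
     * frequencies; n_distinct and null_frac are the per-column
     * estimates proposed above.  Assumes n_distinct > k. */
    static double
    eq_selectivity(int constval,
                   const int *mcv_vals, const double *mcv_freqs, int k,
                   double n_distinct, double null_frac)
    {
        double  mcv_total = 0.0;

        for (int i = 0; i < k; i++)
        {
            if (mcv_vals[i] == constval)
                return mcv_freqs[i];    /* frequency known directly */
            mcv_total += mcv_freqs[i];
        }
        /* not a common value: spread the leftover mass evenly */
        return (1.0 - mcv_total - null_frac) / (n_distinct - k);
    }

With only dispersion to go on, something like the second branch is the
best one can do for every constant, common or not.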

Another useful idea is to try to estimate the correlation between physical
table order and logical order of any given column --- this would let us
account for the effect of clustering when estimating indexscan costs.
I don't yet know the appropriate formula to use, but this seems quite
doable.
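
For concreteness, one candidate formula (purely illustrative, since as
noted the right choice is still open) is an ordinary correlation
coefficient between each sampled tuple's physical position and the rank
of its value in sorted order:

    #include <math.h>

    /* Correlation between the physical position pos[i] of each sampled
     * tuple and the rank of its column value in sorted order, rank[i].
     * Near +1: table stored in ascending column order; near 0: no
     * relationship; near -1: descending order. */
    static double
    order_correlation(const double *pos, const double *rank, int n)
    {
        double  sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;

        for (int i = 0; i < n; i++)
        {
            sx += pos[i];
            sy += rank[i];
            sxx += pos[i] * pos[i];
            syy += rank[i] * rank[i];
            sxy += pos[i] * rank[i];
        }
        return (sxy - sx * sy / n) /
               sqrt((sxx - sx * sx / n) * (syy - sy * sy / n));
    }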

Finally, it would be a good idea to add an estimate of average width
of varlena fields to pg_statistic.  This would allow the estimated
tuple width computed by the planner to have something to do with reality,
which in turn would improve the cost estimation for hash joins (which
need to estimate the size of the in-memory hash table).


Computing the statistics
------------------------

The best way to obtain these stats seems to be:

1. Scan the target table and obtain a random sample of R tuples.  R is
chosen in advance based on target histogram size (M) --- in practice R
will be about 3000 or so.  If the table contains fewer than that many
tuples then the "sample" will be the whole table.  The sample can be
obtained in one pass using Vitter's algorithm or similar.  Note that for
expected values of R we shouldn't have any trouble storing all the sampled
tuples in memory.

2. For each column in the table, extract the column value from each
sampled tuple, and sort these values into order.  A simple scan of the
values then suffices to find the most common values --- we only need to
count adjacent duplicates and remember the K highest counts.  After we
have those, simple arithmetic will let us find the positions that contain
the histogram boundary elements.  Again, all this can be done in-memory
and so should be pretty fast.
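
To make step 1 concrete, here is a minimal sketch of reservoir sampling
in the style of Vitter's Algorithm R (the type and function names are
mine, and tuples are reduced to plain ints).  After one sequential pass,
the slots hold a uniform random sample no matter how large the table is:

    #include <stdlib.h>

    typedef struct
    {
        int    *slots;      /* the sample; room for "capacity" items */
        int     capacity;   /* R */
        long    seen;       /* stream elements examined so far */
    } Reservoir;

    /* Offer one stream element (standing in for a tuple) to the sample. */
    static void
    reservoir_add(Reservoir *rs, int elem)
    {
        rs->seen++;
        if (rs->seen <= rs->capacity)
            rs->slots[rs->seen - 1] = elem;     /* still filling up */
        else
        {
            /* keep the new element with probability capacity/seen */
            long    k = random() % rs->seen;

            if (k < rs->capacity)
                rs->slots[k] = elem;
        }
    }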

The sort step will require detoasting any toasted values.  To avoid risk
of memory overflow, we may exclude extremely wide toasted values from the
sort --- this shouldn't affect the stats much, since such values are
unlikely to represent duplicates.  Such an exclusion will also prevent
wide values from taking up lots of space in pg_statistic, if they happened
to be chosen as histogram entries.
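
Similarly, a sketch of step 2's scan over one column's sample, assuming
the values (ints here) are already sorted and detoasted; the names are
again invented, and a compressed histogram would first remove the chosen
most-common values before computing the boundaries:

    #define K 10        /* most-common values to keep */
    #define M 10        /* histogram boundary values */

    /* v[0..n-1] must be sorted.  Collects up to K duplicated values with
     * the highest counts, then picks M equal-population boundary values. */
    static void
    scan_sorted_sample(const int *v, int n,
                       int mcv_val[K], int mcv_cnt[K], int *nmcv,
                       int bounds[M])
    {
        int     i = 0;

        *nmcv = 0;
        while (i < n)
        {
            int     j = i;

            while (j < n && v[j] == v[i])
                j++;                /* measure an adjacent-duplicate run */
            if (j - i > 1 && (*nmcv < K || j - i > mcv_cnt[K - 1]))
            {
                /* insert into the top-K list, kept sorted by count */
                int     pos = (*nmcv < K) ? (*nmcv)++ : K - 1;

                while (pos > 0 && mcv_cnt[pos - 1] < j - i)
                {
                    mcv_val[pos] = mcv_val[pos - 1];
                    mcv_cnt[pos] = mcv_cnt[pos - 1];
                    pos--;
                }
                mcv_val[pos] = v[i];
                mcv_cnt[pos] = j - i;
            }
            i = j;
        }
        /* boundary b sits at the b/(M-1) quantile position, so bounds[0]
         * is the min and bounds[M-1] the max */
        for (int b = 0; b < M; b++)
            bounds[b] = v[(int) ((double) b * (n - 1) / (M - 1))];
    }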

Issue: for datatypes that have no '<' operator, we of course cannot
compute a histogram --- but if there is an '=' then the most-common-value
and number-of-distinct-values stats still make sense.  Is there a way to
derive these without O(R^2) brute-force comparisons?  We could still do
a scan of the R sample values using something like the existing
comparison algorithm (but using S > K work slots); this would cost about
S*R rather than R^2 comparisons.
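
One well-known scheme that fits that description (not necessarily what
the existing code does) is the Misra-Gries frequent-items algorithm,
which needs only '=' tests and does at most S comparisons per sample
value, giving the predicted ~S*R total:

    #define S 40        /* work slots; somewhat larger than K */

    /* Misra-Gries sketch over v[0..n-1]: any value occurring more than
     * n/(S+1) times is guaranteed to end up in a slot.  The counts are
     * lower bounds; a second pass over the sample could make them exact. */
    static void
    frequent_values(const int *v, int n, int val[S], int cnt[S])
    {
        for (int j = 0; j < S; j++)
            cnt[j] = 0;

        for (int i = 0; i < n; i++)
        {
            int     j, empty = -1;

            for (j = 0; j < S; j++)
            {
                if (cnt[j] > 0 && val[j] == v[i])
                {
                    cnt[j]++;               /* already tracking this value */
                    break;
                }
                if (cnt[j] == 0)
                    empty = j;
            }
            if (j == S)                     /* value not tracked yet */
            {
                if (empty >= 0)
                {
                    val[empty] = v[i];      /* claim a free slot */
                    cnt[empty] = 1;
                }
                else
                {
                    for (j = 0; j < S; j++)
                        cnt[j]--;           /* all slots busy: decrement all */
                }
            }
        }
    }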

A different approach that's been discussed on pghackers is to make use
of btree indexes for columns that have such indexes: we could scan the
indexes to visit all the column values in sorted order.  I have rejected
that approach because (a) it doesn't help for columns without a suitable
index; (b) our indexes don't distinguish deleted and live tuples,
which would skew the statistics --- in particular, we couldn't tell a
frequently-updated single tuple from a commonly repeated value; (c)
scanning multiple indexes would likely require more total I/O than just
grabbing sample tuples from the main table --- especially if we have to
do that anyway to handle columns without indexes.


User-visible changes
--------------------

Aside from implementing the stand-alone ANALYZE statement, it seems that
it would be a good idea to allow user control of the target numbers of
statistical entries (K and M above).  A simple approach would be a SET
variable or explicit parameter for ANALYZE.  But I am inclined to think
that it'd be better to create a persistent per-column state for this,
set by say
        ALTER TABLE tab SET COLUMN col STATS COUNT n
(better names welcome).  The target count for each column could be stored
in pg_attribute.  Doing it this way would let the dbadmin establish higher
or lower targets for especially irregular or simple columns, and then
forget about the numbers --- he wouldn't have to tweak his periodic cron
script to apply the right parameters to the right tables/columns.
        regards, tom lane


Re: RFC: planner statistics in 7.2

From: Bruce Momjian
Date:
> (1) The statistics-gathering process should be available as a standalone
> command, ANALYZE [ tablename ], not only as part of VACUUM.  (This was
> already discussed and agreed to for 7.1, but it never got done.)  Note
> that a pure ANALYZE command needs only a read lock on the target table,
> not an exclusive lock as VACUUM needs, so it's much more friendly to
> concurrent transactions.

7.1 already does the ANALYZE part of VACUUM ANALYZE with lighter
locking.  I just never split the command out separately, partly for
fear of user confusion and partly because I ran out of time.



Re: RFC: planner statistics in 7.2

From: Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> that a pure ANALYZE command needs only a read lock on the target table,
>> not an exclusive lock as VACUUM needs, so it's much more friendly to
>> concurrent transactions.

> 7.1 already does the ANALYZE part of VACUUM ANALYZE with lighter
> locking.

Right, but you still have to run through the VACUUM part, which will
hold down an exclusive lock for a considerable amount of time (if the
table is big).
        regards, tom lane


Re: RFC: planner statistics in 7.2

From: Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 18:37 19/04/01 -0400, Tom Lane wrote:
>> (2) Statistics should be computed on the basis of a random sample of the
>> target table, rather than a complete scan.  According to the literature
>> I've looked at, sampling a few thousand tuples is sufficient to give good
>> statistics even for extremely large tables; so it should be possible to
>> run ANALYZE in a short amount of time regardless of the table size.

> This sounds great; can the same be done for clustering? I.e., pick a random
> sample of index nodes, look at the record pointers and so determine how
> well clustered the table is?

My intention was to use the same tuples sampled for the data histograms
to estimate how well sorted the data is.  However it's not immediately
clear that that'll give a trustworthy estimate; I'm still studying it ...

>> ALTER TABLE tab SET COLUMN col STATS COUNT n

> Sounds fine - user-selectability at the column level seems a good idea.
> Would there be any value in not making it part of a normal SQLxx statement,
> and adding an 'ALTER STATISTICS' command? eg. 

>     ALTER STATISTICS FOR tab[.column] COLLECT n
>     ALTER STATISTICS FOR tab SAMPLE m

Is that more standard than the other syntax?
        regards, tom lane


Re: RFC: planner statistics in 7.2

From: Philip Warner
Date:
At 18:37 19/04/01 -0400, Tom Lane wrote:
>(2) Statistics should be computed on the basis of a random sample of the
>target table, rather than a complete scan.  According to the literature
>I've looked at, sampling a few thousand tuples is sufficient to give good
>statistics even for extremely large tables; so it should be possible to
>run ANALYZE in a short amount of time regardless of the table size.

This sounds great; can the same be done for clustering? I.e., pick a random
sample of index nodes, look at the record pointers and so determine how
well clustered the table is?


>A simple approach would be a SET
>variable or explicit parameter for ANALYZE.  But I am inclined to think
>that it'd be better to create a persistent per-column state for this,
>set by say
>    ALTER TABLE tab SET COLUMN col STATS COUNT n

Sounds fine - user-selectability at the column level seems a good idea.
Would there be any value in not making it part of a normal SQLxx statement,
and adding an 'ALTER STATISTICS' command? eg. 
    ALTER STATISTICS FOR tab[.column] COLLECT n
    ALTER STATISTICS FOR tab SAMPLE m

etc.


Re: RFC: planner statistics in 7.2

From: Philip Warner
Date:
At 20:48 19/04/01 -0400, Tom Lane wrote:
>
>> This sounds great; can the same be done for clustering? I.e., pick a random
>> sample of index nodes, look at the record pointers and so determine how
>> well clustered the table is?
>
>My intention was to use the same tuples sampled for the data histograms
>to estimate how well sorted the data is.  However it's not immediately
>clear that that'll give a trustworthy estimate; I'm still studying it ...

I'm not sure you want to know how well sorted it is in general, but you do
want to know the expected cost in IOs of reading all records from a given
index node, so you can more accurately estimate indexscan costs. AFAICS it
does not require that the entire table be sorted. So checking the pointers
on the index nodes gives an idea of clustering.


>
>>     ALTER STATISTICS FOR tab[.column] COLLECT n
>>     ALTER STATISTICS FOR tab SAMPLE m
>
>Is that more standard than the other syntax?
>

Not at all. It just avoids messing with one of the standard statements.




Re: RFC: planner statistics in 7.2

From: Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> I'm not sure you want to know how well sorted it is in general, but you do
> want to know the expected cost in IOs of reading all records from a given
> index node, so you can more accurately estimate indexscan costs. AFAICS it
> does not require that the entire table be sorted. So checking the pointers
> on the index nodes gives an idea of clustering.

But you don't really need to look at the index (if it even exists
at the time you do the ANALYZE).  The extent to which the data is
ordered in the table is a property of the table, not the index.
I'd prefer to get the stat just from the table and not have to do
additional I/O to examine the indexes.

But, as I said, I'm still reading the literature about estimation
methods for indexscans.  It may turn out that a statistic calculated
this way isn't the right thing to use, or isn't trustworthy when taken
over a small sample.

>> Is that more standard than the other syntax?

> Not at all. It just avoids messing with one of the standard statements.

Oh, so you're deliberately not being standard.  I see.  Probably a
reasonable point.
        regards, tom lane


Re: RFC: planner statistics in 7.2

From: Philip Warner
Date:
At 21:14 19/04/01 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> I'm not sure you want to know how well sorted it is in general, but you do
>> want to know the expected cost in IOs of reading all records from a given
>> index node, so you can more accurately estimate indexscan costs. AFAICS it
>> does not require that the entire table be sorted. So checking the pointers
>> on the index nodes gives an idea of clustering.
>
>But you don't really need to look at the index (if it even exists
>at the time you do the ANALYZE).  The extent to which the data is
>ordered in the table is a property of the table, not the index.

But the value (and cost) of using a specific index in an indexscan depends
on that index (or am I missing something?). 




Re: RFC: planner statistics in 7.2

From: Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 21:14 19/04/01 -0400, Tom Lane wrote:
>> But you don't really need to look at the index (if it even exists
>> at the time you do the ANALYZE).  The extent to which the data is
>> ordered in the table is a property of the table, not the index.

> But the value (and cost) of using a specific index in an indexscan depends
> on that index (or am I missing something?). 

All that we're discussing here is one specific parameter in the cost
estimation for an indexscan, viz, the extent to which the table ordering
agrees with the index ordering.  As long as they both agree about the
ordering operator, this number doesn't depend on the index --- the index
is by definition in perfect agreement with the ordering operator.  There
are other parameters in the total cost estimate that will depend on the
index, but this one doesn't AFAICS.
        regards, tom lane


Re: RFC: planner statistics in 7.2

From: Peter Eisentraut
Date:
Tom Lane writes:

> 2. The M boundary values of an equi-depth histogram, ie, the values that
> divide the data distribution into equal-population sets.  For example, if
> M is 3 this would consist of the min, median, and max values.

Why not model that data in a normal distribution (or some other such
model)?  This should give you fairly good estimates for <, > and between
type queries.  That way the user wouldn't have to tweak M.  (I couldn't
even imagine a way to explain to the user how to pick good M's.)

> Issue: for datatypes that have no '<' operator, we of course cannot
> compute a histogram --- but if there is an '=' then the most-common-value
> and number-of-distinct-values stats still make sense.

I think the statistics calculation should be data type (or operator, or
opclass) specific, like the selectivity estimation functions.  For
geometric data types you need completely different kinds of statistics,
and this would allow users to plug in different methods.  The
pg_statistic table could just have an array of floats where each data
type can store statistics as it chooses and the selectivity estimation
routines can interpret the values in a different way per data type.  That
way you can also make some common sense optimization without hacks, e.g.,
for boolean columns you possibly only need to calculate 1 value (number of
trues).

Of course, how to calculate up to N different sets of statistics for N
different columns that require up to N different numbers of passes over
the table is left as a challenge.  ;-)

This brings up a question I have:  Are statistics calculated for every
column?  Should they be?  Is there a possibility to speed up ANALYZE by
controlling this?

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: RFC: planner statistics in 7.2

From: Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> 2. The M boundary values of an equi-depth histogram, ie, the values that
>> divide the data distribution into equal-population sets.  For example, if
>> M is 3 this would consist of the min, median, and max values.

> Why not model that data in a normal distribution (or some other such
> model)?

If we knew an appropriate distribution model, we could do that.  There
is no distribution model that is appropriate for everything... certainly
the normal distribution is not.  A compressed histogram is more flexible
than any other simple model I know of.

> (I couldn't even imagine a way to explain to the user how to pick good
> M's.)

I was just planning to say "if you get bad estimates on a column with a
highly irregular distribution, try increasing the default M".
Eventually we might have enough experience with it to offer more
detailed advice.

>> Issue: for datatypes that have no '<' operator, we of course cannot
>> compute a histogram --- but if there is an '=' then the most-common-value
>> and number-of-distinct-values stats still make sense.

> I think the statistics calculation should be data type (or operator, or
> opclass) specific, like the selectivity estimation functions.

Yeah, that's probably what we should do, but so far no one has even
suggested what stats we might want for geometric datatypes.  Tell you
what: let's throw in a field that indicates "kind of statistics
gathered", and have the meaning of the array fields depend on that.
The proposal I gave describes just the stats to gather for scalar types.

> pg_statistic table could just have an array of floats where each data
> type can store statistics as it chooses and the selectivity estimation
> routines can interpret the values in a different way per data type.

Man does not live by floats alone --- in particular we need to be able
to store specific values of the target datatype.  Probably what we want
is three or four instances of the pattern

stats_kind      int,        -- identifies type of info
stats_numbers   float[],    -- numerical info such as occurrence fraction
stats_values    text[],     -- array of values of column datatype
                            -- (in external representation)

Depending on the value of stats_kind, either stats_numbers or
stats_values might be unused, in which case it could be set to NULL so
it doesn't waste space.  For scalar datatypes, an instance of this
pattern could hold the most common values and their frequencies, and
another one could hold the histogram boundary points (with stats_numbers
unused).

> Of course, how to calculate up to N different sets of statistics for N
> different columns that require up to N different numbers of passes over
> the table is left as a challenge.  ;-)

As long as it's OK to gather the stats from a random sample of the
table, I think the structure I proposed would work fine.  Remember the
second part of ANALYZE is looping over each column separately anyway ---
so it could easily apply a datatype-dependent algorithm for computing
statistics.

> This brings up a question I have:  Are statistics calculated for every
> column?  Should they be?  Is there a possibility to speed up ANALYZE by
> controlling this?

Presently, frequency stats will be calculated for every column that has
an '=' operator, and min/max stats will be calculated if there's a '<'
operator for the type.  I was planning to retain that convention.
However if we are going to allow adjustable M, we could also add the
stipulation that the DBA could set M = 0 for columns that he doesn't
want stats gathered for.  (This would make sense for a column that is
never used in a WHERE clause.)
        regards, tom lane


Re: RFC: planner statistics in 7.2

From: Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Remember one idea is for index scans to automatically update the expired
> flag in the index bitfields when they check the heap tuple.

There is no such flag, and I'm not planning to add one before I fix
statistics ;-)

There are a number of things that'd have to be thought about before
such a thing could happen.  For one thing, trying to obtain write locks
instead of read locks during a btree scan would blow us out of the water
for supporting concurrent btree operations.
        regards, tom lane


Re: RFC: planner statistics in 7.2

From: Bruce Momjian
Date:
> A different approach that's been discussed on pghackers is to make use
> of btree indexes for columns that have such indexes: we could scan the
> indexes to visit all the column values in sorted order.  I have rejected
> that approach because (a) it doesn't help for columns without a suitable
> index; (b) our indexes don't distinguish deleted and live tuples,
> which would skew the statistics --- in particular, we couldn't tell a
> frequently-updated single tuple from a commonly repeated value; (c)
> scanning multiple indexes would likely require more total I/O than just
> grabbing sample tuples from the main table --- especially if we have to
> do that anyway to handle columns without indexes.

Remember one idea is for index scans to automatically update the expired
flag in the index bitfields when they check the heap tuple.



Re: RFC: planner statistics in 7.2

From: Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Remember one idea is for index scans to automatically update the expired
> > flag in the index bitfields when they check the heap tuple.
> 
> There is no such flag, and I'm not planning to add one before I fix
> statistics ;-)
> 
> There are a number of things that'd have to be thought about before
> such a thing could happen.  For one thing, trying to obtain write locks
> instead of read locks during a btree scan would blow us out of the water
> for supporting concurrent btree operations.

Yes, it was just an idea.  Someone may add it for 7.2.



Re: RFC: planner statistics in 7.2

From: Philip Warner
Date:
At 22:27 19/04/01 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> At 21:14 19/04/01 -0400, Tom Lane wrote:
>>> But you don't really need to look at the index (if it even exists
>>> at the time you do the ANALYZE).  The extent to which the data is
>>> ordered in the table is a property of the table, not the index.
>
>> But the value (and cost) of using a specific index in an indexscan depends
>> on that index (or am I missing something?). 
>
>All that we're discussing here is one specific parameter in the cost
>estimation for an indexscan, viz, the extent to which the table ordering
>agrees with the index ordering. 

This does not necessarily follow. A table ordering need not follow the sort
order of an index for the index to have a low indexscan cost. All that is
required is that most of the rows referred to by an index node reside
in a page or pages that can be read in one IO.  E.g., a table that has a
sequence-based ID, with, say, 20% of rows updated, will work nicely with an
indexscan on the ID, even though it has never been clustered.

What I'm suggesting is that if you look at a random sample of index nodes,
you should be able to get a statistically valid estimate of the 'clumping'
of the data pointed to by the index. 

Am I still missing the point?




Re: RFC: planner statistics in 7.2

From: Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
>> All that we're discussing here is one specific parameter in the cost
>> estimation for an indexscan, viz, the extent to which the table ordering
>> agrees with the index ordering. 

> This does not necessarily follow. A table ordering need not follow the sort
> order of an index for the index to have a low indexscan cost. All that is
> required is that most of the rows referred to by an index node reside
> in a page or pages that can be read in one IO.  E.g., a table that has a
> sequence-based ID, with, say, 20% of rows updated, will work nicely with an
> indexscan on the ID, even though it has never been clustered.

Right, what matters is the extent of correlation between table ordering
and index ordering, not how it got to be that way.

> What I'm suggesting is that if you look at a random sample of index nodes,
> you should be able to get a statistically valid estimate of the 'clumping'
> of the data pointed to by the index. 

And I'm saying that you don't actually have to look at the index in
order to compute the very same estimate.  The only property of the index
that matters is its sort order; if you assume you know the right sort
order (and in practice there's usually only one interesting possibility
for a column) then you can compute the correlation just by looking at
the table.

Andreas correctly points out that this approach doesn't extend very well
to multi-column or functional indexes, but I'm willing to punt on those
for the time being ...
        regards, tom lane


Re: RFC: planner statistics in 7.2

From: Hannu Krosing
Date:
Tom Lane wrote:

> Philip Warner <pjw@rhyme.com.au> writes:
> 
>> What I'm suggesting is that if you look at a random sample of index nodes,
>> you should be able to get a statistically valid estimate of the 'clumping'
>> of the data pointed to by the index. 
> 
> 
> And I'm saying that you don't actually have to look at the index in
> order to compute the very same estimate.  The only property of the index
> that matters is its sort order; if you assume you know the right sort
> order (and in practice there's usually only one interesting possibility
> for a column) then you can compute the correlation just by looking at
> the table.

This is more true for unique indexes than for non-unique ones, unless
our non-unique indexes are smart enough to insert equal index nodes in
table order.

> Andreas correctly points out that this approach doesn't extend very well
> to multi-column or functional indexes, but I'm willing to punt on those
> for the time being ...

----------
Hannu



Re: RFC: planner statistics in 7.2

From: Philip Warner
Date:
At 10:10 23/04/01 -0400, Tom Lane wrote:
>>> All that we're discussing here is one specific parameter in the cost
>>> estimation for an indexscan, viz, the extent to which the table ordering
>>> agrees with the index ordering. 
>
>> This does not necessarily follow. A table ordering need not follow the sort
>> order of an index for the index to have a low indexscan cost. All that is
>> required is that most of the rows referred to by an index node reside
>> in a page or pages that can be read in one IO.  E.g., a table that has a
>> sequence-based ID, with, say, 20% of rows updated, will work nicely with an
>> indexscan on the ID, even though it has never been clustered.
>
>Right, what matters is the extent of correlation between table ordering
>and index ordering, not how it got to be that way.

No; *local* ordering needs to *roughly* match. Global ordering and
record-by-record ordering don't matter.

For example, for a table with an ID field, the rows may be stored as (where
--- indicates a mythical page)

-----
5
9
6
7
-----
1
10
2
3
-----
4
8
11
12
-----

A sorted index may have node pointers to (1,2,3), (4,5,6), (7,8,9) and
(10,11,12). The first node would take 1 IO, then each of the others would
take 2. This would give a much more reasonable estimate for the indexscan
costs (assuming a random sample is adequate).


>> What I'm suggesting is that if you look at a random sample of index nodes,
>> you should be able to get a statistically valid estimate of the 'clumping'
>> of the data pointed to by the index. 
>
>And I'm saying that you don't actually have to look at the index in
>order to compute the very same estimate. 

No. Not given the above.


>The only property of the index
>that matters is its sort order; if you assume you know the right sort
>order (and in practice there's usually only one interesting possibility
>for a column) then you can compute the correlation just by looking at
>the table.

This is true, but only if you are strictly interested in sort order, which
I don't think we are.


>Andreas correctly points out that this approach doesn't extend very well
>to multi-column or functional indexes, but I'm willing to punt on those
>for the time being ...

My approach should work with arbitrary indexes.





Re: RFC: planner statistics in 7.2

From: Jan Wieck
Date:
Bruce Momjian wrote:
> > A different approach that's been discussed on pghackers is to make use
> > of btree indexes for columns that have such indexes: we could scan the
> > indexes to visit all the column values in sorted order.  I have rejected
> > that approach because (a) it doesn't help for columns without a suitable
> > index; (b) our indexes don't distinguish deleted and live tuples,
> > which would skew the statistics --- in particular, we couldn't tell a
> > frequently-updated single tuple from a commonly repeated value; (c)
> > scanning multiple indexes would likely require more total I/O than just
> > grabbing sample tuples from the main table --- especially if we have to
> > do that anyway to handle columns without indexes.
>
> Remember one idea is for index scans to automatically update the expired
> flag in the index bitfields when they check the heap tuple.
   And we should really do that.  While playing around with my (for
   7.2 to be) access statistics stuff I found that when running
   pg_bench, a couple of thousand index scans cause millions and
   millions of buffer fetches, because pg_bench updates one and the
   same row over and over again and it has a PKEY.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

