At 18:37 19/04/01 -0400, Tom Lane wrote:
>(2) Statistics should be computed on the basis of a random sample of the
>target table, rather than a complete scan. According to the literature
>I've looked at, sampling a few thousand tuples is sufficient to give good
>statistics even for extremely large tables; so it should be possible to
>run ANALYZE in a short amount of time regardless of the table size.
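For what it's worth, a fixed-size random sample can be drawn in a single pass over the table with reservoir sampling (Vitter's Algorithm R). A rough Python sketch of the idea, purely illustrative, not what ANALYZE would actually do:

```python
import random

def reservoir_sample(rows, k):
    """Return k rows chosen uniformly at random from an iterable,
    in one pass and with O(k) memory (Algorithm R)."""
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            # Keep the new row with probability k/(i+1),
            # evicting a uniformly chosen current member.
            j = random.randint(0, i)
            if j < k:
                sample[j] = row
    return sample

# e.g. sample 3000 "tuples" regardless of table size
sample = reservoir_sample(range(100_000), 3000)
```

The point is that the sample size, and hence the ANALYZE cost beyond the scan itself, is fixed no matter how large the table is.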
This sounds great; could the same be done for clustering? I.e., pick a random
sample of index nodes, look at the record pointers, and thereby determine how
well clustered the table is?
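One way to turn such a sample into a number: for each sampled index entry, take its rank in index order and the physical position (block number) its pointer refers to, then compute the correlation of the two. A value near +/-1 means the heap is close to index order; near 0 means it is scrambled. A hypothetical sketch (the pairs would really come from the sampled index nodes described above):

```python
def clustering_correlation(pairs):
    """Pearson correlation between index order and physical position.
    pairs: list of (rank_in_index_order, heap_block_number)."""
    n = len(pairs)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Close to 1.0 for a perfectly clustered table,
# close to -1.0 for one stored in exactly reverse order.
print(clustering_correlation([(i, i) for i in range(100)]))
```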
>A simple approach would be a SET
>variable or explicit parameter for ANALYZE. But I am inclined to think
>that it'd be better to create a persistent per-column state for this,
>set by say
> ALTER TABLE tab SET COLUMN col STATS COUNT n
Sounds fine - user-selectability at the column level seems a good idea.
Would there be any value in not making it part of a normal SQLxx statement,
but instead adding a separate 'ALTER STATISTICS' command? E.g.
   ALTER STATISTICS FOR tab[.column] COLLECT n
   ALTER STATISTICS FOR tab SAMPLE m
etc.
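Whatever the syntax, the per-column count n would presumably bound how many
statistics entries (e.g. most-common values and their frequencies) get kept.
A toy illustration of deriving the top-n values from a sample (the function
name and shape are invented for the example):

```python
from collections import Counter

def most_common_values(sample, n):
    """Top-n values in a sampled column, with their estimated
    fraction of the whole column (sample frequency)."""
    counts = Counter(sample)
    total = len(sample)
    return [(value, count / total) for value, count in counts.most_common(n)]

sample = ['a'] * 50 + ['b'] * 30 + ['c'] * 15 + ['d'] * 5
print(most_common_values(sample, 2))  # [('a', 0.5), ('b', 0.3)]
```

A larger n buys the planner finer-grained selectivity estimates at the cost
of more storage and ANALYZE work, which is exactly why making it settable
per column is attractive.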
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \| | --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/