Re: RFC: planner statistics in 7.2 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: RFC: planner statistics in 7.2
Date
Msg-id 23581.987727737@sss.pgh.pa.us
In response to RFC: planner statistics in 7.2  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: RFC: planner statistics in 7.2  (Philip Warner <pjw@rhyme.com.au>)
List pgsql-hackers
Philip Warner <pjw@rhyme.com.au> writes:
> At 18:37 19/04/01 -0400, Tom Lane wrote:
>> (2) Statistics should be computed on the basis of a random sample of the
>> target table, rather than a complete scan.  According to the literature
>> I've looked at, sampling a few thousand tuples is sufficient to give good
>> statistics even for extremely large tables; so it should be possible to
>> run ANALYZE in a short amount of time regardless of the table size.

> This sounds great; can the same be done for clustering? I.e., pick a random
> sample of index nodes, look at the record pointers, and so determine how
> well clustered the table is?

My intention was to use the same tuples sampled for the data histograms
to estimate how well sorted the data is.  However, it's not immediately
clear that that'll give a trustworthy estimate; I'm still studying it ...
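
As a rough illustration of the idea (a minimal sketch in Python, not the
ANALYZE implementation; the names reservoir_sample and ordering_correlation
are hypothetical): draw a fixed-size random sample in one pass, then take the
correlation between each sampled tuple's physical position and the rank of
its value as a sortedness estimate. A result near 1.0 would suggest the table
is stored in ascending order, near -1.0 descending order, and near 0 no
particular order. Whether such an estimate is trustworthy is exactly the open
question.

```python
import random


def reservoir_sample(rows, k, rng):
    """Uniform sample of k (position, value) pairs in one pass (Algorithm R)."""
    sample = []
    for pos, value in enumerate(rows):
        if pos < k:
            # Fill the reservoir with the first k rows.
            sample.append((pos, value))
        else:
            # Keep row number pos with probability k / (pos + 1).
            j = rng.randint(0, pos)  # inclusive on both ends
            if j < k:
                sample[j] = (pos, value)
    return sample


def ordering_correlation(sample):
    """Correlation between physical order and value rank within the sample."""
    n = len(sample)
    if n < 2:
        return 0.0
    by_pos = sorted(sample)  # tuples sort by physical position first
    # Rank the sampled values; ties are broken by physical position.
    order = sorted(range(n), key=lambda i: (by_pos[i][1], by_pos[i][0]))
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    # Both sequences are permutations of 0..n-1, so they share the same
    # mean and variance; Pearson correlation reduces to cov / var.
    mean = (n - 1) / 2
    cov = sum((i - mean) * (rank[i] - mean) for i in range(n))
    var = sum((i - mean) ** 2 for i in range(n))
    return cov / var


if __name__ == "__main__":
    rng = random.Random(0)
    table = list(range(100_000))          # stand-in for a well-sorted column
    sample = reservoir_sample(table, 3000, rng)
    print(ordering_correlation(sample))
```

The sample size of 3000 here is only a placeholder for "a few thousand
tuples"; the sketch says nothing about how index-based clustering would be
measured.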

>> ALTER TABLE tab SET COLUMN col STATS COUNT n

> Sounds fine - user-selectability at the column level seems a good idea.
> Would there be any value in not making it part of a normal SQLxx statement,
> and adding an 'ALTER STATISTICS' command? E.g.:

>     ALTER STATISTICS FOR tab[.column] COLLECT n
>     ALTER STATISTICS FOR tab SAMPLE m

Is that more standard than the other syntax?
        regards, tom lane