
Re: RFC: planner statistics in 7.2 - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: RFC: planner statistics in 7.2
Date	April 19, 2001 20:59:48
Msg-id	23581.987727737@sss.pgh.pa.us
In response to	RFC: planner statistics in 7.2 (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: RFC: planner statistics in 7.2
List	pgsql-hackers


Philip Warner <pjw@rhyme.com.au> writes:
> At 18:37 19/04/01 -0400, Tom Lane wrote:
>> (2) Statistics should be computed on the basis of a random sample of the
>> target table, rather than a complete scan.  According to the literature
>> I've looked at, sampling a few thousand tuples is sufficient to give good
>> statistics even for extremely large tables; so it should be possible to
>> run ANALYZE in a short amount of time regardless of the table size.
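To illustrate why a fixed-size sample keeps ANALYZE time roughly independent of table size, here is a minimal two-stage sampling sketch. It is an assumption for illustration, not the planned implementation: first pick up to k random block numbers without touching the rest of the table, then keep a uniform reservoir (Algorithm R) of k rows from only those blocks. The names `two_stage_sample` and `read_block` are hypothetical; `read_block` stands in for whatever fetches the tuples on one heap page.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")

def two_stage_sample(
    num_blocks: int,
    read_block: Callable[[int], Sequence[T]],  # hypothetical page reader
    k: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Sample up to k rows while reading at most k blocks, whatever num_blocks is."""
    rng = rng or random.Random()
    # Stage 1: choose distinct block numbers; unchosen blocks are never read.
    chosen = sorted(rng.sample(range(num_blocks), min(k, num_blocks)))
    # Stage 2: Algorithm R reservoir over the rows in the chosen blocks.
    sample: List[T] = []
    seen = 0
    for blk in chosen:
        for row in read_block(blk):
            if seen < k:
                sample.append(row)
            else:
                # Keep this row with probability k / (seen + 1).
                j = rng.randint(0, seen)
                if j < k:
                    sample[j] = row
            seen += 1
    return sample
```

Reading blocks in sorted order keeps the I/O sequential-ish. One caveat of this sketch: when blocks hold different numbers of live rows, the result is not exactly uniform over rows, so a real implementation would need to correct for that.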

> This sounds great; can the same be done for clustering, i.e. pick a random
> sample of index nodes, look at the record pointers, and so determine how
> well clustered the table is?

My intention was to use the same tuples sampled for the data histograms
to estimate how well sorted the data is.  However it's not immediately
clear that that'll give a trustworthy estimate; I'm still studying it ...
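One plausible way to turn the sampled tuples into a "how well sorted" number, offered as an assumption rather than the method under study, is a rank correlation between each tuple's physical position (block, offset) and its column value: near +1 or -1 means the heap is laid out in (reverse) value order, near 0 means no useful ordering. `ranks` and `order_correlation` are hypothetical names. Because only a sample is seen, the result carries sampling error, which is exactly the trustworthiness question raised above.

```python
from typing import Any, List, Sequence, Tuple

def ranks(xs: Sequence[Any]) -> List[float]:
    """Ranks 0..n-1 in sorted order, averaging the ranks of tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2.0
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r

def order_correlation(sample: Sequence[Tuple[Tuple[int, int], float]]) -> float:
    """Spearman correlation between physical position and column value."""
    phys = ranks([pos for pos, _ in sample])  # (block, offset) tuples sort physically
    vals = ranks([v for _, v in sample])
    n = len(sample)
    mp = sum(phys) / n
    mv = sum(vals) / n
    cov = sum((p - mp) * (v - mv) for p, v in zip(phys, vals))
    sp = sum((p - mp) ** 2 for p in phys) ** 0.5
    sv = sum((v - mv) ** 2 for v in vals) ** 0.5
    if sp == 0 or sv == 0:
        return 0.0  # constant column or single position: no ordering signal
    return cov / (sp * sv)
```

Ranks are used instead of raw values so that skewed value distributions do not distort the measure; only the relative order matters for how a scan in value order would touch the heap.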

>> ALTER TABLE tab SET COLUMN col STATS COUNT n
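Assuming the per-column n in the proposed syntax controls how many histogram buckets are kept (an interpretation, not something stated here), the sketch below shows what a larger n buys: an equi-depth histogram built from the sampled values, and a range-selectivity estimate that interpolates within a bucket. Both function names are hypothetical.

```python
from typing import List, Sequence

def equi_depth_bounds(values: Sequence[float], n: int) -> List[float]:
    """Return n + 1 boundaries so each bucket holds about the same number of sampled values."""
    if n < 1 or not values:
        raise ValueError("need at least one bucket and one value")
    s = sorted(values)
    last = len(s) - 1
    # Boundary i sits at fraction i/n of the way through the sorted sample.
    return [s[(i * last) // n] for i in range(n + 1)]

def estimate_fraction_below(bounds: List[float], x: float) -> float:
    """Estimate the fraction of rows with value < x, assuming uniformity inside each bucket."""
    n = len(bounds) - 1
    if x <= bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return 1.0
    for i in range(n):
        lo, hi = bounds[i], bounds[i + 1]
        if x < hi:
            frac = (x - lo) / (hi - lo) if hi > lo else 0.0
            return (i + frac) / n
    return 1.0
```

For example, `equi_depth_bounds(range(101), 4)` gives `[0, 25, 50, 75, 100]`, and `estimate_fraction_below` on those bounds with `x = 60` gives 0.6. More buckets shrink the region over which the uniformity assumption has to hold, at the cost of storing more statistics per column.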

> Sounds fine - user-selectability at the column level seems a good idea.
> Would there be any value in not making it part of a normal SQLxx statement,
> and adding an 'ALTER STATISTICS' command? e.g.

>     ALTER STATISTICS FOR tab[.column] COLLECT n
>     ALTER STATISTICS FOR tab SAMPLE m

Is that more standard than the other syntax?
        regards, tom lane

