Re: RFC: planner statistics in 7.2 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: RFC: planner statistics in 7.2
Date
Msg-id 12651.988035033@sss.pgh.pa.us
Whole thread Raw
In response to Re: RFC: planner statistics in 7.2  (Philip Warner <pjw@rhyme.com.au>)
Responses Re: RFC: planner statistics in 7.2  (Philip Warner <pjw@rhyme.com.au>)
List pgsql-hackers
Philip Warner <pjw@rhyme.com.au> writes:
>> All that we're discussing here is one specific parameter in the cost
>> estimation for an indexscan, viz, the extent to which the table ordering
>> agrees with the index ordering. 

> This does not necessarily follow. A table ordering need not follow the sort
> order of an index for the index to have a low indexscan cost. All that is
> required is that most of the rows referred to by an index node must reside
> in a page or pages that will be read by one IO. eg. a table that has a
> sequence based ID, with, say 20% of rows updated, will work nicely with an
> indexscan on the ID, even though it has never been clustered. 

Right, what matters is the extent of correlation between table ordering
and index ordering, not how it got to be that way.

> What I'm suggesting is that if you look at a random sample of index nodes,
> you should be able to get a statistically valid estimate of the 'clumping'
> of the data pointed to by the index. 

And I'm saying that you don't actually have to look at the index in
order to compute the very same estimate.  The only property of the index
that matters is its sort order; if you assume you know the right sort
order (and in practice there's usually only one interesting possibility
for a column) then you can compute the correlation just by looking at
the table.

Andreas correctly points out that this approach doesn't extend very well
to multi-column or functional indexes, but I'm willing to punt on those
for the time being ...
        regards, tom lane


pgsql-hackers by date:

Previous
From: Adam Rose
Date:
Subject: row name length
Next
From: Peter Eisentraut
Date:
Subject: Re: Re: How to determine if a user exists..