Re: RFC: planner statistics in 7.2 - Mailing list pgsql-hackers
From | Philip Warner |
---|---|
Subject | Re: RFC: planner statistics in 7.2 |
Date | |
Msg-id | 3.0.5.32.20010424015647.03ca35e0@mail.rhyme.com.au Whole thread Raw |
In response to | Re: RFC: planner statistics in 7.2 (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
At 10:10 23/04/01 -0400, Tom Lane wrote: >>> All that we're discussing here is one specific parameter in the cost >>> estimation for an indexscan, viz, the extent to which the table ordering >>> agrees with the index ordering. > >> This does not necessarily follow. A table ordering need not follow the sort >> order of an index for the index to have a low indexscan cost. All that is >> required is that most of the rows referred to by an index node must reside >> in a page or pages that will be read by one IO. eg. a table that has a >> sequence based ID, with, say 20% of rows updated, will work nicely with an >> indexscan on the ID, even though it has never been clustered. > >Right, what matters is the extent of correlation between table ordering >and index ordering, not how it got to be that way. No; *local* ordering needs to *roughly* match. Global ordering and record-by-record ordering don't matter. For example, for a table with an ID field, the rows may be stored as (where --- indicates a mythical page) ----- 5 9 6 7 ----- 1 10 2 3 ----- 4 8 11 12 ----- A sorted index may have nodes pointers to (1,2,3), (4,5,6), (7,8,9) and (10,11,12). The first node would take 1 IO, then each of the others would take 2. This would give a much more reasonable estimate for the indexscan costs (assuming a random sample is adequate). >> What I'm suggesting is that if you look at a random sample of index nodes, >> you should be able to get a statistically valid estimate of the 'clumping' >> of the data pointed to by the index. > >And I'm saying that you don't actually have to look at the index in >order to compute the very same estimate. No. Not given the above. >The only property of the index >that matters is its sort order; if you assume you know the right sort >order (and in practice there's usually only one interesting possibility >for a column) then you can compute the correlation just by looking at >the table. This is true, but only if you are strictly interested in sort order, which I don't think we are. >Andreas correctly points out that this approach doesn't extend very well >to multi-column or functional indexes, but I'm willing to punt on those >for the time being ... My approach should work with arbitrary indexes. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
pgsql-hackers by date: