Thread: AW: question about index cost estimates
> It seemed to me that the critical ratios are #tuples fetched vs #pages > in table and table size vs. cache size. I could be wrong though... Another metric that would be interesing is a number that represents the divergence of the heap order from the index sort order. If heap data is more or less in the same order as the index (e.g. cluster index) the table size is irrelevant. So the measure would be something like clustered*tablesize vs cache size instead of only tablesize. Andreas
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes: > Another metric that would be interesing is a number that represents > the divergence of the heap order from the index sort order. Oh, absolutely! Got any ideas about how to get that number in a reasonable amount of time (not too much more than VACUUM takes now)? I've been drawing a blank :-( regards, tom lane
Tom Lane wrote: > > Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes: > > Another metric that would be interesing is a number that represents > > the divergence of the heap order from the index sort order. > > Oh, absolutely! Got any ideas about how to get that number in a > reasonable amount of time (not too much more than VACUUM takes now)? > I've been drawing a blank :-( > It seems hard to find the practical theory. So how about a kind of sample scanning in a separate ANALYZE command ? Regards. Hiroshi Inoue Inoue@tpf.co.jp
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > So how about a kind of sample scanning in a separate ANALYZE > command ? I do think it'd be a good idea to separate out the stats-gathering into an ANALYZE command that can be invoked separately from VACUUM. For one thing, there's no good reason to hold down an exclusive lock on the target table while we are gathering stats. But that doesn't answer the question: how can we measure the extent to which a table is in the same order as an index? And even that is too one-dimensional a concept to apply to r-tree indexes, for example. What do we do for r-trees? regards, tom lane
Tom Lane writes: > > Another metric that would be interesing is a number that represents > > the divergence of the heap order from the index sort order. > Got any ideas about how to get that number in a reasonable amount of > time (not too much more than VACUUM takes now)? There are a few fairly simple-minded methods to determine the sortedness of a list. Any more sophisticated methods are probably not much better in practice and take too much to calculate. 1. The number of items out of place 2. The number of pairs out of order 3. The number of adjacent pairs out of order. 4. Length of list minus length of longest increasing (not necessarily adjacent) subsequence For our application I would suggest 3. because it can be calculated in linear time and it only gives points to situations that you really want, namely sequential disk access. You could take score/(length-1) to get 1 for a really unsorted list and 0 for a completely sorted one. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden