Thread: AW: question about index cost estimates

AW: question about index cost estimates

From: Zeugswetter Andreas SB
> It seemed to me that the critical ratios are #tuples fetched vs #pages
> in table and table size vs. cache size.  I could be wrong though...

Another metric that would be interesting is a number that represents 
the divergence of the heap order from the index sort order.

If the heap data is more or less in the same order as the index 
(e.g. a clustered index), the table size is irrelevant.
So the measure would be something like 
clustered*tablesize vs cache size instead of only tablesize.
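A rough sketch of that idea (the function and factor names here are hypothetical illustrations, not the planner's actual cost model):

```python
def effective_table_size(table_pages, clustered_fraction):
    """Pages the cache effectively needs to hold for an index scan.

    clustered_fraction is 0.0 when the heap is in perfect index order
    (sequential access revisits few pages) and 1.0 when the heap is in
    random order relative to the index.  The comparison against cache
    size would then use this value instead of the raw table size.
    """
    return clustered_fraction * table_pages

# A perfectly clustered 10000-page table competes for cache like a
# tiny table; a randomly ordered one competes with its full size.
print(effective_table_size(10000, 0.0))   # 0.0
print(effective_table_size(10000, 1.0))   # 10000
```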

Andreas


Re: AW: question about index cost estimates

From: Tom Lane
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
> Another metric that would be interesting is a number that represents 
> the divergence of the heap order from the index sort order.

Oh, absolutely!  Got any ideas about how to get that number in a
reasonable amount of time (not too much more than VACUUM takes now)?
I've been drawing a blank :-(
        regards, tom lane


RE: AW: question about index cost estimates

From: Hiroshi Inoue
Tom Lane wrote:
>
> Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
> > Another metric that would be interesting is a number that represents
> > the divergence of the heap order from the index sort order.
>
> Oh, absolutely!  Got any ideas about how to get that number in a
> reasonable amount of time (not too much more than VACUUM takes now)?
> I've been drawing a blank :-(
>

It seems hard to find a practical theory for this.
So how about a kind of sample scan in a separate ANALYZE
command?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: AW: question about index cost estimates

From: Tom Lane
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> So how about a kind of sample scanning in a separate ANALYZE
> command ?

I do think it'd be a good idea to separate out the stats-gathering
into an ANALYZE command that can be invoked separately from VACUUM.
For one thing, there's no good reason to hold down an exclusive
lock on the target table while we are gathering stats.

But that doesn't answer the question: how can we measure the extent
to which a table is in the same order as an index?  And even that
is too one-dimensional a concept to apply to r-tree indexes, for
example.  What do we do for r-trees?
        regards, tom lane


Re: AW: question about index cost estimates

From: Peter Eisentraut
Tom Lane writes:

> > Another metric that would be interesting is a number that represents 
> > the divergence of the heap order from the index sort order.

> Got any ideas about how to get that number in a reasonable amount of
> time (not too much more than VACUUM takes now)?

There are a few fairly simple-minded methods for measuring the sortedness
of a list. More sophisticated methods are probably not much better in
practice and take too long to calculate.

1. The number of items out of place.

2. The number of pairs out of order.

3. The number of adjacent pairs out of order.

4. The length of the list minus the length of the longest increasing (not
necessarily adjacent) subsequence.

For our application I would suggest 3, because it can be calculated in
linear time and it only gives points to the situation we really want,
namely sequential disk access. You could take score/(length-1) to get 1
for a completely unsorted list and 0 for a completely sorted one.
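Metric 3 with that normalization can be sketched in a few lines (the function name is just for illustration):

```python
def disorder_score(seq):
    """Fraction of adjacent pairs that are out of order (metric 3).

    One linear pass over the list.  Returns 0.0 for a fully sorted
    list and 1.0 for a fully reversed one, i.e. score/(length-1)
    where score counts adjacent inversions.
    """
    if len(seq) < 2:
        return 0.0
    out_of_order = sum(1 for a, b in zip(seq, seq[1:]) if a > b)
    return out_of_order / (len(seq) - 1)

print(disorder_score([1, 2, 3, 4]))  # 0.0  (sorted)
print(disorder_score([4, 3, 2, 1]))  # 1.0  (reversed)
```

In the VACUUM/ANALYZE context, `seq` would be the heap positions of tuples visited in index order, so the score could be accumulated during a single sequential pass.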


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden