Re: Better estimates of index correlation - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Better estimates of index correlation
Date
Msg-id AANLkTikXkDNtwd4vrHMV0zemUP-H+dPH=+OQQWUARSdP@mail.gmail.com
Whole thread Raw
In response to Re: Better estimates of index correlation  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Better estimates of index correlation
List pgsql-hackers
On Tue, Mar 15, 2011 at 12:27 AM, Josh Berkus <josh@agliodbs.com> wrote:
> Note that if this requires VACUUM rather than ANALYZE, it introduces a
> problem for data warehousing users, who can go years between vacuums of
> their largest tables.

I don't understand, are they going years between vacuums because their
data is static? In which case the index correlation won't change. Or
is it append-only, in which case I suspect the newly appended data is
likely to have the same correlation as the old data. But is there
anything stopping us from doing some sort of ANALYZE-style sample of
the index pages as well?

I think the bigger problems here are that a) correlation isn't
actually a useful statistic for estimating random seeks and b) I'm not
sure what counting back-transitions has to do with correlation.

If we're lucky it may be that counting back-transitions is a more
useful stat than correlation anyways. It does seem to have more to do
with random seeks than correlation. It might need some refinement
though or some other metrics to go along with it to get a real basis
for an estimate of random seeks though. I'm wondering about how far
the back-transitions and forward transitions actually go. If you're
skipping every other block that's twice as much i/o as reading every
block, but if you skip n blocks where n >
random_page_cost/seq_page_cost then that's one random read per block.
If you skip 1 block backwards to a block you've already read then
that's free, but if you skip backwards to a block that isn't recently
referenced that's a random seek.

There are also niggling questions about taking statistics based on
tuples that are already dead or were never committed.

-- 
greg


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: dependency between numbers keywords and parser speed
Next
From: Josh Berkus
Date:
Subject: Re: Better estimates of index correlation