Home > mailing lists

Re: Correlation in cost_index() - Mailing list pgsql-hackers

From	Zeugswetter Andreas SB SD
Subject	Re: Correlation in cost_index()
Date	August 11, 2003 10:10:20
Msg-id	46C15C39FEB2C44BA555E356FBCD6FA4961FEC@m0114.s-mxs.net Whole thread Raw
In response to	Correlation in cost_index() (Manfred Koizar <mkoi-pg@aon.at>)
List	pgsql-hackers

Tree view

> In both cases ANALYZE will calculate correlation 1.0 for column X,
> and something near zero for column Y.  We would like to come out with
> index correlation 1.0 for the left-hand case and something much less
> (but, perhaps, not zero) for the right-hand case.  I don't really see
> a way to do this without actually examining the multi-column ordering
> relationship during ANALYZE.

The only way the second column correlation will be irrelevant is if
the first column is already (nearly) unique (enough so, that the second
column wont scatter fetches enough to fill the buffer before seeing cache hits).
Thus I think when merging correlations you could take nunique into account.

corr = corr_1 * (corr_2 * ( 1 - nunique_1 / nrows))
But, I think one (new) correlation metric for the whole index (whole key) and the
data pages would actually be sufficient. This metric could imho always be used
instead of the per column correlations to calculate index cost. This holds true
as long as you walk an index range, and that is what it is all about, no ?

???
Andreas

pgsql-hackers by date:

From: The Hermit Hacker
Date: 11 August 2003, 10:10:06
Subject: Re: Farewell

From: "Dave Page"
Date: 11 August 2003, 10:21:32
Subject: Re: Farewell

Re: Correlation in cost_index() - Mailing list pgsql-hackers

Previous

Next