Re: Correlation in cost_index() - Mailing list pgsql-hackers

From Zeugswetter Andreas SB SD
Subject Re: Correlation in cost_index()
Date
Msg-id 46C15C39FEB2C44BA555E356FBCD6FA4961FEC@m0114.s-mxs.net
Whole thread Raw
In response to Correlation in cost_index()  (Manfred Koizar <mkoi-pg@aon.at>)
List pgsql-hackers
> In both cases ANALYZE will calculate correlation 1.0 for column X,
> and something near zero for column Y.  We would like to come out with
> index correlation 1.0 for the left-hand case and something much less
> (but, perhaps, not zero) for the right-hand case.  I don't really see
> a way to do this without actually examining the multi-column ordering
> relationship during ANALYZE.

The only way the second column correlation will be irrelevant is if
the first column is already (nearly) unique (enough so, that the second
column wont scatter fetches enough to fill the buffer before seeing cache hits).
Thus I think when merging correlations you could take nunique into account.

corr = corr_1 * (corr_2 * ( 1 - nunique_1 / nrows))
But, I think one (new) correlation metric for the whole index (whole key) and the
data pages would actually be sufficient. This metric could imho always be used
instead of the per column correlations to calculate index cost. This holds true
as long as you walk an index range, and that is what it is all about, no ?

???
Andreas


pgsql-hackers by date:

Previous
From: The Hermit Hacker
Date:
Subject: Re: Farewell
Next
From: "Dave Page"
Date:
Subject: Re: Farewell