Home > mailing lists

Re: Correlation in cost_index() - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Correlation in cost_index()
Date	August 8, 2003 22:26:00
Msg-id	1418.1060381541@sss.pgh.pa.us Whole thread Raw
In response to	Re: Correlation in cost_index() (Sean Chittenden <sean@chittenden.org>)
Responses	Re: Correlation in cost_index()
List	pgsql-hackers

Tree view

Sean Chittenden <sean@chittenden.org> writes:
> indexCorrelation is 1.0 for the 1st key in a multi-column index.

... only if it's perfectly correlated.

> As things stand, however, if a multi-column key is
> used, the indexCorrelation is penalized by the size of the number of
> keys found in the multi-column index.  As things stand the qual
> user_id = 42, on a CLUSTER'ed multi-column index (user_id,utc_date)
> has an indexCorrelation of 0.5, when in fact the correlation is 1.0.

Right, in the perfectly-correlated case this calculation is clearly
wrong.  However, what of cases where the first column shows good
correlation with the physical ordering, but the second does not?

The nasty part of this is that the correlation stat that ANALYZE
computed for the second column is of no value to us.  Two examples:
X    Y                X    Y
A    A                A    BA    B                A    CA    C                A    AB    A                B    AB    B
             B    CB    C                B    BC    A                C    CC    B                C    AC    C
    C    B

In both cases ANALYZE will calculate correlation 1.0 for column X,
and something near zero for column Y.  We would like to come out with
index correlation 1.0 for the left-hand case and something much less
(but, perhaps, not zero) for the right-hand case.  I don't really see
a way to do this without actually examining the multi-column ordering
relationship during ANALYZE.

> I tossed a different index on my test table to see how well things
> fare with a low correlation, and this was a bit disturbing:

Seems like most of the error in that estimate has to do with the poor
rowcount estimation.  There's very little percentage in trying to
analyze the effect of index correlation in examples where we don't have
the first-order stats correct ...
        regards, tom lane

pgsql-hackers by date:

From: Sean Chittenden
Date: 08 August 2003, 22:10:18
Subject: Re: Correlation in cost_index()

From: Manfred Koizar
Date: 08 August 2003, 23:18:43
Subject: Re: Correlation in cost_index()

Re: Correlation in cost_index() - Mailing list pgsql-hackers

Previous

Next