Home > mailing lists

Re: Does "correlation" mislead the optimizer on large tables? - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: Does "correlation" mislead the optimizer on large tables?
Date	January 24, 2003 02:35:52
Msg-id	11025.1043390899@sss.pgh.pa.us Whole thread Raw
In response to	Does "correlation" mislead the optimizer on large tables? (Ron Mayer <ron@intervideo.com>)
Responses	Re: Does "correlation" mislead the optimizer on large Re: Does "correlation" mislead the optimizer on large
List	pgsql-performance

Tree view

Ron Mayer <ron@intervideo.com> writes:
>   On a large tables, I think the "correlation" pg_stats field as calculated
>   by "vacuum analyze" or "analyze" can mislead the optimizer.

If you look in the pghackers archives, you will find some discussion
about changing the equation that cost_index() uses to estimate the
impact of correlation on indexscan cost.  The existing equation is
ad-hoc and surely wrong, but so far no one's proposed a replacement
that can be justified any better.  If you've got such a replacement
then we're all ears...

>   In particular, if I have a large table t with columns 'a','b','c', etc,
>   and I cluster the table as follows:
>     create table t_ordered as select * from t order by a,b;
>     vacuum analyze t_ordered;
>   Column "b" will (correctly) get a very low "correlation" in
>   the pg_stats table -- but I think the optimizer would do better
>   assuming a high correlation because similar 'b' values are still
>   grouped closely on the same disk pages.

How would that be?  They'll be separated by the stride of 'a'.

It seems likely to me that a one-dimensional correlation statistic may
be inadequate, but I haven't seen any proposals for better stats.

            regards, tom lane

pgsql-performance by date:

From: Tom Lane
Date: 24 January 2003, 02:08:59
Subject: Re: Crash Recovery

From: Andrew Sullivan
Date: 24 January 2003, 08:22:20
Subject: Re: Crash Recovery

Re: Does "correlation" mislead the optimizer on large tables? - Mailing list pgsql-performance

Previous

Next