Home > mailing lists

Re: new correlation metric - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: new correlation metric
Date	October 26, 2008 13:44:13
Msg-id	11984.1225039446@sss.pgh.pa.us Whole thread Raw
In response to	Re: new correlation metric (Martijn van Oosterhout <kleptog@svana.org>)
Responses	Re: new correlation metric
List	pgsql-hackers

Tree view

Martijn van Oosterhout <kleptog@svana.org> writes:
> I think the code is in the right direction, but I think want you want
> is some kind of estimate of "given I've looked for tuple X, how many
> tuples in the next k pages are near this one". Unfortunatly I don't see
> a way of calculating it other than a full simulation.

I wonder whether we ought to rethink the problem entirely.  In
particular, the notion of associating correlation stats with particular
columns doesn't seem amazingly useful when you get right down to it.
We're wasting our time calculating the correlation of a column that has
no index; and if the column is part of a multicolumn index it's not that
easy to figure out what the index correlation is from the per-column
numbers.  Not to mention functional and partial indexes.

So it occurs to me that maybe we should forget about per-column
correlations altogether, and instead try directly to calculate *per
index* correlations.  You could do this now by doing an index-only scan
and looking at the series of tuple block numbers that come back.
However, there's no obvious way to do any sampling in that approach
--- you can't read random subsets of the index without cooperation from
the index AM.

So the direction I'm headed in is to imagine that we should add an
"analyze" entry point to the index AM API, with the definition that it
will be called once per index per ANALYZE, and is in charge of putting
useful stats into someplace or other.  We might need to invent some
other catalog besides pg_statistic if we want to represent per-index
properties like correlation.  A minimal solution would be to add a
correlation column to pg_class or pg_index, but that doesn't scale well
if you imagine that different index AMs might want different sorts of
stats.
        regards, tom lane

pgsql-hackers by date:

From: Greg Stark
Date: 26 October 2008, 13:03:43
Subject: Re: new correlation metric

From: Jeff Davis
Date: 26 October 2008, 15:32:24
Subject: array_agg and array_accum (patch)

Re: new correlation metric - Mailing list pgsql-hackers

Previous

Next