Home > mailing lists

Re: Stats for multi-column indexes - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Stats for multi-column indexes
Date	March 19, 2007 22:56:09
Msg-id	1174355756.23455.470.camel@dogma.v10.wvs Whole thread Raw
In response to	Re: Stats for multi-column indexes (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Stats for multi-column indexes Re: Stats for multi-column indexes
List	pgsql-hackers

Tree view

On Mon, 2007-03-19 at 21:24 -0400, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > We can already keep stats for a functional index. Is there a reason we
> > can't keep stats for a multi-column index?
> 
> The questions that need to be answered are (1) what stats are you gonna
> collect, and (2) exactly what are you going to do with them when you
> have 'em?
> 
> All the previous discussions have stalled on the question of how to
> avoid trying to collect stats about an exponentially large number of
> column combinations; we've never even reached the question of what
> stats we'd actually want given that a particular combination has been
> determined to be interesting.  Perhaps that's a trivial question,
> but it's been a mighty long time since I took statistics ...
> 

I know we can't keep stats on every combination of columns. My initial
idea would be to only keep stats about a multi-column index (and
probably optional for those, too).

My thinking was that we could keep a histogram (and MCVs, etc.) of the
non-scalar key in the multi-column index. That would provide the data
the planner needs to answer a query like "WHERE a = 1 and b < 1000" if a
and b are dependent and you have an index on (a,b).

It seemed within reach to me initially because I could use a functional
index (in which the function turns multiple values into a comparable
scalar) and postgresql would index that and keep stats. And when it has
those stats, it makes the correct plan. Of course, I have to litter the
SQL with unnecessary function calls (so that it can use the functional
index), which makes this undesirable.

AndrewSN pointed out on IRC that keeping a histogram of non-scalar
values is not as easy as I thought, because PostgreSQL doesn't allow
arrays of composite types, among other problems.

Is this a worthwhile area of exploration?

Regards,Jeff Davis

pgsql-hackers by date:

From: Andrew Dunstan
Date: 19 March 2007, 22:35:52
Subject: Re: buildfarm not reporting xml flag correctly?

From: Mark Kirkwood
Date: 19 March 2007, 23:14:54
Subject: Re: Stats for multi-column indexes

Re: Stats for multi-column indexes - Mailing list pgsql-hackers

Previous

Next