Re: Stats for multi-column indexes - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Stats for multi-column indexes
Date
Msg-id 1174355756.23455.470.camel@dogma.v10.wvs
Whole thread Raw
In response to Re: Stats for multi-column indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Stats for multi-column indexes  (Mark Kirkwood <markir@paradise.net.nz>)
Re: Stats for multi-column indexes  ("Jim C. Nasby" <jim@nasby.net>)
List pgsql-hackers
On Mon, 2007-03-19 at 21:24 -0400, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > We can already keep stats for a functional index. Is there a reason we
> > can't keep stats for a multi-column index?
> 
> The questions that need to be answered are (1) what stats are you gonna
> collect, and (2) exactly what are you going to do with them when you
> have 'em?
> 
> All the previous discussions have stalled on the question of how to
> avoid trying to collect stats about an exponentially large number of
> column combinations; we've never even reached the question of what
> stats we'd actually want given that a particular combination has been
> determined to be interesting.  Perhaps that's a trivial question,
> but it's been a mighty long time since I took statistics ...
> 

I know we can't keep stats on every combination of columns. My initial
idea would be to only keep stats about a multi-column index (and
probably optional for those, too).

My thinking was that we could keep a histogram (and MCVs, etc.) of the
non-scalar key in the multi-column index. That would provide the data
the planner needs to answer a query like "WHERE a = 1 and b < 1000" if a
and b are dependent and you have an index on (a,b).

It seemed within reach to me initially because I could use a functional
index (in which the function turns multiple values into a comparable
scalar) and postgresql would index that and keep stats. And when it has
those stats, it makes the correct plan. Of course, I have to litter the
SQL with unnecessary function calls (so that it can use the functional
index), which makes this undesirable.

AndrewSN pointed out on IRC that keeping a histogram of non-scalar
values is not as easy as I thought, because PostgreSQL doesn't allow
arrays of composite types, among other problems.

Is this a worthwhile area of exploration?

Regards,Jeff Davis



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: buildfarm not reporting xml flag correctly?
Next
From: Mark Kirkwood
Date:
Subject: Re: Stats for multi-column indexes