Re: Cross-column statistics revisited - Mailing list pgsql-hackers

From	Martijn van Oosterhout
Subject	Re: Cross-column statistics revisited
Date	October 17, 2008 03:24:37
Msg-id	20081017062421.GA1443@svana.org
In response to	Re: Cross-column statistics revisited ("Joshua Tolley" <eggyknap@gmail.com>)
Responses	Re: Cross-column statistics revisited
List	pgsql-hackers

On Thu, Oct 16, 2008 at 09:17:03PM -0600, Joshua Tolley wrote:
> Because I'm trying to picture geometrically how this might work for
> the two-column case, and hoping to extend that to more dimensions, and
> am finding that picturing a quantile-based system like the one we have
> now in multiple dimensions is difficult.

Just a note: using multidimensional histograms will work well for
cases like (startdate,enddate), where the histogram will show a
clustering of values along the diagonal. But it will fail for a case
like (zipcode,state), where one implies the other. Histogram-wise you're
not going to see any correlation at all, but what you want to know is:

count(distinct zipcode,state) = count(distinct zipcode)

So you might need to think about storing/searching for different kinds
of correlation.
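
As a rough illustration of the distinct-count check above, here is a
minimal sketch in Python. It is not anything PostgreSQL does; the
function name implies() and the sample rows are made up for the
example.

```python
def implies(rows, a, b):
    """Return True if column a determines column b in these rows.

    This is the check from the text: the number of distinct (a, b)
    pairs equals the number of distinct a values.
    """
    distinct_a = {row[a] for row in rows}
    distinct_ab = {(row[a], row[b]) for row in rows}
    return len(distinct_ab) == len(distinct_a)


# Hypothetical sample data: (zipcode, state)
rows = [
    ("10001", "NY"),
    ("10002", "NY"),
    ("94105", "CA"),
    ("94105", "CA"),
    ("60601", "IL"),
]

zip_implies_state = implies(rows, 0, 1)  # each zipcode maps to one state
state_implies_zip = implies(rows, 1, 0)  # NY maps to two zipcodes
```

Note the check is directional: zipcode determines state here, but
state does not determine zipcode, so a stored statistic would
probably need to record which way the implication runs.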

Secondly, my feeling about multidimensional histograms is that you're
not going to need the matrix to have 100 bins along each axis, but that
it'll be enough to have 1000 bins total. The cases where we get it
wrong enough for people to notice will probably be the same cases where
the histogram will have noticeable variation even for a small number of
bins.
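
To make the bin budget concrete, here is a small sketch in Python of a
32x32 grid (1024 bins, roughly the 1000 total mentioned above) over
synthetic (startdate,enddate)-like data. Everything in it is an
assumption for illustration: the data is random, the names are made
up, and the bins are equal-width for simplicity, whereas the existing
single-column histograms are quantile-based.

```python
import random


def histogram_2d(points, bins_per_axis, lo, hi):
    """Count points into an equal-width bins_per_axis x bins_per_axis grid."""
    width = (hi - lo) / bins_per_axis
    grid = [[0] * bins_per_axis for _ in range(bins_per_axis)]
    for x, y in points:
        # Clamp so a value exactly equal to hi lands in the last bin.
        i = min(int((x - lo) / width), bins_per_axis - 1)
        j = min(int((y - lo) / width), bins_per_axis - 1)
        grid[i][j] += 1
    return grid


# Synthetic data: the end value always follows shortly after the start.
rng = random.Random(0)
points = []
for _ in range(10000):
    start = rng.uniform(0, 900)
    end = start + rng.uniform(0, 30)
    points.append((start, end))

bins_per_axis = 32
grid = histogram_2d(points, bins_per_axis, 0.0, 1000.0)
total_bins = bins_per_axis * bins_per_axis

# Points that fall in a diagonal cell or the cell just past it.
near_diagonal = sum(
    grid[i][j]
    for i in range(bins_per_axis)
    for j in range(bins_per_axis)
    if 0 <= j - i <= 1
)
```

With this data every point lands on or next to the diagonal, so even
a coarse grid makes the dependence obvious, while an estimate that
treated the two columns as independent would spread the rows across
the whole grid.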

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
