Re: Statistics and selectivity estimation for ranges - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Statistics and selectivity estimation for ranges
Date
Msg-id 50338C2E.5060908@enterprisedb.com
Whole thread Raw
In response to Re: Statistics and selectivity estimation for ranges  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
On 20.08.2012 00:31, Alexander Korotkov wrote:
> On Thu, Aug 16, 2012 at 4:40 PM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>
>> On 15.08.2012 11:34, Alexander Korotkov wrote:
>>
>>> Ok, we've to decide if we need "standard" histogram. In some cases it can
>>> be used for more accurate estimation of<   and>   operators.
>>> But I think it is not so important. So, we can replace "standard"
>>> histogram
>>> with histograms of lower and upper bounds?
>>
>> Yeah, I think that makes more sense. The lower bound histogram is still
>> useful for<  and>  operators, just not as accurate if there are lots of
>> values with the same lower bound but different upper bound.
>
> New version of patch.
> * Collect new stakind STATISTIC_KIND_BOUNDS_HISTOGRAM, which is lower and
> upper bounds histograms combined into single ranges array, instead
> of STATISTIC_KIND_HISTOGRAM.

Ah, that's an interesting approach. So essentially, the histogram looks 
just like a normal STATISTIC_KIND_HISTOGRAM histogram, but the values 
stored in it are not picked the usual way. The usual way would be to 
pick N evenly-spaced values from the column, and store those. Instead, 
you pick N evenly-spaced lower bounds, and N evenly-spaced upper bounds, 
and construct N range values from those. Looking at a single value in 
the histogram, its lower bound comes from a different row than its upper 
bound.

That's pretty clever - the histogram has a shape and order that's 
compatible with a histogram you'd get with the standard scalar 
typanalyze function. In fact, I think you could just let the standard 
scalar estimators for < and > to use that histogram as is. Perhaps we 
should use STATISTIC_KIND_HISTOGRAM for this after all...

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: multi-master pgbench?
Next
From: Sachin Srivastava
Date:
Subject: PostgreSQL 9.2beta4 (& git HEAD) server crash on creating extension plpython3u