Re: Statistics and selectivity estimation for ranges - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Statistics and selectivity estimation for ranges
Date	August 24, 2012 15:51:32
Msg-id	5037A2F8.4080808@enterprisedb.com
In response to	Re: Statistics and selectivity estimation for ranges (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: Statistics and selectivity estimation for ranges
List	pgsql-hackers

On 20.08.2012 00:31, Alexander Korotkov wrote:
> On Thu, Aug 16, 2012 at 4:40 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>> On 15.08.2012 11:34, Alexander Korotkov wrote:
>>
>>> OK, we have to decide whether we need the "standard" histogram. In some
>>> cases it can be used for more accurate estimation of the < and >
>>> operators. But I think that is not so important. So, can we replace the
>>> "standard" histogram with histograms of lower and upper bounds?
>>>
>>
>> Yeah, I think that makes more sense. The lower bound histogram is still
>> useful for the < and > operators, just not as accurate if there are lots
>> of values with the same lower bound but different upper bounds.
>
>
> New version of patch.
> * Collect new stakind STATISTIC_KIND_BOUNDS_HISTOGRAM, which is lower and
> upper bounds histograms combined into single ranges array, instead
> of STATISTIC_KIND_HISTOGRAM.

One worry I have about that format for the histogram is that you
deserialize all the values in the histogram before you do the binary
searches. That seems expensive if the statistics target is very high. I
guess you could deserialize them lazily to alleviate that, though.
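
To illustrate what lazy deserialization could look like, here is a minimal
sketch. It is not code from the patch. It assumes a made-up serialized format
in which every bound is a fixed 8-byte integer, so entry i can be located
without decoding the entries before it; real range bounds are variable-length,
so an actual implementation would need offsets or a similar index. The names
LazyHist, lazy_get, lazy_upper_bound and BOUND_SIZE are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: each serialized bound is one 8-byte integer in host byte order. */
#define BOUND_SIZE 8

typedef struct LazyHist
{
    const unsigned char *raw;   /* serialized bounds, nbounds * BOUND_SIZE bytes */
    size_t      nbounds;        /* number of bounds, sorted ascending */
    int64_t    *decoded;        /* cache; entry i is valid only if is_decoded[i] */
    bool       *is_decoded;     /* must start out all false */
    size_t      ndecoded;       /* how many entries have been decoded so far */
} LazyHist;

/* Decode entry i on first access and return the cached value afterwards. */
static int64_t
lazy_get(LazyHist *h, size_t i)
{
    assert(i < h->nbounds);
    if (!h->is_decoded[i])
    {
        memcpy(&h->decoded[i], h->raw + i * BOUND_SIZE, BOUND_SIZE);
        h->is_decoded[i] = true;
        h->ndecoded++;
    }
    return h->decoded[i];
}

/*
 * Return the number of bounds that are <= value. Only the entries the
 * binary search actually probes are decoded, i.e. O(log n) of them.
 */
static size_t
lazy_upper_bound(LazyHist *h, int64_t value)
{
    size_t      lo = 0;
    size_t      hi = h->nbounds;

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (lazy_get(h, mid) <= value)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

With 1000 entries, a single search decodes at most 10 of them, and the cache
means that repeated searches over the same histogram never decode an entry
twice.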

> * Selectivity estimations for >, >=, <, <=, <<, >>, &<, &> using this
> histogram.

Thanks!

I'm going to do the same for this that I did for the sp-gist patch, and
punt on the more complicated parts for now, and review them separately.
Attached is a heavily edited version that doesn't include the length
histogram, and consequently doesn't do anything smart for the &< and &>
operators. && is estimated using the bounds histograms. There's now a
separate stakind for the empty range fraction, since it's not included
in the length-histogram.
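
For readers who want to see how && can be estimated from the two bounds
histograms alone, here is a rough sketch. It is not taken from the patch. It
assumes that a non-empty range overlaps the query range unless it lies
strictly to the left (its upper bound is below the query's lower bound) or
strictly to the right (its lower bound is above the query's upper bound), and
that empty ranges never overlap anything, so the result is scaled by the
non-empty fraction. Bounds are treated as plain doubles and bound inclusivity
is ignored. The names hist_fraction_below and overlap_selectivity are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Estimate the fraction of values below v from an equi-depth histogram
 * hist[0..n-1] (sorted ascending, n >= 2), interpolating linearly
 * inside the bin that contains v.
 */
static double
hist_fraction_below(const double *hist, size_t n, double v)
{
    size_t      lo = 0;
    size_t      hi = n - 1;

    assert(n >= 2);
    if (v <= hist[0])
        return 0.0;
    if (v >= hist[n - 1])
        return 1.0;

    /* Binary search keeping the invariant hist[lo] <= v < hist[hi]. */
    while (hi - lo > 1)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (hist[mid] <= v)
            lo = mid;
        else
            hi = mid;
    }

    /* The invariant guarantees hist[hi] > hist[lo], so no division by zero. */
    return ((double) lo + (v - hist[lo]) / (hist[hi] - hist[lo]))
        / (double) (n - 1);
}

/*
 * Estimate the selectivity of "col && [q_lower, q_upper]" as
 * 1 - P(strictly left) - P(strictly right), scaled by the fraction
 * of non-empty ranges.
 */
static double
overlap_selectivity(const double *lower_hist, const double *upper_hist,
                    size_t n, double empty_frac,
                    double q_lower, double q_upper)
{
    /* strictly left: the range's upper bound is below q_lower */
    double      left = hist_fraction_below(upper_hist, n, q_lower);
    /* strictly right: the range's lower bound is above q_upper */
    double      right = 1.0 - hist_fraction_below(lower_hist, n, q_upper);
    double      sel = 1.0 - left - right;

    if (sel < 0.0)
        sel = 0.0;
    return (1.0 - empty_frac) * sel;
}
```

Keeping the empty-range fraction as a separate input mirrors the point above
that it is stored separately from the histograms.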

I tested this on a dataset containing the birth and death dates of people
who have a Wikipedia page, obtained from the dbpedia.org project. I can
send a copy if someone wants it. The estimates seem pretty accurate.

Please take a look, to see if I messed up something.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment: range_stat-0.7-trimmed-and-edited.patch
