
Re: Statistics and selectivity estimation for ranges - Mailing list pgsql-hackers

From	Alexander Korotkov
Subject	Re: Statistics and selectivity estimation for ranges
Date	September 4, 2012 16:27:29
Msg-id	CAPpHfds=cxF13Zg3WBzbqBy35djGSbKpTve_i73RMQKY_B-08g@mail.gmail.com
In response to	Re: Statistics and selectivity estimation for ranges (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses	Re: Statistics and selectivity estimation for ranges
List	pgsql-hackers


On Mon, Aug 27, 2012 at 5:00 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 24.08.2012 18:51, Heikki Linnakangas wrote:
On 20.08.2012 00:31, Alexander Korotkov wrote:
New version of patch.
* Collect new stakind STATISTIC_KIND_BOUNDS_HISTOGRAM, which is lower and
upper bounds histograms combined into single ranges array, instead
of STATISTIC_KIND_HISTOGRAM.

One worry I have about that format for the histogram is that you
deserialize all the values in the histogram, before you do the binary
searches. That seems expensive if stats target is very high. I guess you
could deserialize them lazily to alleviate that, though.
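
As a rough illustration of the lazy approach suggested above, here is a minimal Python sketch. It assumes the histogram bounds are stored as a packed array of doubles and decodes only the entries that the binary search actually probes. The names `LazyBounds` and `count_less_than` are hypothetical and do not come from the patch, which works on serialized range values in C.

```python
import struct


class LazyBounds:
    """Sorted histogram bounds kept in packed form and decoded on demand."""

    def __init__(self, packed: bytes):
        self._packed = packed
        self._n = len(packed) // 8   # assumption: one 8-byte double per bound
        self._cache = {}
        self.decoded = 0             # how many entries were actually deserialized

    def __len__(self):
        return self._n

    def __getitem__(self, i):
        if i not in self._cache:
            (self._cache[i],) = struct.unpack_from("<d", self._packed, i * 8)
            self.decoded += 1
        return self._cache[i]


def count_less_than(bounds, value):
    """Binary search: number of bounds strictly less than value."""
    lo, hi = 0, len(bounds)
    while lo < hi:
        mid = (lo + hi) // 2
        if bounds[mid] < value:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

With a large statistics target, a lookup touches only about log2(n) entries instead of all n, which is the saving the lazy variant is after.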

* Selectivity estimations for >, >=, <, <=, <<, >>, &<, &> using this
histogram.
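
To make the idea concrete, the following Python sketch shows how two of these operators could be estimated from bounds histograms. It assumes equi-depth histograms of numeric bounds with linear interpolation inside a bin, and it ignores bound inclusivity, infinite bounds, and empty ranges. The function names are hypothetical; this is not the code in the patch.

```python
from bisect import bisect_right


def fraction_below(hist, value):
    """Estimated fraction of values below `value` for an equi-depth histogram.

    `hist` is a sorted list of bin boundaries; each bin holds an equal
    share of the sampled values.
    """
    n_bins = len(hist) - 1
    if value < hist[0]:
        return 0.0
    if value >= hist[-1]:
        return 1.0
    i = bisect_right(hist, value) - 1          # hist[i] <= value < hist[i + 1]
    within = (value - hist[i]) / (hist[i + 1] - hist[i])
    return (i + within) / n_bins


def sel_strictly_left(upper_hist, const_lower):
    """r << c holds when upper(r) < lower(c)."""
    return fraction_below(upper_hist, const_lower)


def sel_strictly_right(lower_hist, const_upper):
    """r >> c holds when lower(r) > upper(c)."""
    return 1.0 - fraction_below(lower_hist, const_upper)
```

For example, with upper-bound histogram boundaries `[0, 10, 20, 30, 40]` and a constant whose lower bound is 15, the sketch places the constant halfway through the second of four bins.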

Thanks!

I'm going to do the same for this that I did for the sp-gist patch, and
punt on the more complicated parts for now, and review them separately.
Attached is a heavily edited version that doesn't include the length
histogram, and consequently doesn't do anything smart for the &< and &>
operators. && is estimated using the bounds histograms. There's now a
separate stakind for the empty range fraction, since it's not included
in the length-histogram.
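
One way to read the && estimate, sketched here in Python under stated assumptions: two non-empty ranges overlap exactly when neither is strictly left of the other, and an empty range never overlaps anything. The inputs are assumed to be selectivities already estimated over non-empty ranges, and `overlap_selectivity` is a hypothetical name, not a function from the patch.

```python
def overlap_selectivity(sel_left, sel_right, empty_frac):
    """Estimate selectivity of `r && c` for a non-empty constant c.

    sel_left   -- estimated fraction of non-empty ranges with r << c
    sel_right  -- estimated fraction of non-empty ranges with r >> c
    empty_frac -- fraction of rows whose range is empty
    """
    # A non-empty range overlaps c unless it is entirely to one side of it.
    among_nonempty = 1.0 - sel_left - sel_right
    among_nonempty = min(1.0, max(0.0, among_nonempty))  # guard against estimate error
    # Empty ranges never satisfy &&, so scale by the non-empty share.
    return (1.0 - empty_frac) * among_nonempty
```

This also shows why the empty-range fraction needs to be stored somewhere: the bounds histograms only describe non-empty ranges.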

I tested this on a dataset containing birth and death dates of people
who have a Wikipedia page, obtained from the dbpedia.org project. I can
send a copy if someone wants it. The estimates seem pretty accurate.

Please take a look, to see if I messed up something.

Committed this with some further changes.

An add-on patch is attached. I don't quite understand your intention in introducing the STATISTIC_KIND_RANGE_EMPTY_FRAC stakind. Did you plan to keep the empty fraction in a separate stakind, or to replace this stakind with STATISTIC_KIND_LENGTH_HISTOGRAM? In the attached patch, STATISTIC_KIND_RANGE_EMPTY_FRAC is replaced with STATISTIC_KIND_LENGTH_HISTOGRAM.

------
With best regards,
Alexander Korotkov.

Attachment

range_stat-addon-0.1.patch.gz
