Re: Statistics and selectivity estimation for ranges - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Statistics and selectivity estimation for ranges
Date
Msg-id CAPpHfds=0729kHnS0EGT3ydte1uBOxczCCVys--nfyqaH+DALA@mail.gmail.com
Whole thread Raw
In response to Re: Statistics and selectivity estimation for ranges  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Statistics and selectivity estimation for ranges
List pgsql-hackers
On Mon, Aug 6, 2012 at 6:09 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
On 04.08.2012 12:31, Alexander Korotkov wrote:
Hackers,

attached patch is for collecting statistics and selectivity estimation for
ranges.

In order to make our estimations accurate for every distribution of
ranges, we would collect 2d-distribution of lower and upper bounds of range
into some kind of 2d-histogram. However, this patch use some simplification
and assume distribution of lower bound and distribution of length to be
independent.

Sounds reasonable. Another possibility would be to calculate the average length for each lower-bound bin. So you would e.g know the average length of values with lower bound between 1-10, and the average length of values with lower bound between 10-20, and so forth. Within a bin, you would have to assume that the distribution of the lengths is fixed.

Interesting idea. AFAICS, if we store average length for each lower-bound bin, we still have to assume some kind of distribution of range length in order to do estimates. For example, assume that range length have exponential distribution. Correspondingly, we've following trade off: we don't have to assume lower bound distribution to be independent from length distribution, but we have to assume kind of length distribution. Actually, I don't know what is better.
Ideally, we would have range length histogram for each lower-bound bin, or upper-bound histogram for each lower-bound bin. But, storing such amount of data seems too expensive.

------
With best regards,
Alexander Korotkov.

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Beta 3
Next
From: "Etsuro Fujita"
Date:
Subject: Re: WIP Patch: Use sortedness of CSV foreign tables for query planning