On 04.08.2012 12:31, Alexander Korotkov wrote:
> Hackers,
>
> attached patch is for collecting statistics and selectivity estimation for
> ranges.
>
> In order to make our estimations accurate for every distribution of
> ranges, we would need to collect the 2d-distribution of lower and upper
> bounds of ranges into some kind of 2d-histogram. However, this patch uses
> a simplification and assumes the distribution of the lower bound and the
> distribution of the length to be independent.

Sounds reasonable. Another possibility would be to calculate the average
length for each lower-bound bin. So you would e.g. know the average
length of values with lower bound between 1-10, and the average length
of values with lower bound between 10-20, and so forth. Within a bin,
you would have to assume that the distribution of the lengths is fixed.
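
To make the per-bin idea concrete, here is a rough standalone sketch in plain C. It is not code from the patch: find_bin() and avg_length_per_bin() are made-up names, bounds and lengths are plain doubles rather than range bounds, and the bin boundaries are assumed to be given in ascending order (nbins + 1 of them, nbins >= 1).

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the index of the bin that "lower" falls into.
 * "bounds" holds nbins + 1 ascending boundaries; bin i covers
 * [bounds[i], bounds[i + 1]). Values outside the histogram are clamped
 * to the first or last bin.
 */
static size_t
find_bin(double lower, const double *bounds, size_t nbins)
{
    size_t i;

    for (i = 0; i + 1 < nbins; i++)
    {
        if (lower < bounds[i + 1])
            return i;
    }
    return nbins - 1;
}

/*
 * Hypothetical sketch: compute the average range length for each
 * lower-bound bin. avg_out and count_out must each have room for nbins
 * entries. Bins that receive no values keep an average of 0.0.
 */
static void
avg_length_per_bin(const double *lower, const double *length, size_t nvalues,
                   const double *bounds, size_t nbins,
                   double *avg_out, size_t *count_out)
{
    size_t i;

    for (i = 0; i < nbins; i++)
    {
        avg_out[i] = 0.0;
        count_out[i] = 0;
    }

    /* Accumulate the sum of lengths and the number of values per bin. */
    for (i = 0; i < nvalues; i++)
    {
        size_t b = find_bin(lower[i], bounds, nbins);

        avg_out[b] += length[i];
        count_out[b]++;
    }

    /* Turn sums into averages, skipping empty bins to avoid dividing by zero. */
    for (i = 0; i < nbins; i++)
    {
        if (count_out[i] > 0)
            avg_out[i] /= (double) count_out[i];
    }
}
```

With bins 0-10 and 10-20, values with lower bounds 1 and 5 would contribute their lengths to the first average, and values with lower bounds 12, 15 and 18 to the second.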

PS. get_position() should guard against division by zero when subdiff
returns zero.
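
Something along these lines is what I have in mind. This is a simplified standalone sketch, not the patch's get_position(): position_in_bin() is a made-up name, plain subtraction of doubles stands in for the subtype's subdiff function, and returning 0.5 for a zero-width bin is just one plausible choice.

```c
/*
 * Hypothetical sketch: relative position of "value" within the bin
 * [bin_lo, bin_hi], as a fraction between 0 and 1.
 */
static double
position_in_bin(double value, double bin_lo, double bin_hi)
{
    /* Stand-in for subdiff(bin_hi, bin_lo). */
    double bin_width = bin_hi - bin_lo;
    double position;

    /*
     * Guard: if the bin has zero (or negative) width, dividing would yield
     * inf or NaN. Assume the value sits in the middle of the bin instead.
     */
    if (bin_width <= 0.0)
        return 0.5;

    /* Stand-in for subdiff(value, bin_lo) / bin_width. */
    position = (value - bin_lo) / bin_width;

    /* Clamp to [0, 1] in case the value lies outside the bin. */
    if (position < 0.0)
        position = 0.0;
    if (position > 1.0)
        position = 1.0;

    return position;
}
```

The same check would also cover a subdiff that returns a negative difference for bounds that compare as ordered, which should not happen but is cheap to guard against.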

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com