patternsel() and histogram_selectivity() and the hard cutoff of 100 - Mailing list pgsql-hackers

From Gregory Stark
Subject patternsel() and histogram_selectivity() and the hard cutoff of 100
Date
Msg-id 878x1zmqe4.fsf@oxford.xeocode.com
Whole thread Raw
Responses Re: patternsel() and histogram_selectivity() and the hard cutoff of 100  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
So I had a thought about how to soften the controversial hard cutoff of 100
for the use of the histogram selectivity. Instead of switching 100% one way or
the other between the two heuristics why not calculate both and combine them.
The larger the sample size from the histogram the more we can weight the
histogram calculation. The smaller the histogram size the more we weight the
heuristic.

My first thought was to scale it linearly so we use 10% of the histogram
sample + 90% of the heuristic for default statistic sizes of 10 samples. That
degenerates to the status quo for 100 samples and up.

But actually I wonder if we can't get some solid statistics behind the
percentages. The lower the sample size the larger the 95th percentile
confidence interval and we want to use heuristic to adjust the result within
the confidence interval. I think there are even ways of calculating
pessimistic confidence intervals for unrepresentative samples.

This would allow the results to degrade smoothly. So a sample size of 50 would
be expected to give results somewhere between those of 10 and 100. Instead of
the current behaviour where the results will be exactly the same until you hit
100 and then suddenly jump to a different value.

I would do it by just making histogram_selectivity() never fail unless the
histogram is less than 2 * the ignored values. There would be an additional
parameter with a double* where the function would store the percentage weight
to give the result. The caller would be responsible for combining the result
just as it is with the MCV estimates.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: path with spaces in config.pl
Next
From: Simon Riggs
Date:
Subject: Re: Why are we waiting?