On Mon, Jan 06, 2020 at 03:28:34PM -0500, Tom Lane wrote:
> I believe that Michael's draft fix is actually about right, because
> we don't want to treat the key space above the histogram upper bound
> as being an actual bin: we should assume that there are no values there.
> So just starting the loop at the last real bin is sufficient. However,
> I'd prefer to handle the two edge cases (i.e., probe value below histogram
> lower bound or above histogram upper bound) explicitly, because that makes
> it more apparent what's going on.
Actually, I suspect that the original error comes from a copy-pasto
from calc_hist_selectivity_contains to calc_hist_selectivity_contained
when the code has been originally written.
> There are a couple of other ways in which this code seems insufficiently
> paranoid to me:
>
> * It seems worth spending a couple lines of code at the beginning to
> verify that the histogram has at least one bin, as it already did for
> the range-length histogram. Otherwise we've got various divide-by-zero
> hazards here, along with a need for extra tests in the looping functions.
Fine by me.
> * I think we'd better guard against NaNs coming back from the range
> subdiff function, as well as the possibility of getting a NaN from
> the division in get_position (compare [1]).
Good idea.
> * I am not quite totally sure about this part, but it seems to me
> that calc_hist_selectivity_contains probably ought to handle the
> range-bound cases the same as calc_hist_selectivity_contained,
> even though only the latter is trying to access an unsafe array
> index.
Both could actually be merged, no? The assumptions behind the
upper/lower bound lookups are actually just mirrors of each other
based on which side of the operators <@ or @> the code looks at.
Perhaps the current code is cleaner as done currently anyway..
> That all leads me to the attached draft patch. It lacks a test
> case --- I wonder if we can find one that crashes more portably
> than Michael's? And more eyeballs on calc_hist_selectivity_contains
> would be good too.
It would be nice to keep the test case as it crashes immediately on
own Debian laptop. Anyway, if we add a test, we really need
something that pokes at the bin with the histogram with the upper
bound. Just something like the attached on HEAD proves that we have
zero coverage for it:
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -1050,6 +1050,7 @@ calc_hist_selectivity_contained(TypeCacheEntry *typcache,
double dist;
double length_hist_frac;
bool final_bin = false;
+ Assert(i == upper_index);
So that's not good from the start..
- if (bin_width <= 0.0)
+ if (isnan(bin_width) || bin_width <= 0.0)
return 0.5; /* zero width bin */
This comment is incorrect now?
+ if (isnan(position))
+ return 0.5; /* punt if we have any NaNs or Infs */
+
This comment is incorrect as well, no? In this case both histograms
have finite bounds, and you just check if that's a number.
--
Michael