Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833numrange query - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833numrange query
Date
Msg-id 20200107063907.GF2386@paquier.xyz
Whole thread Raw
In response to Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query
List pgsql-bugs
On Mon, Jan 06, 2020 at 03:28:34PM -0500, Tom Lane wrote:
> I believe that Michael's draft fix is actually about right, because
> we don't want to treat the key space above the histogram upper bound
> as being an actual bin: we should assume that there are no values there.
> So just starting the loop at the last real bin is sufficient.  However,
> I'd prefer to handle the two edge cases (i.e., probe value below histogram
> lower bound or above histogram upper bound) explicitly, because that makes
> it more apparent what's going on.

Actually, I suspect that the original error comes from a copy-pasto
from calc_hist_selectivity_contains to calc_hist_selectivity_contained
when the code has been originally written.

> There are a couple of other ways in which this code seems insufficiently
> paranoid to me:
>
> * It seems worth spending a couple lines of code at the beginning to
> verify that the histogram has at least one bin, as it already did for
> the range-length histogram.  Otherwise we've got various divide-by-zero
> hazards here, along with a need for extra tests in the looping functions.

Fine by me.

> * I think we'd better guard against NaNs coming back from the range
> subdiff function, as well as the possibility of getting a NaN from
> the division in get_position (compare [1]).

Good idea.

> * I am not quite totally sure about this part, but it seems to me
> that calc_hist_selectivity_contains probably ought to handle the
> range-bound cases the same as calc_hist_selectivity_contained,
> even though only the latter is trying to access an unsafe array
> index.

Both could actually be merged, no?  The assumptions behind the
upper/lower bound lookups are actually just mirrors of each other
based on which side of the operators <@ or @> the code looks at.
Perhaps the current code is cleaner as done currently anyway..

> That all leads me to the attached draft patch.  It lacks a test
> case --- I wonder if we can find one that crashes more portably
> than Michael's?  And more eyeballs on calc_hist_selectivity_contains
> would be good too.

It would be nice to keep the test case as it crashes immediately on
own Debian laptop.  Anyway, if we add a test, we really need
something that pokes at the bin with the histogram with the upper
bound.  Just something like the attached on HEAD proves that we have
zero coverage for it:
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -1050,6 +1050,7 @@ calc_hist_selectivity_contained(TypeCacheEntry *typcache,
    double      dist;
    double      length_hist_frac;
    bool        final_bin = false;
+   Assert(i == upper_index);
So that's not good from the start..

-   if (bin_width <= 0.0)
+   if (isnan(bin_width) || bin_width <= 0.0)
     return 0.5;            /* zero width bin */
This comment is incorrect now?

+   if (isnan(position))
+       return 0.5;            /* punt if we have any NaNs or Infs */
+
This comment is incorrect as well, no?  In this case both histograms
have finite bounds, and you just check if that's a number.
--
Michael

Attachment

pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: BUG #16190: The usage of NULL pointer in refint.c
Next
From: Michael Paquier
Date:
Subject: Re: BUG #16190: The usage of NULL pointer in refint.c