Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query
Date
Msg-id 20191210053815.GI72921@paquier.xyz
In response to Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query  (Michael Paquier <michael@paquier.xyz>)
Responses Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query
Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query
List pgsql-bugs
On Tue, Nov 19, 2019 at 08:40:56PM +0900, Michael Paquier wrote:
> If you add an ANALYZE on the table natica_hdu_test after restoring, I
> am rather sure that you would reproduce the crash more quickly because
> the handling around the stats of the column are busted here.  Anyway,
> taking my example of upthread, I have been also able to reproduce the
> problem on REL_10_STABLE even with assertions enabled: the trick is
> that you need to leave once the session after the analyze on the
> table.  Then a SELECT within a new session is enough to crash the
> server.

So...  I have looked more at this one, and from my previous example it
seems that we have an off-by-one error when looking up the array
holding the histograms for ranges (lower and upper bounds).

In my previous example, we build 101 RangeBounds when beginning
to calculate the range operator selectivity in
calc_hist_selectivity().  However, when we get to
calc_hist_selectivity_contained(), upper_index is computed as 100,
which is the last valid index of the bounds array.  The code then
happily looks at the last bound as well as the one past it, as
range_cmp_bounds() sees fit, but the latter lies outside the array.
The code looks wrong since its introduction in 59d0bf9d, and
it seems that the changes done for free_attstatsslot() in 9aab83f make
the issue easier to reproduce.
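
To make the off-by-one concrete, here is a minimal standalone sketch.  It
is not the PostgreSQL code and not the attached POC: DemoBound and the
demo_* functions are hypothetical stand-ins, and the clamp at the end is
only one possible guard, shown for illustration.  With 101 bounds the
lookup can legitimately return index 100, but a bin needs both index and
index + 1 to be valid.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a histogram bound; not PostgreSQL's RangeBound. */
typedef struct
{
	double		val;
} DemoBound;

/*
 * Return the greatest index i such that hist[i].val <= probe, or -1 if the
 * probe is below every bound.  With nhist bounds the result can be as high
 * as nhist - 1, i.e. the last valid index.
 */
static int
demo_bound_index(const DemoBound *hist, int nhist, double probe)
{
	int			lower = -1;
	int			upper = nhist - 1;

	while (lower < upper)
	{
		int			middle = (lower + upper + 1) / 2;

		if (hist[middle].val <= probe)
			lower = middle;
		else
			upper = middle - 1;
	}
	return lower;
}

/* A bin spans hist[index] and hist[index + 1], so both must exist. */
static bool
demo_bin_is_valid(int index, int nhist)
{
	return index >= 0 && index + 1 < nhist;
}

/*
 * Illustrative guard (an assumption, not the proposed fix): never return an
 * index above nhist - 2, so hist[index + 1] stays inside the array.
 */
static int
demo_clamp_to_last_bin(int index, int nhist)
{
	assert(nhist >= 2);
	return (index > nhist - 2) ? nhist - 2 : index;
}
```

With 101 bounds, demo_bound_index() can return 100, for which
demo_bin_is_valid() is false; reading hist[101] at that point is the
out-of-bounds access described above.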

A fix like the rough POC attached addresses the issue, but I think
it is too naive, as it does not account for the first bin in the ranges
evaluated.  Tomas, you may be more familiar with this area of the code
than I am.  What do you think?
--
Michael
