Thank you, Michael! I know this is a late response, but I didn't want to clutter the message stream until we had confirmation. I applied this patch to our server and could no longer replicate the bug, so the patch is working for us at present.
On Tue, Nov 19, 2019 at 08:40:56PM +0900, Michael Paquier wrote:
> If you add an ANALYZE on the table natica_hdu_test after restoring, I
> am rather sure that you would reproduce the crash more quickly because
> the handling around the stats of the column are busted here. Anyway,
> taking my example of upthread, I have been also able to reproduce the
> problem on REL_10_STABLE even with assertions enabled: the trick is
> that you need to leave once the session after the analyze on the
> table. Then a SELECT within a new session is enough to crash the
> server.
So... I have looked more at this one, and from my previous example it seems that we have an off-by-one error when looking up the array holding the histograms for ranges (lower and upper bound).
In my previous example, we get to build 101 RangeBounds when beginning to calculate the range operator selectivity in calc_hist_selectivity(). However, when we get to the point of calc_hist_selectivity_contained(), upper_index gets calculated at 100 which is just at the limit of the indexed bounds, and the code would happily look at the last bound as well as the one-after-the-last bound as range_cmp_bounds() sees fit, but the latter just points to the void. The code looks wrong since its introduction in 59d0bf9d and it seems that the changes done for free_attstatsslot() in 9aab83f make the issue more easily reproducible.
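To make the indexing concrete, here is a minimal standalone sketch of the pattern described above. It is not the PostgreSQL code: plain doubles stand in for RangeBound, and bound_bsearch(), highest_index_touched() and clamp_to_last_bin() are hypothetical helpers written only for illustration. It assumes that the bin loop reads both hist[i] and hist[i + 1] starting from the index returned by the search, and the clamp shown is just one possible way to keep that read inside the array, not necessarily the right fix.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the histogram lookup: returns the greatest
 * index i such that hist[i] <= value, or -1 if value is below hist[0].
 */
static int
bound_bsearch(const double *hist, int nhist, double value)
{
	int			lower = -1;
	int			upper = nhist - 1;

	while (lower < upper)
	{
		int			middle = (lower + upper + 1) / 2;

		if (hist[middle] <= value)
			lower = middle;
		else
			upper = middle - 1;
	}
	return lower;
}

/*
 * A loop that treats [hist[i], hist[i + 1]] as one bin touches index
 * i + 1 on its first iteration when it starts at start_index.
 */
static int
highest_index_touched(int start_index)
{
	return start_index + 1;
}

/*
 * One possible guard (an assumption, not the attached POC): never start
 * past the last real bin, whose bounds are hist[nhist - 2] and
 * hist[nhist - 1].
 */
static int
clamp_to_last_bin(int index, int nhist)
{
	assert(nhist >= 2);
	return (index > nhist - 2) ? nhist - 2 : index;
}
```

With 101 bounds, a value at or beyond the last bound makes the search return 100, so the unguarded loop would read index 101, one past the end of the array. After clamping, the start index is 99 and the highest index read is 100.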
A fix like the rough POC attached addresses the issue, but I think it is too naive, as it does not account for the first bin in the ranges evaluated. Tomas, you may be more familiar with this area of the code than I am. What do you think?
--
Michael