Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query |
Date | |
Msg-id | 23459.1578409687@sss.pgh.pa.us Whole thread Raw |
In response to | Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833numrange query (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833numrange query
|
List | pgsql-bugs |
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Jan 06, 2020 at 03:28:34PM -0500, Tom Lane wrote: >> * I am not quite totally sure about this part, but it seems to me >> that calc_hist_selectivity_contains probably ought to handle the >> range-bound cases the same as calc_hist_selectivity_contained, >> even though only the latter is trying to access an unsafe array >> index. > Both could actually be merged, no? The assumptions behind the > upper/lower bound lookups are actually just mirrors of each other > based on which side of the operators <@ or @> the code looks at. Hmm ... maybe, but I'm not entirely convinced that they should be mirror images. The endpoint handling is the same right now, but should it be? Anyway, merging them seems out of scope for a bug fix. >> That all leads me to the attached draft patch. It lacks a test >> case --- I wonder if we can find one that crashes more portably >> than Michael's? And more eyeballs on calc_hist_selectivity_contains >> would be good too. > It would be nice to keep the test case as it crashes immediately on > own Debian laptop. I wouldn't mind having a test case if it crashes reliably, but it doesn't for me. I poked into why not, and realized that even if get_position is passed a pointer to garbage data in hist2, it will not crash unless the "!hist2->infinite" test succeeds, because otherwise it won't call the rng_subdiff function. Depending on compiler and optimization level, that usually means that any nonzero value in that byte will prevent a crash. Given that the whole point here is we're accessing memory that hasn't been initialized by this code, we don't have any way to make it likely that that won't be true. (Of course, we could have calc_hist_selectivity() allocate one extra array slot and zero it, but that's getting a bit silly.) BTW, I noticed that at -O0, my compiler generates a code sequence that effectively implements that test as "hist2->infinite != true", ie it will succeed for any bit pattern that's not exactly 0x01. This greatly raises the odds of seeing the crash of course (so maybe your compiler is doing likewise?). I find that rather disturbing because my mental model has always been that any nonzero bit pattern in a bool would be interpreted as true. Apparently the gcc boys think that they're entitled to consider _Bool values other than exactly 0 and 1 as undefined, which is scary because I'm not sure we are totally consistent about that everywhere. It definitely makes it harder to reason about what will happen with garbage memory contents. (This might also explain why you saw a change in behavior at 9a95a77d9d.) So the bottom line here is that I don't think that we can build a test case that will crash with any large probability, especially not if it's just one part of a test script --- the longer the session has been running, the greater the odds are that whatever is just past the allocated histogram array contains nonzero garbage. I'm not sure it's worth spending test cycles forevermore on a test that most likely will fail to reveal a bug even if it's there. > - if (bin_width <= 0.0) > + if (isnan(bin_width) || bin_width <= 0.0) > return 0.5; /* zero width bin */ > This comment is incorrect now? Yeah, it could be improved. Maybe "punt for NaN or zero-width bin"? > + if (isnan(position)) > + return 0.5; /* punt if we have any NaNs or Infs */ > + > This comment is incorrect as well, no? In this case both histograms > have finite bounds, and you just check if that's a number. The point is that we could get a NaN from the division either from the rng_subdiff function returning NaN, or from an undefined case such as Inf/Inf. Maybe write "punt for NaN input, Inf/Inf, etc"? (We don't need to test for Inf explicitly, because the tests just below can deal with that.) regards, tom lane
pgsql-bugs by date: