Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows - Mailing list pgsql-bugs
From | Alexander Korotkov |
---|---|
Subject | Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Date | |
Msg-id | CAPpHfdt5seb3UC7M8nGpNURSV=LqqBWaSs3To3YV_pxUXjA8Qg@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
List | pgsql-bugs |
On Thu, Jun 17, 2021 at 10:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: > > Pawel Kudzia <kudzia@gmail.com> writes: > >> with help from IRC we've found that decreasing work_mem from 1MB to 256kB > >> or less makes the problem go away: > > > Hmm. So that suggests that the index itself is *not* corrupt, > > but the problem is associated with a bug in the indexscan > > algorithms. > > After staring at the code I think there is at least one bug, and > possibly two, in shimTriConsistentFn. That's likely to be implicated > here because intarray's GIN opclass only provides a bool consistent > function. I'm not very clear on the circumstances that lead to the scan > code inventing GIN_MAYBE inputs, so I haven't been able to construct a > test case. > > The definite bug is triggered because intarray relies on the API > specification that says that if the search mode is > GIN_SEARCH_MODE_DEFAULT, then the consistentFn will only be called > when there's at least one TRUE key: > > case RTOverlapStrategyNumber: > /* result is not lossy */ > *recheck = false; > /* at least one element in check[] is true, so result = true */ > res = true; > break; > > shimTriConsistentFn ignores this and calls it on all-FALSE inputs, > for which it'll incorrectly get a TRUE result, as it will also for > all the following checks. The upshot is that shimTriConsistentFn > will convert any case with a MAYBE input to a hard TRUE result, > with no recheck requirement. This'd easily explain the reported > misbehavior. So in the spot where we do this: > > /* First call consistent function with all the maybe-inputs set FALSE */ > for (i = 0; i < nmaybe; i++) > key->entryRes[maybeEntries[i]] = GIN_FALSE; > curResult = directBoolConsistentFn(key); > > we need to add some code that checks for default searchMode, and in > that case doesn't call the consistentFn unless at least one > (non-MAYBE) input is TRUE. +1, I think in default search mode we can just start with curResult equal to GIN_FALSE instead of calling directBoolConsistentFn(). > The other thing that I'm unsure about, because the code is horribly > underdocumented, is that it's not very clear whether > shimTriConsistentFn is accurately converting between the bool and > the tristate APIs. That's because it's not very clear what the > tristate API actually *is*. In particular, is the end state of > key->recheckCurItem meaningful in the tristate case? If it's not, > then the short-circuit case for no MAYBE inputs is broken, because > it'll return TRUE when the bool consistentFn is trying to tell us > to recheck. But if it is meaningful, then the multiway case is broken, > because it will return the recheckCurItem result set by the last call to > the bool consistentfn; which might be false even though other calls said > true. (So in that case I think we'd need "key->recheckCurItem = recheck" > at the end.) I also wonder how any of that works correctly for real > triconsistent functions, which don't have access to the recheckCurItem > flag. As far as I recall, returning MAYBE for no MAYBE inputs in the triConsistent function is equivalent to setting the recheck flag in the bool consistent function. Thus, short-circuit case for no MAYBE inputs should be broken. > Anyway, I'm punting this to the code authors. I'd like to see > some comments clarifying what the API really is, not just a > quick-n-dirty code fix. Yep, I'm going to have a closer look at this tomorrow. ------ Regards, Alexander Korotkov
pgsql-bugs by date: