Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows - Mailing list pgsql-bugs

From Alexander Korotkov
Subject Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Date
Msg-id CAPpHfdt5seb3UC7M8nGpNURSV=LqqBWaSs3To3YV_pxUXjA8Qg@mail.gmail.com
Whole thread Raw
In response to Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
List pgsql-bugs
On Thu, Jun 17, 2021 at 10:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Pawel Kudzia <kudzia@gmail.com> writes:
> >> with help from IRC we've found that decreasing work_mem from 1MB to 256kB
> >> or less makes the problem go away:
>
> > Hmm.  So that suggests that the index itself is *not* corrupt,
> > but the problem is associated with a bug in the indexscan
> > algorithms.
>
> After staring at the code I think there is at least one bug, and
> possibly two, in shimTriConsistentFn.  That's likely to be implicated
> here because intarray's GIN opclass only provides a bool consistent
> function.  I'm not very clear on the circumstances that lead to the scan
> code inventing GIN_MAYBE inputs, so I haven't been able to construct a
> test case.
>
> The definite bug is triggered because intarray relies on the API
> specification that says that if the search mode is
> GIN_SEARCH_MODE_DEFAULT, then the consistentFn will only be called
> when there's at least one TRUE key:
>
>         case RTOverlapStrategyNumber:
>             /* result is not lossy */
>             *recheck = false;
>             /* at least one element in check[] is true, so result = true */
>             res = true;
>             break;
>
> shimTriConsistentFn ignores this and calls it on all-FALSE inputs,
> for which it'll incorrectly get a TRUE result, as it will also for
> all the following checks.  The upshot is that shimTriConsistentFn
> will convert any case with a MAYBE input to a hard TRUE result,
> with no recheck requirement.  This'd easily explain the reported
> misbehavior.  So in the spot where we do this:
>
>     /* First call consistent function with all the maybe-inputs set FALSE */
>     for (i = 0; i < nmaybe; i++)
>         key->entryRes[maybeEntries[i]] = GIN_FALSE;
>     curResult = directBoolConsistentFn(key);
>
> we need to add some code that checks for default searchMode, and in
> that case doesn't call the consistentFn unless at least one
> (non-MAYBE) input is TRUE.

 +1,
I think in default search mode we can just start with curResult equal
to GIN_FALSE instead of calling directBoolConsistentFn().

> The other thing that I'm unsure about, because the code is horribly
> underdocumented, is that it's not very clear whether
> shimTriConsistentFn is accurately converting between the bool and
> the tristate APIs.  That's because it's not very clear what the
> tristate API actually *is*.  In particular, is the end state of
> key->recheckCurItem meaningful in the tristate case?  If it's not,
> then the short-circuit case for no MAYBE inputs is broken, because
> it'll return TRUE when the bool consistentFn is trying to tell us
> to recheck.  But if it is meaningful, then the multiway case is broken,
> because it will return the recheckCurItem result set by the last call to
> the bool consistentfn; which might be false even though other calls said
> true.  (So in that case I think we'd need "key->recheckCurItem = recheck"
> at the end.)  I also wonder how any of that works correctly for real
> triconsistent functions, which don't have access to the recheckCurItem
> flag.

As far as I recall, returning MAYBE for no MAYBE inputs in the
triConsistent function is equivalent to setting the recheck flag in
the bool consistent function.  Thus, short-circuit case for no MAYBE
inputs should be broken.

> Anyway, I'm punting this to the code authors.  I'd like to see
> some comments clarifying what the API really is, not just a
> quick-n-dirty code fix.

Yep, I'm going to have a closer look at this tomorrow.

------
Regards,
Alexander Korotkov



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #17062: Assert failed in RemoveRoleFromObjectPolicy() on DROP OWNED policy applied to duplicate role
Next
From: Tom Lane
Date:
Subject: Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows