Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Date
Msg-id 1039976.1623959387@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
List pgsql-bugs
I wrote:
> Pawel Kudzia <kudzia@gmail.com> writes:
>> with help from IRC we've found that decreasing work_mem from 1MB to 256kB
>> or less makes the problem go away:

> Hmm.  So that suggests that the index itself is *not* corrupt,
> but the problem is associated with a bug in the indexscan
> algorithms.

After staring at the code I think there is at least one bug, and
possibly two, in shimTriConsistentFn.  That's likely to be implicated
here because intarray's GIN opclass only provides a bool consistent
function.  I'm not very clear on the circumstances that lead to the scan
code inventing GIN_MAYBE inputs, so I haven't been able to construct a
test case.

The definite bug is triggered because intarray relies on the API
specification that says that if the search mode is
GIN_SEARCH_MODE_DEFAULT, then the consistentFn will only be called
when there's at least one TRUE key:

        case RTOverlapStrategyNumber:
            /* result is not lossy */
            *recheck = false;
            /* at least one element in check[] is true, so result = true */
            res = true;
            break;

shimTriConsistentFn ignores this and calls it on all-FALSE inputs,
for which it'll incorrectly get a TRUE result, as it will also for
all the following checks.  The upshot is that shimTriConsistentFn
will convert any case with a MAYBE input to a hard TRUE result,
with no recheck requirement.  This'd easily explain the reported
misbehavior.  So in the spot where we do this:

    /* First call consistent function with all the maybe-inputs set FALSE */
    for (i = 0; i < nmaybe; i++)
        key->entryRes[maybeEntries[i]] = GIN_FALSE;
    curResult = directBoolConsistentFn(key);

we need to add some code that checks for default searchMode, and in
that case doesn't call the consistentFn unless at least one
(non-MAYBE) input is TRUE.

The other thing that I'm unsure about, because the code is horribly
underdocumented, is that it's not very clear whether
shimTriConsistentFn is accurately converting between the bool and
the tristate APIs.  That's because it's not very clear what the
tristate API actually *is*.  In particular, is the end state of
key->recheckCurItem meaningful in the tristate case?  If it's not,
then the short-circuit case for no MAYBE inputs is broken, because
it'll return TRUE when the bool consistentFn is trying to tell us
to recheck.  But if it is meaningful, then the multiway case is broken,
because it will return the recheckCurItem result set by the last call to
the bool consistentfn; which might be false even though other calls said
true.  (So in that case I think we'd need "key->recheckCurItem = recheck"
at the end.)  I also wonder how any of that works correctly for real
triconsistent functions, which don't have access to the recheckCurItem
flag.

Anyway, I'm punting this to the code authors.  I'd like to see
some comments clarifying what the API really is, not just a
quick-n-dirty code fix.

            regards, tom lane



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #17061: Impossible to query the fields of the tuple created by SEARCH BREADTH FIRST BY .. SET ..
Next
From: PG Bug reporting form
Date:
Subject: BUG #17062: Assert failed in RemoveRoleFromObjectPolicy() on DROP OWNED policy applied to duplicate role