Re: BUG #16865: Regression: GIN Negated prefix search returns results that contain the search term - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #16865: Regression: GIN Negated prefix search returns results that contain the search term
Date
Msg-id 2288864.1613431608@sss.pgh.pa.us
Whole thread Raw
In response to BUG #16865: Regression: GIN Negated prefix search returns results that contain the search term  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #16865: Regression: GIN Negated prefix search returns results that contain the search term  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
=?UTF-8?Q?Dimitri_N=C3=BCscheler?= <dimitri.nuescheler@gmail.com> writes:
> Meanwhile I managed to anonymize the data. I put it in this archive
> https://www.violetsky.ch/postgres-issue-anonymized.tar.gz

Thanks.  I've reproduced the issue and it boils down to doing the
wrong thing when there are GIN_MAYBE values in the input data for
checkcondition_gin, specifically when the bitmap holding the original
GIN index results has become lossy.  We correctly get a TS_MAYBE
result out of the calculation in TS_execute_recurse, but then
TS_execute figures it can throw that detail away and just return
TS_YES.  I think that this problem existed before 2f2007fbb, but
we were masking it by always forcing rechecks.

Anyway, this seems to put the final nail in the coffin of the idea
that it's sufficient for TS_execute to return a boolean.  At least
the tsginidx.c callers really need to see the 3-way result.  Hence
I propose the attached patch.  It fixes the given test case, but
I wonder if you can try it and see if you see any remaining problems.

I wasted quite a bit of time trying to devise a test case that is
short enough to be reasonable to include in our regression tests,
without success ... you need quite a bit of data to make the GIN
bitmap become lossy.  So no test case here, but I'm not super
comfortable with that.

            regards, tom lane

diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
index 6c913baaba..8eed376708 100644
--- a/src/backend/utils/adt/tsginidx.c
+++ b/src/backend/utils/adt/tsginidx.c
@@ -246,10 +246,22 @@ gin_tsquery_consistent(PG_FUNCTION_ARGS)
         gcv.map_item_operand = (int *) (extra_data[0]);
         gcv.need_recheck = recheck;

-        res = TS_execute(GETQUERY(query),
-                         &gcv,
-                         TS_EXEC_PHRASE_NO_POS,
-                         checkcondition_gin);
+        switch (TS_execute_ternary(GETQUERY(query),
+                                   &gcv,
+                                   TS_EXEC_PHRASE_NO_POS,
+                                   checkcondition_gin))
+        {
+            case TS_NO:
+                res = false;
+                break;
+            case TS_YES:
+                res = true;
+                break;
+            case TS_MAYBE:
+                res = true;
+                *recheck = true;
+                break;
+        }
     }

     PG_RETURN_BOOL(res);
@@ -284,11 +296,12 @@ gin_tsquery_triconsistent(PG_FUNCTION_ARGS)
         gcv.map_item_operand = (int *) (extra_data[0]);
         gcv.need_recheck = &recheck;

-        if (TS_execute(GETQUERY(query),
-                       &gcv,
-                       TS_EXEC_PHRASE_NO_POS,
-                       checkcondition_gin))
-            res = recheck ? GIN_MAYBE : GIN_TRUE;
+        res = TS_execute_ternary(GETQUERY(query),
+                                 &gcv,
+                                 TS_EXEC_PHRASE_NO_POS,
+                                 checkcondition_gin);
+        if (res == GIN_TRUE && recheck)
+            res = GIN_MAYBE;
     }

     PG_RETURN_GIN_TERNARY_VALUE(res);
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 2939fb5c21..9236ebcc8f 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -1854,6 +1854,18 @@ TS_execute(QueryItem *curitem, void *arg, uint32 flags,
     return TS_execute_recurse(curitem, arg, flags, chkcond) != TS_NO;
 }

+/*
+ * Evaluate tsquery boolean expression.
+ *
+ * This is the same as TS_execute except that TS_MAYBE is returned as-is.
+ */
+TSTernaryValue
+TS_execute_ternary(QueryItem *curitem, void *arg, uint32 flags,
+                   TSExecuteCallback chkcond)
+{
+    return TS_execute_recurse(curitem, arg, flags, chkcond);
+}
+
 /*
  * TS_execute recursion for operators above any phrase operator.  Here we do
  * not need to worry about lexeme positions.  As soon as we hit an OP_PHRASE
diff --git a/src/include/tsearch/ts_utils.h b/src/include/tsearch/ts_utils.h
index 69a9ba8524..4266560151 100644
--- a/src/include/tsearch/ts_utils.h
+++ b/src/include/tsearch/ts_utils.h
@@ -199,6 +199,9 @@ typedef TSTernaryValue (*TSExecuteCallback) (void *arg, QueryOperand *val,

 extern bool TS_execute(QueryItem *curitem, void *arg, uint32 flags,
                        TSExecuteCallback chkcond);
+extern TSTernaryValue TS_execute_ternary(QueryItem *curitem, void *arg,
+                                         uint32 flags,
+                                         TSExecuteCallback chkcond);
 extern bool tsquery_requires_match(QueryItem *curitem);

 /*

pgsql-bugs by date:

Previous
From: Arthur Nascimento
Date:
Subject: Re: BUG #16867: savepoints vs. commit and chain
Next
From: PG Bug reporting form
Date:
Subject: BUG #16868: Cannot find sqlstat error codes.