PG Bug reporting form <noreply@postgresql.org> writes:
> Although the docs
> https://www.postgresql.org/docs/current/textsearch-controls.html say nothing
> about websearch_to_tsquery supporting parentheses in queries, I noticed some
> inconsistent behaviour when using multiple 'or' keywords with parentheses in
> postgres 15.4
The definition of websearch_to_tsquery says pretty plainly that
"Other punctuation is ignored". So I'd expect parens to do nothing.
That makes this problematic:
> select websearch_to_tsquery('german', 'foo or baz bar or (ding dong)');
> websearch_to_tsquery
> -----------------------------------------
> 'foo' | 'baz' & 'bar' | 'ding' & 'dong'
> select websearch_to_tsquery('german', 'foo or (baz bar) or (ding dong)');
> websearch_to_tsquery
> ------------------------------------------------
> 'foo' | 'baz' & 'bar' & 'or' & 'ding' & 'dong'
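I found what seems to be the issue in gettoken_query_websearch: it
ignores ISOPERATOR chars (including parens) in WAITOPERAND state,
but not in WAITOPERATOR state. There, a paren is treated like any
other non-space character, i.e. as the start of a new operand joined
by an implicit AND. That switches us back to WAITOPERAND state, which
skips the paren and then consumes the "or" as a regular word.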
So a minimal fix could look like the attached.
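To make the state interaction concrete, here is a toy model of the two
states. It is only a sketch, not the PostgreSQL code: the names
toy_websearch, toy_is_operator_char and toy_or_keyword are made up, the
operator character set "!&|()" is my assumption about what ISOPERATOR
covers, and quoting, negation, stop words and the dictionary lookup are
all left out. The flag selects whether operator characters are also
skipped in the wait-for-operator state, which is the behavior the fix
is meant to add.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-ins for the two parser states discussed above. */
typedef enum { WAIT_OPERAND, WAIT_OPERATOR } ToyState;

/* Assumed operator character set; hypothetical stand-in for ISOPERATOR. */
static bool
toy_is_operator_char(char c)
{
    return c != '\0' && strchr("!&|()", c) != NULL;
}

/* Lowercase "or" followed by whitespace counts as the OR keyword. */
static bool
toy_or_keyword(const char *p)
{
    return p[0] == 'o' && p[1] == 'r' && isspace((unsigned char) p[2]);
}

/* Writes a flat rendering such as "foo | baz & bar" into out. */
static void
toy_websearch(const char *in, bool skip_ops_in_wait_operator,
              char *out, size_t outlen)
{
    ToyState    state = WAIT_OPERAND;
    const char *op = NULL;      /* operator pending before next word */
    size_t      used = 0;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';

    while (*in != '\0')
    {
        if (state == WAIT_OPERAND)
        {
            size_t      len = 0;
            int         n;

            /* whitespace and operator chars are ignored in this state */
            if (isspace((unsigned char) *in) || toy_is_operator_char(*in))
            {
                in++;
                continue;
            }
            /* collect one word, up to whitespace or an operator char */
            while (in[len] != '\0' && !isspace((unsigned char) in[len]) &&
                   !toy_is_operator_char(in[len]))
                len++;
            n = snprintf(out + used, outlen - used, "%s%.*s",
                         op ? op : "", (int) len, in);
            if (n < 0 || (size_t) n >= outlen - used)
                return;         /* output truncated, give up */
            used += (size_t) n;
            in += len;
            op = NULL;
            state = WAIT_OPERATOR;
        }
        else if (toy_or_keyword(in))
        {
            in += 2;
            op = " | ";
            state = WAIT_OPERAND;
        }
        else if (skip_ops_in_wait_operator && toy_is_operator_char(*in))
            in++;               /* the fix: ignore operator chars here too */
        else if (!isspace((unsigned char) *in))
        {
            /* anything else starts a new operand: implicit AND */
            op = " & ";
            state = WAIT_OPERAND;
        }
        else
            in++;               /* skip whitespace between tokens */
    }
}
```

Calling toy_websearch("foo or (baz bar) or (ding dong)", false, buf,
sizeof buf) yields "foo | baz & bar & or & ding & dong", mirroring the
second query above; passing true instead yields
"foo | baz & bar | ding & dong".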
It's fairly confusing that this code manages to ignore not-ISOPERATOR
punctuation. It seems like that gets eaten by gettoken_tsvector()
and then later we decide there's not really a word there.
I'm also not sure why the same thing doesn't happen with the
english tsconfig. Not sure it's worth poking at more, though.
regards, tom lane
diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c
index 690a80d774..eb08e912ea 100644
--- a/src/backend/utils/adt/tsquery.c
+++ b/src/backend/utils/adt/tsquery.c
@@ -492,6 +492,12 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
 					*operator = OP_OR;
 					return PT_OPR;
 				}
+				else if (ISOPERATOR(state->buf))
+				{
+					/* ignore other operators here too */
+					state->buf++;
+					continue;
+				}
 				else if (*state->buf == '\0')
 				{
 					return PT_END;