Re: BUG #18479: websearch_to_tsquery inconsistent behavior for german when using parentheses - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #18479: websearch_to_tsquery inconsistent behavior for german when using parentheses
Date
Msg-id 2184370.1718323162@sss.pgh.pa.us
In response to Re: BUG #18479: websearch_to_tsquery inconsistent behavior for german when using parentheses  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
[ couldn't let go of this ... ]

I wrote:
> It's fairly confusing that this code manages to ignore not-ISOPERATOR
> punctuation.  It seems like that gets eaten by gettoken_tsvector()
> and then later we decide there's not really a word there.

Yeah, further investigation shows that such cases effectively act
like stopwords: they are passed back to makepol() as VAL strings,
but then lexize processing rejects them as not words.

> I'm also confused how come the same thing doesn't happen in the
> english tsconfig.  Not sure it's worth poking at more, though.

D'oh: "or" is a stopword in the english config.  The english case
is still wrong of course, just differently:

regression=# select websearch_to_tsquery('english', 'foo or (baz bar) or (ding dong)');
          websearch_to_tsquery
-----------------------------------------
 'foo' | 'baz' & 'bar' & 'ding' & 'dong'
(1 row)
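
A quick way to check the stopword explanation, as a sketch rather than part of the original investigation. It assumes the stock english_stem and german_stem dictionaries with their default stopword files; no expected output is given for the german case because it depends on that configuration.

```sql
-- 'or' is on the english stopword list, so the stemmer dictionary
-- recognizes it and returns an empty array rather than a lexeme.
SELECT ts_lexize('english_stem', 'or');

-- Assumption: the default german stopword list has no entry for 'or',
-- in which case this returns a lexeme instead of an empty array.
SELECT ts_lexize('german_stem', 'or');

-- ts_debug shows, token by token, which dictionary handled each word
-- and which lexemes (if any) survived.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english', 'foo or bar');
```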

            regards, tom lane


