Re: websearch_to_tsquery() returns queries that don't match to_tsvector() - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: websearch_to_tsquery() returns queries that don't match to_tsvector()
Date
Msg-id CAPpHfdsKy5TzOTq5aV8tn+KQEd_C5mF0Sd_BrZ0e3+wGY5tLFw@mail.gmail.com
List pgsql-hackers
Hi!

On Mon, Apr 19, 2021 at 9:57 AM Valentin Gatien-Baron
<valentin.gatienbaron@gmail.com> wrote:
> Looking at the tsvector and tsquery, we can see that the problem is
> that the ":" counts as one position for the ts_query but not the
> ts_vector:
>
> select to_tsvector('english', 'aaa: bbb'), websearch_to_tsquery('english', '"aaa: bbb"');
>    to_tsvector   | websearch_to_tsquery
> -----------------+----------------------
>  'aaa':1 'bbb':2 | 'aaa' <2> 'bbb'
> (1 row)

This looks like another bug with phrase search and query parsing.
It seems to me that, since 0c4f355c6a, websearch_to_tsquery() should
just parse text in quotes as a single token.  Besides fixing this bug,
that simplifies the code.
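
To make the mismatch concrete, here is a minimal sketch based on the
output quoted above.  The tsquery strings are written by hand for
illustration and are not taken from the patch; the comments describe
what I would expect given the positions shown in the report.

```sql
-- to_tsvector() gives 'aaa':1 'bbb':2, i.e. the ':' consumes no position,
-- so the two lexemes are adjacent (distance 1).

-- Distance 2, which is what the quoted phrase currently parses to:
-- the lexemes are only 1 apart, so this is not expected to match.
SELECT to_tsvector('english', 'aaa: bbb')
       @@ to_tsquery('english', 'aaa <2> bbb') AS matches_distance_2;

-- Distance 1 (plain "followed by"): this is the query shape that
-- agrees with the positions in the tsvector.
SELECT to_tsvector('english', 'aaa: bbb')
       @@ to_tsquery('english', 'aaa <-> bbb') AS matches_adjacent;
```

If the text in quotes is parsed as a single token, the phrase distance
comes from the same position counting that to_tsvector() uses, so the
two sides should agree.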

Trying to fix this bug before 0c4f355c6a doesn't seem worth the effort.

I propose to push the attached patch to v14.  Objections?

------
Regards,
Alexander Korotkov
