Re: Phrase search vs. multi-lexeme tokens - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Phrase search vs. multi-lexeme tokens
Date
Msg-id 10026.1609953512@sss.pgh.pa.us
In response to Phrase search vs. multi-lexeme tokens  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: Phrase search vs. multi-lexeme tokens  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
Alexander Korotkov <aekorotkov@gmail.com> writes:
> # select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class foo"');
>  ?column?
> ----------
>  f

Yeah, surely this is wrong.

> # select to_tsquery('pg_class <-> foo');
>           to_tsquery
> ------------------------------
>  ( 'pg' & 'class' ) <-> 'foo'

> I think if a user writes 'pg_class <-> foo', then it's expected to
> match 'pg_class foo' regardless of which lexemes 'pg_class' is
> split into.

Indeed.  It seems to me that this:

regression=# select to_tsquery('pg_class');
   to_tsquery
----------------
 'pg' & 'class'
(1 row)

is wrong all by itself.  Now that we have phrase search, a much
saner translation would be "'pg' <-> 'class'".  If we fixed that,
then it seems like the more complex 'pg_class <-> foo' case would
just work.
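The proposed translation can already be approximated by hand.  A minimal
sketch, assuming the default text search configuration splits pg_class
into 'pg' and 'class' as in the to_tsquery('pg_class') output above:
phraseto_tsquery() joins adjacent lexemes with <-> rather than &, and the
second query writes the whole phrase out explicitly.

```sql
-- Assumes default_text_search_config splits 'pg_class' into 'pg' and
-- 'class', matching the to_tsquery('pg_class') output quoted above.

-- phraseto_tsquery() connects adjacent lexemes with <-> instead of &:
select phraseto_tsquery('pg_class');

-- The "'pg' <-> 'class'" translation spelled out by hand; every operand
-- is a single lexeme, so no & is introduced inside the phrase:
select to_tsvector('pg_class foo') @@ to_tsquery('pg <-> class <-> foo');
```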

I read your patch over quickly and it seems like a reasonable
approach (but sadly underdocumented).  Can we extend the idea
to fix the to_tsquery case?

            regards, tom lane


