Re: Phrase search vs. multi-lexeme tokens - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Phrase search vs. multi-lexeme tokens
Date	January 6, 2021 17:18:32
Msg-id	10026.1609953512@sss.pgh.pa.us
In response to	Phrase search vs. multi-lexeme tokens (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: Phrase search vs. multi-lexeme tokens
List	pgsql-hackers

Alexander Korotkov <aekorotkov@gmail.com> writes:
> # select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class foo"');
>  ?column?
> ----------
>  f

Yeah, surely this is wrong.

> # select to_tsquery('pg_class <-> foo');
>           to_tsquery
> ------------------------------
>  ( 'pg' & 'class' ) <-> 'foo'

> I think if a user writes 'pg_class <-> foo', then it's expected to
> match 'pg_class foo' regardless of which lexemes 'pg_class' is
> split into.

Indeed.  It seems to me that this:

regression=# select to_tsquery('pg_class');
   to_tsquery
----------------
 'pg' & 'class'
(1 row)

is wrong all by itself.  Now that we have phrase search, a much
saner translation would be "'pg' <-> 'class'".  If we fixed that
then it seems like the more complex case would just work.
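
To illustrate why the <-> translation would let the more complex case
work, here is a toy model in Python. It is only a sketch, not
PostgreSQL's executor. It assumes that to_tsvector('pg_class foo') puts
'pg', 'class' and 'foo' at positions 1, 2 and 3, and that an AND nested
inside a phrase operand matches only where both lexemes share a
position. The helper names (lexeme, and_, followed_by) are hypothetical.

```python
# Toy model of positional matching; NOT PostgreSQL's implementation.
# A document is a dict mapping each lexeme to the set of its positions.

def lexeme(doc, word):
    """Return the positions at which `word` occurs in `doc`."""
    return set(doc.get(word, ()))

def and_(left, right):
    """AND inside a phrase operand (assumed rule): keep only the
    positions at which both sides match."""
    return left & right

def followed_by(left, right, distance=1):
    """Phrase operator: keep positions of `right` that lie exactly
    `distance` after some position of `left`."""
    return {p for p in right if p - distance in left}

# Assumed positions for to_tsvector('pg_class foo').
doc = {"pg": {1}, "class": {2}, "foo": {3}}

# ( 'pg' & 'class' ) <-> 'foo'  -- the translation shown above
current = followed_by(
    and_(lexeme(doc, "pg"), lexeme(doc, "class")),
    lexeme(doc, "foo"),
)

# 'pg' <-> 'class' <-> 'foo'  -- the suggested translation
proposed = followed_by(
    followed_by(lexeme(doc, "pg"), lexeme(doc, "class")),
    lexeme(doc, "foo"),
)

print(bool(current), bool(proposed))
```

Under these assumptions the first form can never match, because 'pg'
and 'class' never occupy the same position, while the second form
matches the phrase as written.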

I read your patch over quickly and it seems like a reasonable
approach (but sadly underdocumented).  Can we extend the idea
to fix the to_tsquery case?

            regards, tom lane
