Re: Phrase search vs. multi-lexeme tokens - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Phrase search vs. multi-lexeme tokens
Date
Msg-id CAPpHfdsH+fHfjE5hKqpCc34hTXz2YW5qKe-oVS3LWraVgpR2mA@mail.gmail.com
In response to Re: Phrase search vs. multi-lexeme tokens  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Phrase search vs. multi-lexeme tokens  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
Hi!

On Wed, Jan 6, 2021 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > # select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class foo"');
> >  ?column?
> > ----------
> >  f
>
> Yeah, surely this is wrong.

Thank you for confirming my thoughts.  I also felt it was wrong, but
doubted that such a basic bug could have existed for so long.

> > # select to_tsquery('pg_class <-> foo');
> >           to_tsquery
> > ------------------------------
> >  ( 'pg' & 'class' ) <-> 'foo'
>
> > I think if a user writes 'pg_class <-> foo', then it's expected to
> > match 'pg_class foo' regardless of which lexemes 'pg_class' is
> > split into.
>
> Indeed.  It seems to me that this:
>
> regression=# select to_tsquery('pg_class');
>    to_tsquery
> ----------------
>  'pg' & 'class'
> (1 row)
>
> is wrong all by itself.  Now that we have phrase search, a much
> saner translation would be "'pg' <-> 'class'".  If we fixed that
> then it seems like the more complex case would just work.

Nice idea!  Fixing it this way should be much easier than handling only
the case where a phrase operator appears at an upper level of the query.
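To illustrate why the suggested translation makes the phrase case work,
here is a toy Python model.  It is an assumption for illustration only,
not the PostgreSQL tsvector/tsquery code: a tsvector is treated as a map
from lexeme to positions, the hypothetical split_token() helper stands in
for the parser splitting 'pg_class' into 'pg' and 'class', and only flat
chains of <-> (distance 1) are modelled.  With 'pg_class' translated as
'pg' <-> 'class', the query 'pg_class <-> foo' becomes the chain
'pg' <-> 'class' <-> 'foo', which matches the positions of 'pg_class foo'.

```python
# Toy model (assumption, not PostgreSQL's implementation):
# a "tsvector" maps each lexeme to the set of 1-based positions where it occurs,
# and a phrase query is a flat list of lexemes joined by <-> (distance 1).

def to_toy_tsvector(lexemes):
    """Map each lexeme to the positions where it occurs."""
    vec = {}
    for pos, lexeme in enumerate(lexemes, start=1):
        vec.setdefault(lexeme, set()).add(pos)
    return vec

def phrase_match(vec, phrase):
    """True if the lexemes in `phrase` occur at consecutive positions."""
    if not phrase:
        return False
    for start in vec.get(phrase[0], set()):
        if all(start + i in vec.get(lex, set()) for i, lex in enumerate(phrase)):
            return True
    return False

def split_token(token):
    """Hypothetical splitter standing in for the parser: 'pg_class' -> ['pg', 'class']."""
    return [part for part in token.split("_") if part]

def phrase_query(words):
    """Expand each query word into its lexemes and chain all of them with <->,
    so 'pg_class <-> foo' becomes 'pg' <-> 'class' <-> 'foo'."""
    lexemes = []
    for word in words:
        lexemes.extend(split_token(word))
    return lexemes

doc = to_toy_tsvector(split_token("pg_class") + ["foo"])  # pg:1 class:2 foo:3
query = phrase_query(["pg_class", "foo"])
print(query)                                                  # ['pg', 'class', 'foo']
print(phrase_match(doc, query))                               # True
print(phrase_match(doc, phrase_query(["foo", "pg_class"])))   # False
```

Flattening the nested phrase into one chain is valid in this model
because each <-> only asks that the next lexeme sit exactly one position
after the previous one; real tsquery also has <N> distances, nested
operators and weights, which the sketch ignores.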

> I read your patch over quickly and it seems like a reasonable
> approach (but sadly underdocumented).  Can we extend the idea
> to fix the to_tsquery case?

Sure, I'll provide a revised patch.

------
Regards,
Alexander Korotkov


