Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
Msg-id CAF4Au4wkjS6D2dG9Z1_VFJ95zojhwpVvkY4JGq6W-BwL3+tJyQ@mail.gmail.com
In response to Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
>> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
>> matching consecutive words but it won't work for us if it cannot handle
>> consecutive *duplicate* words.
>
>> For example, the following returns true:    select
>> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
>
>> Is this expected ?
>
> I concur that that seems like a rather useless behavior.  If we have
> "x <-> y" it is not possible to match at distance zero, while if we
> have "x <-> x" it seems unlikely that the user is expecting us to
> treat that identically to "x".  So phrase search simply should not
> consider distance-zero matches.

What about a word with several infinitives?

select to_tsvector('en', 'leavings');
      to_tsvector
------------------------
 'leave':1 'leavings':1
(1 row)

select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
 ?column?
----------
 t
(1 row)
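
A minimal sketch of the same check that does not depend on the 'en' configuration (assumed here to be a locally defined one). The tsvector is written as a literal with both lexemes at position 1, and the query is run once with <0> and once with <->, which is shorthand for <1>. The column aliases are only illustrative.

```sql
-- Both lexemes share position 1, as in the to_tsvector('en', 'leavings') output above.
-- <0> asks for the two lexemes at the same position.
SELECT 'leave:1 leavings:1'::tsvector @@ 'leave <0> leavings'::tsquery AS same_position;

-- <-> (equivalent to <1>) asks for the second lexeme one position after the first.
SELECT 'leave:1 leavings:1'::tsvector @@ 'leave <-> leavings'::tsquery AS adjacent;
```

If phrase search stops considering distance-zero matches for <->, the explicit <0> form is the only way left to require two lexemes stored at the same position.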


>
> The attached one-liner patch seems to fix this problem, though I am
> uncertain whether any other places need to be changed to match.
> Also, there is a regression test case that changes:
>
> *** /home/postgres/pgsql/src/test/regress/expected/tstypes.out  Thu May  5 19:21:17 2016
> --- /home/postgres/pgsql/src/test/regress/results/tstypes.out   Tue Jun  7 17:55:41 2016
> ***************
> *** 897,903 ****
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
>    ts_rank_cd
>   ------------
> !   0.0714286
>   (1 row)
>
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
> --- 897,903 ----
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
>    ts_rank_cd
>   ------------
> !           0
>   (1 row)
>
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
>
>
> I'm not sure if this case is intentionally exhibiting the behavior that
> both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
> result simply wasn't thought about carefully.
>
>                         regards, tom lane
>


