Re: 9.6 phrase search distance specification - Mailing list pgsql-hackers

From Ryan Pedela
Subject Re: 9.6 phrase search distance specification
Date
Msg-id CACu89FR-6HW+77v6kSAwhjkjDDiafDDw_h7JPFOU6sztcRLY3g@mail.gmail.com
In response to Re: 9.6 phrase search distance specification  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: 9.6 phrase search distance specification  (Ryan Pedela <rpedela@datalanche.com>)
Re: 9.6 phrase search distance specification  (Oleg Bartunov <obartunov@gmail.com>)
List pgsql-hackers


Thanks,

Ryan Pedela
Datalanche CEO, founder
www.datalanche.com

On Tue, Aug 9, 2016 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Bruce Momjian <bruce@momjian.us> writes:
> Does anyone know why the phrase distance "<3>" was changed from "at most
> three tokens away" to "exactly three tokens away"?

So that it would correctly support phraseto_tsquery's use of the operator
to represent omitted words (stopwords) in a phrase.

I think there's probably some use in also providing an operator that does
"at most this many tokens away", but Oleg/Teodor were evidently less
excited, because they didn't take the time to do it.

The thread where this change was discussed is

https://www.postgresql.org/message-id/flat/c19fcfec308e6ccd952cdde9e648b505%40mail.gmail.com

see particularly

https://www.postgresql.org/message-id/11252.1465422251%40sss.pgh.pa.us

I would say that it is worth having a "phrase slop" operator (Apache Lucene terminology). Proximity search is extremely useful for improving relevance, and phrase slop is one of the tools for achieving that.
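
To make the difference between the two semantics concrete, here is a small Python sketch. It is only a toy model and not PostgreSQL's implementation: the helper names (index_tokens, exact_distance_match, slop_match) are made up for illustration, positions are 1-based token offsets, and the "slop" variant is the simple in-order "at most N tokens away" reading discussed above, which is narrower than Lucene's edit-distance notion of slop.

```python
from typing import Dict, Iterable, List, Optional, Set

# Toy stopword list, assumed for illustration only.
STOPWORDS: Set[str] = {"the", "in", "a", "of"}


def index_tokens(tokens: Iterable[str],
                 stopwords: Optional[Set[str]] = None) -> Dict[str, List[int]]:
    """Map each non-stopword token to its 1-based positions.

    Stopwords are dropped, but they still consume a position, so the
    gaps they leave behind remain visible in the index.
    """
    stop = STOPWORDS if stopwords is None else stopwords
    positions: Dict[str, List[int]] = {}
    for pos, tok in enumerate(tokens, start=1):
        if tok not in stop:
            positions.setdefault(tok, []).append(pos)
    return positions


def exact_distance_match(positions: Dict[str, List[int]],
                         a: str, b: str, n: int) -> bool:
    """True if some b occurs exactly n positions after some a."""
    b_pos = set(positions.get(b, []))
    return any(p + n in b_pos for p in positions.get(a, []))


def slop_match(positions: Dict[str, List[int]],
               a: str, b: str, n: int) -> bool:
    """True if some b occurs between 1 and n positions after some a."""
    b_pos = set(positions.get(b, []))
    return any(p + d in b_pos
               for p in positions.get(a, [])
               for d in range(1, n + 1))


# "cat" is at position 2 and "hat" at position 5, so the gap is exactly 3.
doc_phrase = index_tokens("the cat in the hat".split())
# Here "cat" and "hat" are adjacent (positions 1 and 2).
doc_adjacent = index_tokens("cat hat".split())
```

With exact distance, a query for "cat" followed by "hat" at distance 3 matches doc_phrase but not doc_adjacent, which is what a phrase query with omitted stopwords needs. With the at-most reading, both documents match, which is the looser proximity behaviour a separate slop operator would provide.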
