Thread: Range phrase operator in tsquery

Range phrase operator in tsquery

From
Aleksandr Parfenov
Date:
Hello hackers,

Currently, the phrase operator in Postgres FTS supports only an exact
match of the distance between two words. That is sufficient for
searching simple/exact phrases, but in some cases the exact distance is
unknown and we only want the words to be close enough. E.g. it may help
to find phrases with additional words in the middle of the phrase
("long, narrow, plastic brush" vs "long brush").

The proposed patch adds the ability to use ranges in the phrase operator
for such cases. A few examples:

'term1 <4,10> term2'::tsquery -- Distance between term1 and term2 is
-- at least 4 and no greater than 10
'term1 <,10> term2'::tsquery  -- Distance between term1 and term2 is
-- no greater than 10
'term1 <4,> term2'::tsquery   -- Distance between term1 and term2 is
-- at least 4

In addition, negative distances are supported and mean the reverse order
of the words. For example:
'term1 <4,10> term2'::tsquery = 'term2 <-10,-4> term1'::tsquery
'term1 <,10> term2'::tsquery = 'term2 <-10,> term1'::tsquery
'term1 <4,> term2'::tsquery = 'term2 <,-4> term1'::tsquery
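
One way to read the range semantics and the equivalences above, as a minimal sketch rather than the patch's actual code: take the signed distance to be the position of term2 minus the position of term1, and test it against both bounds. The function names and the signed-distance convention are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch): does a pair of word positions
 * satisfy 'term1 <lo,hi> term2'?  Assumes distance = pos2 - pos1, so a
 * negative distance means term2 appears before term1.
 */
static bool
range_matches(int pos1, int pos2, int lo, int hi)
{
    int distance = pos2 - pos1;

    assert(lo <= hi);           /* an empty range is treated as a caller bug */
    return distance >= lo && distance <= hi;
}

/*
 * The same pair seen from the other side: 'term2 <-hi,-lo> term1'.
 * Swapping the operands and negating both bounds gives the same answer.
 */
static bool
range_matches_reversed(int pos1, int pos2, int lo, int hi)
{
    return range_matches(pos2, pos1, -hi, -lo);
}
```

Under these assumptions, positions 1 and 6 satisfy both 'term1 <4,10> term2' and 'term2 <-10,-4> term1', while positions 1 and 3 satisfy neither.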

Negative distance support was introduced so that it can be used for the
AROUND operator mentioned in the websearch_to_tsquery thread [1]. In the
web search query language, AROUND(N) searches for words within a given
distance N in both the forward and backward directions, and it can be
represented as the <-N,N> range phrase operator.
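
The AROUND(N) mapping can be sketched in the same style. This is an illustration under an assumed convention (signed distance = position of the second word minus position of the first), not code from the patch; around_matches is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: AROUND(n) expressed as the range <-n,n>.
 * The words may appear in either order, at most n positions apart.
 */
static bool
around_matches(int pos1, int pos2, int n)
{
    int distance = pos2 - pos1;

    assert(n >= 0);
    return distance >= -n && distance <= n;
}
```

With n = 2, positions (3, 5) and (5, 3) both match, while (1, 5) does not.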

[1]
https://www.postgresql.org/message-id/flat/fe931111ff7e9ad79196486ada79e268@postgrespro.ru

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: Range phrase operator in tsquery

From
Aleksandr Parfenov
Date:
Hello hackers,

Updated version of the patch in the attachment.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: Range phrase operator in tsquery

From
Dmitry Dolgov
Date:
> On Fri, 27 Apr 2018 at 13:03, Aleksandr Parfenov <a.parfenov@postgrespro.ru> wrote:
>
> Currently, the phrase operator in Postgres FTS supports only an exact
> match of the distance between two words. That is sufficient for
> searching simple/exact phrases, but in some cases the exact distance is
> unknown and we only want the words to be close enough. E.g. it may help
> to find phrases with additional words in the middle of the phrase

Hi,

Thank you for the patch, it looks like a nice feature. A few questions:

+    if (!distance_from_set)
+    {
+        distance_from = distance_to < 0 ? MINENTRYPOS : 0;
+    }
+    if (!distance_to_set)
+    {
+        distance_to = distance_from < 0 ? 0 : MAXENTRYPOS;
+    }

Why use 0 here instead of MAXENTRYPOS/MINENTRYPOS? It looks a bit strange:

SELECT 'a <,-1000> b'::tsquery;
        tsquery
------------------------
 'a' <-16384,-1000> 'b'
(1 row)

SELECT 'a <,1000> b'::tsquery;
     tsquery
------------------
 'a' <0,1000> 'b'
(1 row)

Also, I wonder why LIMITPOS wasn't changed after MINENTRYPOS was introduced?

#define LIMITPOS(x) ( ( (x) >= MAXENTRYPOS ) ? (MAXENTRYPOS-1) : (x) )
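
If the intent is to clamp on the negative side as well, one possible shape is a two-sided variant. This is purely a hypothetical sketch: the name LIMITPOS_SIGNED, the value of MINENTRYPOS, and the choice of MINENTRYPOS+1 as the lower clamp (mirroring MAXENTRYPOS-1) are assumptions, not something the patch defines.

```c
#include <assert.h>

#define MAXENTRYPOS (1 << 14)
#define MINENTRYPOS (-(1 << 14))   /* assumed value */

/* Existing macro: clamps only values that are too large. */
#define LIMITPOS(x) ( ( (x) >= MAXENTRYPOS ) ? (MAXENTRYPOS-1) : (x) )

/* Hypothetical two-sided variant: also clamps values that are too small. */
#define LIMITPOS_SIGNED(x) \
    ( ( (x) >= MAXENTRYPOS ) ? (MAXENTRYPOS-1) : \
      ( ( (x) <= MINENTRYPOS ) ? (MINENTRYPOS+1) : (x) ) )

/* Small self-check showing where the two macros differ. */
static void
limitpos_selfcheck(void)
{
    assert(LIMITPOS(20000) == MAXENTRYPOS - 1);
    assert(LIMITPOS(-20000) == -20000);            /* not clamped today */
    assert(LIMITPOS_SIGNED(-20000) == MINENTRYPOS + 1);
    assert(LIMITPOS_SIGNED(5) == 5);
}
```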


Re: Range phrase operator in tsquery

From
Dmitry Dolgov
Date:
> On Thu, Nov 15, 2018 at 11:15 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On Fri, 27 Apr 2018 at 13:03, Aleksandr Parfenov <a.parfenov@postgrespro.ru> wrote:
> >
> > Currently, the phrase operator in Postgres FTS supports only an exact
> > match of the distance between two words. That is sufficient for
> > searching simple/exact phrases, but in some cases the exact distance is
> > unknown and we only want the words to be close enough. E.g. it may help
> > to find phrases with additional words in the middle of the phrase
>
> Hi,
>
> Thank you for the patch, it looks like a nice feature. A few questions:
>
> +    if (!distance_from_set)
> +    {
> +        distance_from = distance_to < 0 ? MINENTRYPOS : 0;
> +    }
> +    if (!distance_to_set)
> +    {
> +        distance_to = distance_from < 0 ? 0 : MAXENTRYPOS;
> +    }
>
> Why use 0 here instead of MAXENTRYPOS/MINENTRYPOS? It looks a bit strange:
>
> SELECT 'a <,-1000> b'::tsquery;
>         tsquery
> ------------------------
>  'a' <-16384,-1000> 'b'
> (1 row)
>
> SELECT 'a <,1000> b'::tsquery;
>      tsquery
> ------------------
>  'a' <0,1000> 'b'
> (1 row)
>
> Also, I wonder why LIMITPOS wasn't changed after MINENTRYPOS was introduced?
>
> #define LIMITPOS(x) ( ( (x) >= MAXENTRYPOS ) ? (MAXENTRYPOS-1) : (x) )

Due to the lack of response, I'm marking this as returned with feedback.