Thread: Range phrase operator in tsquery
Hello hackers,

Currently, the phrase operator in Postgres FTS supports only an exact match of the distance between two words. That is sufficient for searching simple or exact phrases, but in some cases the exact distance is unknown and we only want the words to be close enough. For example, it helps when searching for phrases with additional words in the middle of the phrase ("long, narrow, plastic brush" vs. "long brush").

The proposed patch adds the ability to use ranges in the phrase operator for such cases. A few examples:

    'term1 <4,10> term2'::tsquery  -- Distance between term1 and term2 is
                                   -- at least 4 and no greater than 10
    'term1 <,10> term2'::tsquery   -- Distance between term1 and term2 is
                                   -- no greater than 10
    'term1 <4,> term2'::tsquery    -- Distance between term1 and term2 is
                                   -- at least 4

In addition, negative distance is supported and means reverse order of the words. For example:

    'term1 <4,10> term2'::tsquery = 'term2 <-10,-4> term1'::tsquery
    'term1 <,10> term2'::tsquery  = 'term2 <-10,> term1'::tsquery
    'term1 <4,> term2'::tsquery   = 'term2 <,-4> term1'::tsquery

Negative distance support was introduced so that it can be used for the AROUND operator mentioned in websearch_to_tsquery [1]. In web search query language, AROUND(N) searches for words within a given distance N in both the forward and backward directions, and it can be represented as the <-N,N> range phrase operator.

[1] https://www.postgresql.org/message-id/flat/fe931111ff7e9ad79196486ada79e268@postgrespro.ru

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
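To make the range semantics in the proposal concrete, here is a minimal sketch in C. It is not taken from the patch: the names DistRange, dist_in_range, reverse_range and around_range are hypothetical, and it assumes the distance is the signed difference position(term2) - position(term1).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inclusive distance range, as in 'term1 <from,to> term2'. */
typedef struct DistRange
{
    int from;
    int to;
} DistRange;

/*
 * True if term2 at pos2 is within the range relative to term1 at pos1.
 * Assumption: distance is signed, so a negative value means term2
 * appears before term1.
 */
bool
dist_in_range(int pos1, int pos2, DistRange r)
{
    int distance = pos2 - pos1;

    return distance >= r.from && distance <= r.to;
}

/* Range to use when the operands are swapped: <from,to> becomes <-to,-from>. */
DistRange
reverse_range(DistRange r)
{
    DistRange rev = { -r.to, -r.from };

    return rev;
}

/* AROUND(N): within N positions in either direction, i.e. <-N,N>. */
DistRange
around_range(int n)
{
    DistRange r = { -n, n };

    assert(n >= 0);             /* a negative N has no meaning in this sketch */
    return r;
}
```

With these definitions, checking term2 against term1 with <4,10> gives the same answer as checking term1 against term2 with reverse_range, which is the equivalence 'term1 <4,10> term2' = 'term2 <-10,-4> term1' stated in the examples.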
Attachment
Hello hackers,

An updated version of the patch is attached.

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment
> On Fri, 27 Apr 2018 at 13:03, Aleksandr Parfenov <a.parfenov@postgrespro.ru> wrote:
>
> Nowadays, phrase operator in Postgres FTS supports only exact match of
> the distance between two words. It is sufficient for a search of
> simple/exact phrases, but in some cases exact distance is unknown and we
> want to words be close enough. E.g. it may help to search phrases with
> additional words in the middle of the phrase

Hi,

Thank you for the patch, it looks like a nice feature. A few questions:

+   if (!distance_from_set)
+   {
+       distance_from = distance_to < 0 ? MINENTRYPOS : 0;
+   }
+   if (!distance_to_set)
+   {
+       distance_to = distance_from < 0 ? 0 : MAXENTRYPOS;
+   }

Why use 0 here instead of MAXENTRYPOS/MINENTRYPOS? It looks a bit strange:

SELECT 'a <,-1000> b'::tsquery;
        tsquery
------------------------
 'a' <-16384,-1000> 'b'
(1 row)

SELECT 'a <,1000> b'::tsquery;
     tsquery
------------------
 'a' <0,1000> 'b'
(1 row)

Also, I wonder why LIMITPOS wasn't changed after MINENTRYPOS was introduced?

#define LIMITPOS(x) ( ( (x) >= MAXENTRYPOS ) ? (MAXENTRYPOS-1) : (x) )
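The defaulting rule in the quoted hunk can be traced with a small standalone sketch. This is not the patch itself: fill_open_bounds and PhraseRange are hypothetical names, MAXENTRYPOS is taken as 1 << 14 as in PostgreSQL's ts_type.h, MINENTRYPOS is assumed to be -(1 << 14) (inferred from the -16384 in the output above), and the sketch assumes at least one bound is always given.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXENTRYPOS (1 << 14)
#define MINENTRYPOS (-(1 << 14))    /* assumption, inferred from the output */

/* Hypothetical container for the resolved bounds of a range operator. */
typedef struct PhraseRange
{
    int from;
    int to;
} PhraseRange;

/*
 * Fill in an omitted bound following the logic of the quoted hunk:
 * the missing side is 0 when that keeps the range on one side of zero,
 * otherwise it is the extreme position value.
 */
PhraseRange
fill_open_bounds(bool from_set, int from, bool to_set, int to)
{
    PhraseRange r;

    /* assumption of this sketch: '<,>' with both bounds omitted is not handled */
    assert(from_set || to_set);

    if (!from_set)
        from = (to < 0) ? MINENTRYPOS : 0;
    if (!to_set)
        to = (from < 0) ? 0 : MAXENTRYPOS;

    r.from = from;
    r.to = to;
    return r;
}
```

Under these assumptions, fill_open_bounds(false, 0, true, 1000) yields <0,1000> and fill_open_bounds(false, 0, true, -1000) yields <-16384,-1000>, matching the two query results shown. An open bound never crosses zero in this scheme, which is consistent with the proposal treating 'term1 <,10> term2' as the reverse of 'term2 <-10,> term1'.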
> On Thu, Nov 15, 2018 at 11:15 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On Fri, 27 Apr 2018 at 13:03, Aleksandr Parfenov <a.parfenov@postgrespro.ru> wrote:
> >
> > Nowadays, phrase operator in Postgres FTS supports only exact match of
> > the distance between two words. It is sufficient for a search of
> > simple/exact phrases, but in some cases exact distance is unknown and we
> > want to words be close enough. E.g. it may help to search phrases with
> > additional words in the middle of the phrase
>
> Hi,
>
> Thank you for the patch, it looks like a nice feature. Few questions:
>
> +   if (!distance_from_set)
> +   {
> +       distance_from = distance_to < 0 ? MINENTRYPOS : 0;
> +   }
> +   if (!distance_to_set)
> +   {
> +       distance_to = distance_from < 0 ? 0 : MAXENTRYPOS;
> +   }
>
> Why use 0 here instead of MAXENTRYPOS/MINENTRYPOS ? It looks a bit strange:
>
> SELECT 'a <,-1000> b'::tsquery;
>         tsquery
> ------------------------
>  'a' <-16384,-1000> 'b'
> (1 row)
>
> SELECT 'a <,1000> b'::tsquery;
>      tsquery
> ------------------
>  'a' <0,1000> 'b'
> (1 row)
>
> Also I wonder why after introducing MINENTRYPOS the LIMITPOS wasn't changed?
>
> #define LIMITPOS(x) ( ( (x) >= MAXENTRYPOS ) ? (MAXENTRYPOS-1) : (x) )

Due to the lack of response, I'm marking this as returned with feedback.