Thread: [HACKERS] new function for tsquery creartion

[HACKERS] new function for tsquery creartion

From

Victor Drobny

Date:

19 July 2017, 19:43:51

Dear all,

Now Postgres has a few functions to create tsqueries for full text 
search. The main one is the to_tsquery function that allows to make 
query with any operation. But to make correct query all of the operators 
should be specified explicitly. In order to make it easier postgres has 
functions like plainto_tsquery and phraseto_tsquery which allow to make 
tsqueries from strings. But they are not flexible enough.

Let me introduce new function for full text search query creation(which 
is called 'queryto_tsquery'). It takes 'google like' query string and 
translates it to tsquery.
The main features are the following:
All the text inside double quotes would be treated as a phrase("a b c" 
-> 'a <-> b  <-> c')
New operator AROUND(N). It matches if the distance between words(or 
maybe phrases) is less than or equal to N.
Alias for !('-rat' is the same as '!rat')
Alias for |('dog OR cat' is the same as 'dog | cat')

As a plainto_tsquery and phraseto_tsquery it will fill operators by 
itself, but already placed operations won't be ignored. It allows to 
combine two approaches.

In the attachment you can find patch with the new features, tests and 
documentation for it.
What do you think about it?

Thank you very much for the attention!

-- 
------
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

queryto_tsquery.patch

Re: [HACKERS] new function for tsquery creartion

From

Robert Haas

Date:

19 July 2017, 19:58:05

On Wed, Jul 19, 2017 at 12:43 PM, Victor Drobny <v.drobny@postgrespro.ru> wrote:
> Let me introduce new function for full text search query creation(which is
> called 'queryto_tsquery'). It takes 'google like' query string and
> translates it to tsquery.

I haven't looked at the code, but that sounds like a neat idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] new function for tsquery creartion

From

Thomas Munro

Date:

09 September 2017, 06:03:02

On Thu, Jul 20, 2017 at 4:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jul 19, 2017 at 12:43 PM, Victor Drobny <v.drobny@postgrespro.ru> wrote:
>> Let me introduce new function for full text search query creation(which is
>> called 'queryto_tsquery'). It takes 'google like' query string and
>> translates it to tsquery.
>
> I haven't looked at the code, but that sounds like a neat idea.

+1

This is a very cool feature making tsquery much more accessible.  Many
people know that sort of defacto search engine query language that
many websites accept using quotes, AND, OR, - etc.

Calling this search syntax just "query" seems too general and
overloaded.  "Simple search", "simple query", "web search", "web
syntax", "web query", "Google-style query", "Poogle" (kidding!) ...
well I'm not sure, but I feel like it deserves a proper name.
websearch_to_tsquery()?

I see that your AROUND(n) is an undocumented Google search syntax.
That's a good trick to know.

Please send a rebased version of the patch for people to review and
test as that one has bit-rotted.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] new function for tsquery creartion

From

Victor Drobny

Date:

13 September 2017, 11:57:20

On 2017-09-09 06:03, Thomas Munro wrote:
> Please send a rebased version of the patch for people to review and
> test as that one has bit-rotted.
Hello,
Thank you for interest. In the attachment you can find rebased
version(based on 69835bc8988812c960f4ed5aeee86b62ac73602a commit)
-- 
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

queryto_tsquery2.patch

Re: [HACKERS] new function for tsquery creartion

From

Alexey Chernyshov

Date:

13 October 2017, 16:37:32

Hi all,
I am extending phrase operator <n> is such way that it will have <n,m>
syntax that means from n to m words, so I will use such syntax (<n,m>)
further. I found that a AROUND(N) b is exactly the same as a <-N,N> b
and it can be replaced while parsing. So, what do you think of such
idea? In this patch I have noticed some unobvious behavior.

# select to_tsvector('Hello, cat world!') @@ queryto_tsquery('cat
AROUND(1) cat') as match;
match 
-------t

cat AROUND(1) cat is the same is "cat <1> cat || cat <0> cat" and:

# select to_tsvector('Hello, cat world!') @@ to_tsquery('cat <0> cat');?column? 
-------t

It seems to be a proper logic behavior but it is a possible pitfall,
maybe it should be documented?

But more important question is how AROUND() operator should handle stop
words? Now it works as:

# select queryto_tsquery('cat <2> (a AROUND(10) rat)');queryto_tsquery  
------------------'cat' <12> 'rat'
(1 row)

# select queryto_tsquery('cat <2> a AROUND(10) rat');   queryto_tsquery     
------------------------'cat' AROUND(12) 'rat'
(1 row)

In my opinion it should be like:
cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-8,12>
rat 

cat <2> a AROUND(10) rat == cat <2,2> a <-10,10> rat = cat <-8, 12>
rat

Now <n,m> operator can be replaced with combination of phrase
operator <n>, AROUND(), and logical operators, but with <n,m> operator
it will be much painless. Correct me, please, if I am wrong.

-- 
Alexey Chernyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] new function for tsquery creartion

From

Victor Drobny

Date:

13 October 2017, 16:55:12

On 2017-10-13 16:37, Alexey Chernyshov wrote:
> Hi all,
> I am extending phrase operator <n> is such way that it will have <n,m>
> syntax that means from n to m words, so I will use such syntax (<n,m>)
> further. I found that a AROUND(N) b is exactly the same as a <-N,N> b
> and it can be replaced while parsing. So, what do you think of such
> idea? In this patch I have noticed some unobvious behavior.

Thank you for the interest and review!

> # select to_tsvector('Hello, cat world!') @@ queryto_tsquery('cat
> AROUND(1) cat') as match;
> match
> -------
>  t
> 
> cat AROUND(1) cat is the same is "cat <1> cat || cat <0> cat" and:
> 
> # select to_tsvector('Hello, cat world!') @@ to_tsquery('cat <0> cat');
>  ?column?
> -------
>  t
> 
> It seems to be a proper logic behavior but it is a possible pitfall,
> maybe it should be documented?

It is a tricky question. I think that this interpretation is confusing, 
so
better to make it as <-N, -1> and <1, N>.

> But more important question is how AROUND() operator should handle stop
> words? Now it works as:
> 
> # select queryto_tsquery('cat <2> (a AROUND(10) rat)');
>  queryto_tsquery
> ------------------
>  'cat' <12> 'rat'
> (1 row)
> 
> # select queryto_tsquery('cat <2> a AROUND(10) rat');
>     queryto_tsquery
> ------------------------
>  'cat' AROUND(12) 'rat'
> (1 row)
> 
> In my opinion it should be like:
> cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-8,12>
> rat

I think that correct version is:
cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-2,12> 
rat.

> cat <2> a AROUND(10) rat == cat <2,2> a <-10,10> rat = cat <-8, 12>
> rat

It is a problem indeed. I did not catch it during implementation. Thank 
you
for pointing it out.

> Now <n,m> operator can be replaced with combination of phrase
> operator <n>, AROUND(), and logical operators, but with <n,m> operator
> it will be much painless. Correct me, please, if I am wrong.

I think that <n,m> operator is more general than around(n) so the last 
one
should be based on yours. However, i think, that taking negative 
parameters
is not the best idea because it is confusing. On top of that it is not 
so
necessary and i think it won`t be popular among users.
It seems to me that AROUND operator can be easily implemented with 
<n,m>,
also, it helps to avoid problems, that you showed above.

-- 
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] new function for tsquery creartion

From

Tomas Vondra

Date:

19 November 2017, 04:30:57

Hi,

On 09/13/2017 10:57 AM, Victor Drobny wrote:
> On 2017-09-09 06:03, Thomas Munro wrote:
>> Please send a rebased version of the patch for people to review and
>> test as that one has bit-rotted.
> Hello,
> Thank you for interest. In the attachment you can find rebased
> version(based on 69835bc8988812c960f4ed5aeee86b62ac73602a commit)
> 

I did a quick review of the patch today. The patch unfortunately no
longer applies, so I had to use an older commit from September. Please
rebase to current master.

I've only looked on the diff at this point, will do more testing once
the rebase happens.

Some comments:

1) This seems to mix multiple improvements into one patch. There's the
support for alternative query syntax, and then there are the new
operators (AROUND and <m,n>). I propose to split the patch into two or
more parts, each addressing one of those bits.

I guess there will be two or three parts - first adding the syntax,
second adding <m,n> and third adding the AROUND(n). Seems reasonable?

2) I don't think we should mention Google in the docs explicitly. Not
that I'm somehow anti-google, but this syntax was certainly not invented
by Google - I vividly remember using something like that on Altavista
(yeah, I'm old). And it's used by pretty much every other web search
engine out there ...

3) In the SGML docs, please use <literal></literal> instead of just
quoting the values. So it should be <literal>|</literal> instead of '|'
etc. Just like in the parts describing plainto_tsquery, for example.

4) Also, I recommend adding a brief explanation what the examples do.
Right now there's just a bunch of queryto_tsquery, and the reader is
expected to understand the output. I suggest adding a sentence or two,
explaining what's happening (just like for plainto_tsquery examples).

5) I'm not sure about negative values in the <n,m> operator. I don't
find it particularly confusing - once you understand that (a <n,m> b)
means "there are 'k' words between 'a' and 'b' (n <= k <= m)", then
negative values seem like a fairly straightforward extension.

But I guess the main question is "Is there really a demand for the new
<n,m> operator, or have we just invented if because we can?"

6) There seem to be some new constants defined, with names not following
the naming convention. I mean this
   #define WAITOPERAND            1   #define WAITOPERATOR        2   #define WAITFIRSTOPERAND    3   #define
WAITSINGLEOPERAND   4   #define INSIDEQUOTES        5   <-- the new one

and
   #define TSPO_L_ONLY            0x01   #define TSPO_R_ONLY            0x02   #define TSPO_BOTH              0x04
#defineTS_NOT_EXAC            0x08 <-- the new one

Perhaps that's OK, but it seems a bit inconsistent.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] new function for tsquery creartion

From

Victor Drobny

Date:

28 November 2017, 09:56:10

On 2017-11-19 04:30, Tomas Vondra wrote:
Hello,
> Hi,
> 
> On 09/13/2017 10:57 AM, Victor Drobny wrote:
>> On 2017-09-09 06:03, Thomas Munro wrote:
>>> Please send a rebased version of the patch for people to review and
>>> test as that one has bit-rotted.
>> Hello,
>> Thank you for interest. In the attachment you can find rebased
>> version(based on 69835bc8988812c960f4ed5aeee86b62ac73602a commit)
>> 
> 
> I did a quick review of the patch today. The patch unfortunately no
> longer applies, so I had to use an older commit from September. Please
> rebase to current master.

Thank you for your time. In the attachment you can find rebased version.
(based on e842791b0 commit)

> I've only looked on the diff at this point, will do more testing once
> the rebase happens.
> 
> Some comments:
> 
> 1) This seems to mix multiple improvements into one patch. There's the
> support for alternative query syntax, and then there are the new
> operators (AROUND and <m,n>). I propose to split the patch into two or
> more parts, each addressing one of those bits.

I agree. I have split it in 3 parts: support for around operator,
queryto_tsquery function and documentation.

> I guess there will be two or three parts - first adding the syntax,
> second adding <m,n> and third adding the AROUND(n). Seems reasonable?
> 
> 2) I don't think we should mention Google in the docs explicitly. Not
> that I'm somehow anti-google, but this syntax was certainly not 
> invented
> by Google - I vividly remember using something like that on Altavista
> (yeah, I'm old). And it's used by pretty much every other web search
> engine out there ...

Yes, those syntax is not introduced by google, but, as for me, it was 
the
easiest way to give a brief description of it. Of cause it can be 
changed,
I just don't know how. Any suggestions are welcomed! ;)

> 3) In the SGML docs, please use <literal></literal> instead of just
> quoting the values. So it should be <literal>|</literal> instead of '|'
> etc. Just like in the parts describing plainto_tsquery, for example.

Fixed. I hope that i didn't miss anything.

> 4) Also, I recommend adding a brief explanation what the examples do.
> Right now there's just a bunch of queryto_tsquery, and the reader is
> expected to understand the output. I suggest adding a sentence or two,
> explaining what's happening (just like for plainto_tsquery examples).
> 
> 5) I'm not sure about negative values in the <n,m> operator. I don't
> find it particularly confusing - once you understand that (a <n,m> b)
> means "there are 'k' words between 'a' and 'b' (n <= k <= m)", then
> negative values seem like a fairly straightforward extension.
> 
> But I guess the main question is "Is there really a demand for the new
> <n,m> operator, or have we just invented if because we can?"

The operator <n,m> is not introduced yet. It's just a concept. It were 
our
thoughts about implementation AROUND operator through <n,m> in future.

> 6) There seem to be some new constants defined, with names not 
> following
> the naming convention. I mean this
> 
>     #define WAITOPERAND            1
>     #define WAITOPERATOR        2
>     #define WAITFIRSTOPERAND    3
>     #define WAITSINGLEOPERAND    4
>     #define INSIDEQUOTES        5   <-- the new one
> 
> and
> 
>     #define TSPO_L_ONLY            0x01
>     #define TSPO_R_ONLY            0x02
>     #define TSPO_BOTH              0x04
>     #define TS_NOT_EXAC            0x08 <-- the new one
> 
> Perhaps that's OK, but it seems a bit inconsistent.

I agree. I have fixed it.

> 
> regards

-- 
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: new function for tsquery creartion

From

Aleksander Alekseev

Date:

28 November 2017, 17:39:16

The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Hi Victor,

I like the idea and I think it's a great patch. However in current shape it
requires some amount of reworking to meet PostgreSQL standards of code quality.

Particularly:

1. Many new procedures don't have a comment with a brief description. Ideally
every procedure should have not only a brief description but also a description
of every argument, return value and changes of global state if applicable.

2. I believe you could implement the has_prefix procedure just as a wrapper of
strstr().

3. I suggest to use snprintf instead of sprintf in a new code whenever
possible, especially if you are using %s - just to be on a safe side.

4. I noticed that your code affects the catalog. Did you check that your
changes will not cause problems during the migration from the older version of
PostgreSQL to the never one?

5. Tests for queryto_tsquery use only ASCII strings. I suggest to add a few
test that use non-ASCII characters as well, and a few corner cases like empty
string, string that contains only the stop-words, etc.

6. `make check-world` doesn't pass:

```
***************
*** 1672,1678 **** (1 row)  set enable_seqscan = on;
-  --test queryto_tsquery function select queryto_tsquery('My brand new smartphone');         queryto_tsquery        
--- 1672,1677 ----
***************
*** 1784,1786 ****
--- 1783,1786 ---- ---------------------------  'fat-rat' & 'fat' & 'rat' (1 row)
+ 
```

Re: new function for tsquery creartion

From

Aleksander Alekseev

Date:

28 November 2017, 17:57:00

Hi Victor,

> I like the idea and I think it's a great patch. However in current shape it
> requires some amount of reworking to meet PostgreSQL standards of code quality.

Also I would like to add that I agree with Thomas Munro:

> Calling this search syntax just "query" seems too general and
> overloaded.  "Simple search", "simple query", "web search", "web
> syntax", "web query", "Google-style query", "Poogle" (kidding!) ...
> well I'm not sure, but I feel like it deserves a proper name.
> websearch_to_tsquery()?

websearch_to_tsquery() sounds much better than query_to_tsquery().

Also I agree Tomas Vondra in regard that:

> 2) I don't think we should mention Google in the docs explicitly. Not
> that I'm somehow anti-google, but this syntax was certainly not invented
> by Google - I vividly remember using something like that on Altavista
> (yeah, I'm old). And it's used by pretty much every other web search
> engine out there ...

I suggest to rephrase:

```
+    about its input. <function>queryto_tsquery</function> provides a
+    different, Google like syntax to create tsquery.
```

.. to something more like "provides a different syntax, similar to one
used in web search engines, to create tsqeury". And maybe give a few
examples right in the next sentence.

--
Best regards,
Aleksander Alekseev

Re: new function for tsquery creartion

From

Michael Paquier

Date:

29 November 2017, 07:41:52

On Tue, Nov 28, 2017 at 11:57 PM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
>> I like the idea and I think it's a great patch. However in current shape it
>> requires some amount of reworking to meet PostgreSQL standards of code quality.
>
> Also I would like to add that I agree with Thomas Munro:
>
>> Calling this search syntax just "query" seems too general and
>> overloaded.  "Simple search", "simple query", "web search", "web
>> syntax", "web query", "Google-style query", "Poogle" (kidding!) ...
>> well I'm not sure, but I feel like it deserves a proper name.
>> websearch_to_tsquery()?
>
> websearch_to_tsquery() sounds much better than query_to_tsquery().
>
> Also I agree Tomas Vondra in regard that:
>
>> 2) I don't think we should mention Google in the docs explicitly. Not
>> that I'm somehow anti-google, but this syntax was certainly not invented
>> by Google - I vividly remember using something like that on Altavista
>> (yeah, I'm old). And it's used by pretty much every other web search
>> engine out there ...
>
> I suggest to rephrase:
>
> ```
> +    about its input. <function>queryto_tsquery</function> provides a
> +    different, Google like syntax to create tsquery.
> ```
>
> .. to something more like "provides a different syntax, similar to one
> used in web search engines, to create tsqeury". And maybe give a few
> examples right in the next sentence.

The patch got a review less than 1 day ago, so I am moving it to next
CF with the same status, waiting on author.
-- 
Michael

Re: new function for tsquery creartion

From

Victor Drobny

Date:

29 November 2017, 17:56:30

On 2017-11-28 17:57, Aleksander Alekseev wrote:
Hi Aleksander,

Thank you for review. I have tried to fix all of your comments.
However i want to mention that the absence of comments for functions
in to_tsany.c is justified by the absence of comments for other
similar functions.
> Hi Victor,
> 
>> I like the idea and I think it's a great patch. However in current 
>> shape it
>> requires some amount of reworking to meet PostgreSQL standards of code 
>> quality.
> 
> Also I would like to add that I agree with Thomas Munro:
> 
>> Calling this search syntax just "query" seems too general and
>> overloaded.  "Simple search", "simple query", "web search", "web
>> syntax", "web query", "Google-style query", "Poogle" (kidding!) ...
>> well I'm not sure, but I feel like it deserves a proper name.
>> websearch_to_tsquery()?
> 
> websearch_to_tsquery() sounds much better than query_to_tsquery().
> 
> Also I agree Tomas Vondra in regard that:
> 
>> 2) I don't think we should mention Google in the docs explicitly. Not
>> that I'm somehow anti-google, but this syntax was certainly not 
>> invented
>> by Google - I vividly remember using something like that on Altavista
>> (yeah, I'm old). And it's used by pretty much every other web search
>> engine out there ...
> 
> I suggest to rephrase:
> 
> ```
> +    about its input. <function>queryto_tsquery</function> provides a
> +    different, Google like syntax to create tsquery.
> ```
> 
> .. to something more like "provides a different syntax, similar to one
> used in web search engines, to create tsqeury". And maybe give a few
> examples right in the next sentence.
Best,

-- 
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: new function for tsquery creartion

From

Victor Drobny

Date:

29 November 2017, 17:57:33

On 2017-11-29 17:56, Victor Drobny wrote:

Sorry, forgot to attach new version of the patch.
> On 2017-11-28 17:57, Aleksander Alekseev wrote:
> Hi Aleksander,
> 
> Thank you for review. I have tried to fix all of your comments.
> However i want to mention that the absence of comments for functions
> in to_tsany.c is justified by the absence of comments for other
> similar functions.
>> Hi Victor,
>> 
>>> I like the idea and I think it's a great patch. However in current 
>>> shape it
>>> requires some amount of reworking to meet PostgreSQL standards of 
>>> code quality.
>> 
>> Also I would like to add that I agree with Thomas Munro:
>> 
>>> Calling this search syntax just "query" seems too general and
>>> overloaded.  "Simple search", "simple query", "web search", "web
>>> syntax", "web query", "Google-style query", "Poogle" (kidding!) ...
>>> well I'm not sure, but I feel like it deserves a proper name.
>>> websearch_to_tsquery()?
>> 
>> websearch_to_tsquery() sounds much better than query_to_tsquery().
>> 
>> Also I agree Tomas Vondra in regard that:
>> 
>>> 2) I don't think we should mention Google in the docs explicitly. Not
>>> that I'm somehow anti-google, but this syntax was certainly not 
>>> invented
>>> by Google - I vividly remember using something like that on Altavista
>>> (yeah, I'm old). And it's used by pretty much every other web search
>>> engine out there ...
>> 
>> I suggest to rephrase:
>> 
>> ```
>> +    about its input. <function>queryto_tsquery</function> provides a
>> +    different, Google like syntax to create tsquery.
>> ```
>> 
>> .. to something more like "provides a different syntax, similar to one
>> used in web search engines, to create tsqeury". And maybe give a few
>> examples right in the next sentence.
> Best,

-- 
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: new function for tsquery creartion

From

Aleksander Alekseev

Date:

15 January 2018, 16:00:55

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Here are a few minor issues:

```
+/*
+ * Checks if 'str' starts with a 'prefix'
+ */
+static bool
+has_prefix(char * str, char * prefix)
+{
+   if (strlen(prefix) > strlen(str))
+   {
+       return false;
+   }
+   return strstr(str, prefix) == str;
+}
```

strlen() check is redundant.

```
+           case OP_AROUND:
+               snprintf(in->cur, 256, " AROUND(%d) %s", distance, nrm.buf);
+               break;
```

Instead of the constant 256 it's better to use sizeof().

Apart from these issues this patch looks not bad.

The new status of this patch is: Ready for Committer

Re: new function for tsquery creartion

From

Andres Freund

Date:

02 March 2018, 06:06:05

On 2017-11-29 17:56:30 +0300, Victor Drobny wrote:
> Thank you for review. I have tried to fix all of your comments.
> However i want to mention that the absence of comments for functions
> in to_tsany.c is justified by the absence of comments for other
> similar functions.

That's not justification. Tsquery related code is notorious for being
badly commented, we do not want to continue that.

Greetings,

Andres Freund

Re: new function for tsquery creartion

From

Aleksander Alekseev

Date:

05 March 2018, 15:52:19

It seems that this patch doesn't apply anymore, see http://commitfest.cputube.org/

The new status of this patch is: Waiting on Author

Re: Re: new function for tsquery creartion

From

David Steele

Date:

21 March 2018, 17:24:54

Hi Victor,

On 3/5/18 7:52 AM, Aleksander Alekseev wrote:
> It seems that this patch doesn't apply anymore, see http://commitfest.cputube.org/
> 
> The new status of this patch is: Waiting on Author

This patch needs a rebase and should address the comments from
Aleksander and Andres.  We are now three weeks into the CF with no new
patch.

Are you planning to provide a new patch?  If not, I think it should be
marked as Returned with Feedback and submitted to the next CF once it
has been updated.

Regards,
-- 
-David
david@pgmasters.net

Re: new function for tsquery creartion

From

Dmitry Ivanov

Date:

22 March 2018, 16:53:15

Hi David,

I'd like to take over from Victor. I'll post a revised version of the 
patch in a couple of days.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] new function for tsquery creartion

From

Teodor Sigaev

Date:

22 March 2018, 17:38:04

> I am extending phrase operator <n> is such way that it will have <n,m>
> syntax that means from n to m words, so I will use such syntax (<n,m>)
> further. I found that a AROUND(N) b is exactly the same as a <-N,N> b
> and it can be replaced while parsing. So, what do you think of such
> idea? In this patch I have noticed some unobvious behavior.

I think new operator should be a subject for separate patch. And I prefer idea 
about range phrase operator.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: new function for tsquery creartion

From

Aleksandr Parfenov

Date:

26 March 2018, 09:24:05

On Thu, 22 Mar 2018 16:53:15 +0300
Dmitry Ivanov <d.ivanov@postgrespro.ru> wrote:

> Hi David,
> 
> I'd like to take over from Victor. I'll post a revised version of the 
> patch in a couple of days.
> 

Hi Dmitry,

Recently I worked with the old version of the patch and found a bug.
So, I think it is better to notify you immediately, so you can fix it in
rebased/revised version.

I noticed, that operator AROUND(N) works only
in case of non-negative operands. If any of the operands is negative, it
behaves as phrase operator <N>. It is caused by lack of TS_NOT_EXAC
flag and AROUND(N) operator check in function TS_phrase_execute in
branches for negated operands.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: new function for tsquery creartion

From

Dmitry Ivanov

Date:

26 March 2018, 11:51:42

> Recently I worked with the old version of the patch and found a bug.
> So, I think it is better to notify you immediately, so you can fix it 
> in
> rebased/revised version.
> 
> I noticed, that operator AROUND(N) works only
> in case of non-negative operands. If any of the operands is negative, 
> it
> behaves as phrase operator <N>. It is caused by lack of TS_NOT_EXAC
> flag and AROUND(N) operator check in function TS_phrase_execute in
> branches for negated operands.

Good to know, thanks! To be honest, I' sure that Theodor is right: it's 
better to implement AROUND(N) operator using <N, M> when it's committed. 
The following version of patch won't support AROUND(N). I have to fix a 
few more questionable things, though.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: new function for tsquery creartion

From

Thomas Munro

Date:

26 March 2018, 14:32:05

On Mon, Mar 26, 2018 at 9:51 PM, Dmitry Ivanov <d.ivanov@postgrespro.ru> wrote:
>> Recently I worked with the old version of the patch and found a bug.
>> So, I think it is better to notify you immediately, so you can fix it in
>> rebased/revised version.
>>
>> I noticed, that operator AROUND(N) works only
>> in case of non-negative operands. If any of the operands is negative, it
>> behaves as phrase operator <N>. It is caused by lack of TS_NOT_EXAC
>> flag and AROUND(N) operator check in function TS_phrase_execute in
>> branches for negated operands.
>
> Good to know, thanks! To be honest, I' sure that Theodor is right: it's
> better to implement AROUND(N) operator using <N, M> when it's committed. The
> following version of patch won't support AROUND(N). I have to fix a few more
> questionable things, though.

Hi,

I took a quick look at the language in the last version of the patches.

Patch 01:

+ errmsg("Invalid AROUND(X) operator!")));

s/I/i/;s/!//

+ errmsg("Missing ')' in AROUND(X) operator")));

s/M/m/

Patch 03 (the documentation) needed some proof-reading.  I've attached
a new version of that patch with some small suggested improvements.

Questions I had while reading the documentation without looking at the
code:  Is there anything to_tsquery() can do that
websearch_to_tsquery() can't?  The documentation doesn't mention
parentheses, but I can see that they are in fact supported from the
regression test.  Would it be OK to use user-supplied websearch
strings?  Ie can it produce a syntax error?  Well clearly it can, see
above -- perhaps that should be explicitly documented?  Is there any
way to write OR as a term (that's a valuable non-stopword in French)?

It seems like AROUND(x) should be documented also more generally for
tsquery, but I see there is some discussion about how that should
look.

By the way, not this patch's fault, but I noticed that commit
f5f1355dc4d did this:

-                               (errmsg("query contains only
stopword(s) or doesn't contain lexeme(s), ignored")));
+                               (errmsg("text-search query contains
only stop words or doesn't contain lexemes, ignored")));

But the old test still appears in an example in doc/src/sgml/textsearch.sgml.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

docs.patch

Re: new function for tsquery creartion

From

Dmitry Ivanov

Date:

26 March 2018, 19:48:19

> Patch 03 (the documentation) needed some proof-reading.  I've attached
> a new version of that patch with some small suggested improvements.

Thanks, I'm definitely going to use this.

> Is there anything to_tsquery() can do that websearch_to_tsquery() 
> can't?

Currently, no.

> Would it be OK to use user-supplied websearch strings?
> Ie can it produce a syntax error?

I believe that's the most important question. After a private discussion 
with Theodor I came to a conclusion that the most beneficial outcome 
would be to suppress all syntax errors and give user some result, 
cutting all misused operators along the way. This requires some changes, 
though.

> Is there any way to write OR as a term (that's a valuable non-stopword 
> in French)?

You could quote it like this: websearch_to_tsquery('"or"');

Moreover, it's still possible to use & and |.

> It seems like AROUND(x) should be documented also more generally for
> tsquery, but I see there is some discussion about how that should
> look.

Personally, I like <N, M> operator better. It would instantly deprecate 
AROUND(N), which is why I'm going to drop it.

> By the way, not this patch's fault, but I noticed that commit
> f5f1355dc4d did this:
> 
> -                               (errmsg("query contains only
> stopword(s) or doesn't contain lexeme(s), ignored")));
> +                               (errmsg("text-search query contains
> only stop words or doesn't contain lexemes, ignored")));
> 
> But the old test still appears in an example in 
> doc/src/sgml/textsearch.sgml.

Will fix this.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: new function for tsquery creartion

From

Dmitry Ivanov

Date:

28 March 2018, 01:29:48

Hi everyone,

I'd like to share some intermediate results. Here's what has changed:


1. OR operator is now case-insensitive. Moreover, trailing whitespace is 
no longer used to identify it:

select websearch_to_tsquery('simple', 'abc or');
  websearch_to_tsquery
----------------------
  'abc' & 'or'
(1 row)

select websearch_to_tsquery('simple', 'abc or(def)');
  websearch_to_tsquery
----------------------
  'abc' | 'def'
(1 row)

select websearch_to_tsquery('simple', 'abc or!def');
  websearch_to_tsquery
----------------------
  'abc' | 'def'
(1 row)


2. AROUND(N) has been dropped. I hope that <N, M> operator will allow us 
to implement it with a few lines of code.

3. websearch_to_tsquery() now tolerates various syntax errors, for 
instance:

Misused operators:

'abc &'
'| abc'
'<- def'

Missing parentheses:

'abc & (def <-> (cat or rat'

Other sorts of nonsense:

'abc &--|| def'  =>  'abc' & !!'def'
'abc:def'  =>  'abc':D & 'ef'

This, however, doesn't mean that the result will always be adequate (who 
would have thought?). Overall, current implementation follows the GIGO 
principle. In theory, this would allow us to use user-supplied websearch 
strings (but see gotchas), even if they don't make much sense. Better 
then nothing, right?

4. A small refactoring: I've replaced all WAIT* macros with a enum for 
better debugging (names look much nicer in GDB). Hope this is 
acceptable.

5. Finally, I've added a few more comments and tests. I haven't checked 
the code coverage, though.


A few gotchas:

I haven't touched gettoken_tsvector() yet. As a result, the following 
queries produce errors:

select websearch_to_tsquery('simple', '''');
ERROR:  syntax error in tsquery: "'"

select websearch_to_tsquery('simple', '\');
ERROR:  there is no escaped character: "\"

Maybe there's more. The question is: should we fix those, or it's fine 
as it is? I don't have a strong opinion about this.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

websearch_to_tsquery_v1.diff

Re: new function for tsquery creartion

From

Dmitry Ivanov

Date:

28 March 2018, 01:46:31

> select websearch_to_tsquery('simple', 'abc or!def');
>  websearch_to_tsquery
> ----------------------
>  'abc' | 'def'
> (1 row)

This is wrong ofc, I've attached the fixed version.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

websearch_to_tsquery_v2.diff

Re: new function for tsquery creartion

From

Aleksander Alekseev

Date:

28 March 2018, 12:21:15

Hello Dmitry,

> A few gotchas:
>
> I haven't touched gettoken_tsvector() yet. As a result, the following
> queries produce errors:
>
> select websearch_to_tsquery('simple', '''');
> ERROR:  syntax error in tsquery: "'"
>
> select websearch_to_tsquery('simple', '\');
> ERROR:  there is no escaped character: "\"
>
> Maybe there's more. The question is: should we fix those, or it's fine as it
> is? I don't have a strong opinion about this.

It doesn't sound right to me to accept any input as a general rule but
sometimes return errors nevertheless. That API would be complicated for
the users. Thus I suggest to accept any garbage and try our best to
interpret it.

--
Best regards,
Aleksander Alekseev

Attachment

signature.asc

Re: new function for tsquery creartion

From

Aleksandr Parfenov

Date:

01 April 2018, 15:51:03

Hello hackers,

On 2018-03-28 12:21, Aleksander Alekseev wrote:
> It doesn't sound right to me to accept any input as a general rule but
> sometimes return errors nevertheless. That API would be complicated for
> the users. Thus I suggest to accept any garbage and try our best to
> interpret it.

I agree with Aleksander about silencing all errors in 
websearch_to_tsquery().

In the attachment is a revised patch with the attempt to introduce an 
ability to ignore syntax errors in gettoken_tsvector().
I'm also read through the patch and all the code looks good to me except 
one thing.
The name of enum ts_parsestate looks more like a name of the function 
than a name of a type.
In my version, it renamed to QueryParserState, but you can fix it if I'm 
wrong.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

websearch_to_tsquery_v3.diff

Re: new function for tsquery creartion

From

Dmitry Ivanov

Date:

01 April 2018, 23:00:43

Hi Aleksandr,

> I agree with Aleksander about silencing all errors in 
> websearch_to_tsquery().
> 
> In the attachment is a revised patch with the attempt to introduce an
> ability to ignore syntax errors in gettoken_tsvector().

Thanks for the further improvements! Yes, you're both right, the API has 
to be consistent. Unfortunately, I had to make some adjustments 
according to Oleg Bartunov's review. Here's a change log:

1. &, | and (), <-> are no longer considered operators in web search 
mode.
2. I've stumbled upon a bug: web search used to transform "pg_class" 
into 'pg <-> class', which is no longer the case.
3. I changed the behavior of gettoken_tsvector() as soon as I had heard 
from Aleksander Alekseev, so I decided to use my implementation in this 
revision of the patch. This is a good subject for discussion, though. 
Feel free to share your opinion.
4. As suggested by Theodor, I've replaced some bool args with bit flags.

> The name of enum ts_parsestate looks more like a name of the function
> than a name of a type.
> In my version, it renamed to QueryParserState, but you can fix it if 
> I'm wrong.

True, but gettoken_query() returns ts_tokentype, so I decided to use 
this naming scheme.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

websearch_to_tsquery_v4.diff

Re: new function for tsquery creartion

From

Dmitry Ivanov

Date:

02 April 2018, 19:41:51

I've fixed a bug and added some tests and documentation.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

websearch_to_tsquery_v5.diff

Re: new function for tsquery creartion

From

Dmitry Ivanov

Date:

03 April 2018, 13:47:14

Hi everyone,

The code in its current state looks messy and way too complicated; 
there're lots of interleaving code branches. Thus, I decided to split 
gettoken_query() into three independent tokenizers for phrase, web and 
original (to_tsquery()) syntaxes. Documentation is included. Any 
feedback is very welcome.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

websearch_to_tsquery_v6.diff

Re: new function for tsquery creartion

From

Dmitry Ivanov

Date:

03 April 2018, 14:28:37

> The code in its current state looks messy and way too complicated;
> there're lots of interleaving code branches. Thus, I decided to split
> gettoken_query() into three independent tokenizers for phrase, web and
> original (to_tsquery()) syntaxes. Documentation is included. Any
> feedback is very welcome.

I'm sorry, I totally forgot to fix a few more things, the patch is 
attached below.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

websearch_to_tsquery_v7.diff

Re: new function for tsquery creartion

From

Aleksandr Parfenov

Date:

03 April 2018, 17:13:20

On Tue, 03 Apr 2018 14:28:37 +0300
Dmitry Ivanov <d.ivanov@postgrespro.ru> wrote:
> I'm sorry, I totally forgot to fix a few more things, the patch is 
> attached below.

The patch looks good to me except two things.

I'm not sure about the different result for these queries:
SELECT websearch_to_tsquery('simple', 'cat or ');
 websearch_to_tsquery 
----------------------
 'cat'
(1 row)
SELECT websearch_to_tsquery('simple', 'cat or');
 websearch_to_tsquery 
----------------------
 'cat' & 'or'
(1 row)

But I don't have strong opinion about these queries, since input in
both of them looks broken in terms of operator usage.

I found an odd behavior of the query creation function in case:
SELECT websearch_to_tsquery('english', '"pg_class pg"');
    websearch_to_tsquery     
-----------------------------
 ( 'pg' & 'class' ) <-> 'pg'
(1 row)

This query means that lexemes 'pg' and 'class' should be at the same
distance from the last 'pg'. In other words, they should have the same
position. But default parser will interpret pg_class as two separate
words during text parsing/vector creation.

The bug wasn't introduced in the patch and can be found in current
master. During the discussion of the patch with Dmitry, he noticed that
to_tsquery() function shares same odd behavior:
select to_tsquery('english', ' pg_class <-> pg');
         to_tsquery          
-----------------------------
 ( 'pg' & 'class' ) <-> 'pg'
(1 row)

This oddity caused by they implementation of makepol. In makepol, each
token (parsed by query parser) is sent to FTS parser and in case of
further separation of the token, it uses operator selected in functions
to_tsquery and friends. So it doesn't change over the runtime.

I see two different ways to solve it:
1) Use the same operator inside the parenthesizes. This will mean to
parse it as few parts of one word.
2) Remove parenthesizes. This will mean to parse it as few separate
words.

I prefer the second way since in some languages words can be separated
by some special symbol or not separated by any symbols at all and
should be extracted by special FTS parser. It also allows us to parse
such words as one by using the special parser (as it done for hyphenated
word).

But in the example with websearch_to_tsquery, I think it should use
the same operator for quoted part of the query. For example, we can
update the operator in makepol before sending it to pushval
(pushval_morph) to do so.

It looks like there should be two separated patches, one for
websearch_to_tsquery and another one for fixing odd behavior of the
query construction. But since the first one may depend on the
bugfix, to solve case with quotes, I will mark it as Waiting on
Author.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: new function for tsquery creartion

From

Dmitry Ivanov

Date:

04 April 2018, 17:33:47

> I'm not sure about the different result for these queries:
> SELECT websearch_to_tsquery('simple', 'cat or ');
>  websearch_to_tsquery
> ----------------------
>  'cat'
> (1 row)
> SELECT websearch_to_tsquery('simple', 'cat or');
>  websearch_to_tsquery
> ----------------------
>  'cat' & 'or'
> (1 row)

I guess both queries should produce just 'cat'. I've changed the 
definition of parse_or_operator().

> I found an odd behavior of the query creation function in case:
> SELECT websearch_to_tsquery('english', '"pg_class pg"');
>     websearch_to_tsquery
> -----------------------------
>  ( 'pg' & 'class' ) <-> 'pg'
> (1 row)
> 
> This query means that lexemes 'pg' and 'class' should be at the same
> distance from the last 'pg'. In other words, they should have the same
> position. But default parser will interpret pg_class as two separate
> words during text parsing/vector creation.
> 
> The bug wasn't introduced in the patch and can be found in current
> master. During the discussion of the patch with Dmitry, he noticed that
> to_tsquery() function shares same odd behavior:
> select to_tsquery('english', ' pg_class <-> pg');
>          to_tsquery
> -----------------------------
>  ( 'pg' & 'class' ) <-> 'pg'
> (1 row)

I've been thinking about this for a while, and it seems that this should 
be fixed somewhere near parsetext(). Perhaps 'pg' and 'class' should 
share the same position. After all, somebody could implement a parser 
which would split some words using an arbitrary set of rules, for 
instance "split all words containing digits". I propose merging this 
patch provided that there are no objections.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

websearch_to_tsquery_v8.patch

Re: new function for tsquery creartion

From

Aleksandr Parfenov

Date:

04 April 2018, 17:49:08

On 2018-04-04 17:33, Dmitry Ivanov wrote:
> I've been thinking about this for a while, and it seems that this
> should be fixed somewhere near parsetext(). Perhaps 'pg' and 'class'
> should share the same position. After all, somebody could implement a
> parser which would split some words using an arbitrary set of rules,
> for instance "split all words containing digits". I propose merging
> this patch provided that there are no objections.

I'm agree that this problem should be solved in separate patch and
that this feature can be merged without waiting for the fix.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: new function for tsquery creartion

From

Teodor Sigaev

Date:

05 April 2018, 19:56:07

Thanks to everyone, pushed with some editorization:

1) translate russian test to prevent potential problems with encoding
2) fix inconsistency 'or cat' and 'cat or', second example doesn't treat OR as 
lexeme, but first one does.




-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: new function for tsquery creartion

From

Where is Where

Date:

21 July 2019, 23:28:50

Hello everyone, I am wondering if

AROUND(N) or <N, M> is still possible? I found this thread below and the original post https://www.postgresql.org/message-id/fe931111ff7e9ad79196486ada79e268%40postgrespro.ru

mentioned the proposed feature: 'New operator AROUND(N). It matches if the distance between words(or maybe phrases) is less than or equal to N.'

currently in tsquery_phrase(query1 tsquery, query2 tsquery, distance integer) the distaince is searching a fixed distance, is there way to
search maximum distance so the search returns query1 followed by query2 up
to a certain distance? like the AROUND(N) or <N, M> mentioned in the thread?

Thank you!

On Mon, Jul 22, 2019 at 9:13 AM Dmitry Ivanov <d.ivanov@postgrespro.ru> wrote:

Hi everyone,

I'd like to share some intermediate results. Here's what has changed:

1. OR operator is now case-insensitive. Moreover, trailing whitespace is
no longer used to identify it:

select websearch_to_tsquery('simple', 'abc or');
websearch_to_tsquery
----------------------
'abc' & 'or'
(1 row)

select websearch_to_tsquery('simple', 'abc or(def)');
websearch_to_tsquery
----------------------
'abc' | 'def'
(1 row)

select websearch_to_tsquery('simple', 'abc or!def');
websearch_to_tsquery
----------------------
'abc' | 'def'
(1 row)

2. AROUND(N) has been dropped. I hope that <N, M> operator will allow us
to implement it with a few lines of code.

3. websearch_to_tsquery() now tolerates various syntax errors, for
instance:

Misused operators:

'abc &'
'| abc'
'<- def'

Missing parentheses:

'abc & (def <-> (cat or rat'

Other sorts of nonsense:

'abc &--|| def' => 'abc' & !!'def'
'abc:def' => 'abc':D & 'ef'

This, however, doesn't mean that the result will always be adequate (who
would have thought?). Overall, current implementation follows the GIGO
principle. In theory, this would allow us to use user-supplied websearch
strings (but see gotchas), even if they don't make much sense. Better
then nothing, right?

4. A small refactoring: I've replaced all WAIT* macros with a enum for
better debugging (names look much nicer in GDB). Hope this is
acceptable.

5. Finally, I've added a few more comments and tests. I haven't checked
the code coverage, though.

A few gotchas:

I haven't touched gettoken_tsvector() yet. As a result, the following
queries produce errors:

select websearch_to_tsquery('simple', '''');
ERROR: syntax error in tsquery: "'"

select websearch_to_tsquery('simple', '\');
ERROR: there is no escaped character: "\"

Maybe there's more. The question is: should we fix those, or it's fine
as it is? I don't have a strong opinion about this.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

query1 followed by query2 at maximum distance vs current fixed distance

From

Wh isere

Date:

23 July 2019, 06:55:44

Is this possible with the current websearch_to_tsquery function?

Thanks.

Hello everyone, I am wondering if
AROUND(N) or <N, M> is still possible? I found this thread below and the original post https://www.postgresql.org/message-id/fe931111ff7e9ad79196486ada79e268%40postgrespro.ru
mentioned the proposed feature: 'New operator AROUND(N). It matches if the distance between words(or maybe phrases) is less than or equal to N.'

currently in tsquery_phrase(query1 tsquery, query2 tsquery, distance integer) the distaince is searching a fixed distance, is there way to
search maximum distance so the search returns query1 followed by query2 up
to a certain distance? like the AROUND(N) or <N, M> mentioned in the thread?

Thank you!

On Mon, Jul 22, 2019 at 9:13 AM Dmitry Ivanov <d.ivanov@postgrespro.ru> wrote:
Hi everyone,

I'd like to share some intermediate results. Here's what has changed:

1. OR operator is now case-insensitive. Moreover, trailing whitespace is
no longer used to identify it:

select websearch_to_tsquery('simple', 'abc or');
websearch_to_tsquery
----------------------
'abc' & 'or'
(1 row)

select websearch_to_tsquery('simple', 'abc or(def)');
websearch_to_tsquery
----------------------
'abc' | 'def'
(1 row)

select websearch_to_tsquery('simple', 'abc or!def');
websearch_to_tsquery
----------------------
'abc' | 'def'
(1 row)

2. AROUND(N) has been dropped. I hope that <N, M> operator will allow us
to implement it with a few lines of code.

3. websearch_to_tsquery() now tolerates various syntax errors, for
instance:

Misused operators:

'abc &'
'| abc'
'<- def'

Missing parentheses:

'abc & (def <-> (cat or rat'

Other sorts of nonsense:

'abc &--|| def' => 'abc' & !!'def'
'abc:def' => 'abc':D & 'ef'

This, however, doesn't mean that the result will always be adequate (who
would have thought?). Overall, current implementation follows the GIGO
principle. In theory, this would allow us to use user-supplied websearch
strings (but see gotchas), even if they don't make much sense. Better
then nothing, right?

4. A small refactoring: I've replaced all WAIT* macros with a enum for
better debugging (names look much nicer in GDB). Hope this is
acceptable.

5. Finally, I've added a few more comments and tests. I haven't checked
the code coverage, though.

A few gotchas:

I haven't touched gettoken_tsvector() yet. As a result, the following
queries produce errors:

select websearch_to_tsquery('simple', '''');
ERROR: syntax error in tsquery: "'"

select websearch_to_tsquery('simple', '\');
ERROR: there is no escaped character: "\"

Maybe there's more. The question is: should we fix those, or it's fine
as it is? I don't have a strong opinion about this.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: query1 followed by query2 at maximum distance vs current fixeddistance

From

Arthur Zakirov

Date:

29 July 2019, 14:01:15

Hello,

On 23.07.2019 09:55, Wh isere wrote:
> Is this possible with the current websearch_to_tsquery function?
> 
> Thanks.
> 
>     Hello everyone, I am wondering if
>     AROUND(N) or <N, M> is still possible? I found this thread below and
>     the original post
>     https://www.postgresql.org/message-id/fe931111ff7e9ad79196486ada79e268%40postgrespro.ru
>     mentioned the proposed feature: 'New operator AROUND(N). It matches
>     if the distance between words(or maybe phrases) is less than or
>     equal to N.'
> 
>     currently in tsquery_phrase(query1 tsquery, query2 tsquery, distance
>     integer) the distaince is searching a fixed distance, is there way to
>     search maximum distance so the search returns query1 followed by
>     query2 up
>     to a certain distance? like the AROUND(N) or <N, M> mentioned in the
>     thread?
As far as I know AROUND(N) and <N, M> weren't committed, unfortunately. 
And so you can search only using a fixed distance currently.

websearch_to_tsquery() can't help here. It just transforms search 
pattern with OR, AND statements into tsquery syntax.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: query1 followed by query2 at maximum distance vs current fixed distance

From

Wh isere

Date:

30 July 2019, 10:22:50

Thanks Arthur! I guess there is not other solution? I tried to create a function to loop through all the distance but its very slow.

On Tuesday, July 30, 2019, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:

Hello,

On 23.07.2019 09:55, Wh isere wrote:
Is this possible with the current websearch_to_tsquery function?

Thanks.

Hello everyone, I am wondering if
AROUND(N) or <N, M> is still possible? I found this thread below and
the original post
https://www.postgresql.org/message-id/fe931111ff7e9ad79196486ada79e268%40postgrespro.ru
mentioned the proposed feature: 'New operator AROUND(N). It matches
if the distance between words(or maybe phrases) is less than or
equal to N.'

currently in tsquery_phrase(query1 tsquery, query2 tsquery, distance
integer) the distaince is searching a fixed distance, is there way to
search maximum distance so the search returns query1 followed by
query2 up
to a certain distance? like the AROUND(N) or <N, M> mentioned in the
thread?
As far as I know AROUND(N) and <N, M> weren't committed, unfortunately. And so you can search only using a fixed distance currently.

websearch_to_tsquery() can't help here. It just transforms search pattern with OR, AND statements into tsquery syntax.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company