Thread: [PATCH] Phrase search ported to 9.6

[PATCH] Phrase search ported to 9.6

From
Dmitry Ivanov
Date:
Hi Hackers,

Although PostgreSQL is capable of performing some FTS (full text search)
queries, there's still a room for improvement. Phrase search support could
become a great addition to the existing set of features.


Introduction
============

It is no secret that one can make Google search for an exact phrase (instead
of an unordered lexeme set) simply by enclosing it within double quotes. This
is a really nice feature which helps to save the time that would otherwise be
spent on annoying result filtering.

One weak spot of the current FTS implementation is that there is no way to
specify the desired lexeme order (since it would not make any difference at
all). In the end, the search engine will look for each lexeme individually,
which means that a hypothetical end user would have to discard documents not
including search phrase all by himself. This problem is solved by the patch
below (should apply cleanly to 61ce1e8f1).


Problem description
===================

The problem comes from the lack of lexeme ordering operator. Consider the
following example:

select q @@ plainto_tsquery('fatal error') from
unnest(array[to_tsvector('fatal error'), to_tsvector('error is not fatal')])
as q;
?column?
----------
t
t
(2 rows)

Clearly the latter match is not the best result in case we wanted to find
exactly the "fatal error" phrase. That's when the need for a lexeme ordering
operator arises:

select q @@ to_tsquery('fatal ? error') from unnest(array[to_tsvector('fatal
error'), to_tsvector('error is not fatal')]) as q;
?column?
----------
t
f
(2 rows)


Implementation
==============

The ? (FOLLOWED BY) binary operator takes form of "?" or "?[N]" where 0 <= N <
~16K. If N is provided, the distance between left and right operands must be
no greater that N. For example:

select to_tsvector('postgres has taken severe damage') @@ to_tsquery('postgres
? (severe ? damage)');
 ?column?
----------
 f
(1 row)

select to_tsvector('postgres has taken severe damage') @@ to_tsquery('postgres
?[4] (severe ? damage)');
 ?column?
----------
 t
(1 row)

New function phraseto_tsquery([ regconfig, ] text) takes advantage of the "?
[N]" operator in order to facilitate phrase search:

select to_tsvector('postgres has taken severe damage') @@
phraseto_tsquery('severely damaged');
 ?column?
----------
 t
(1 row)


This patch was originally developed by Teodor Sigaev and Oleg Bartunov in
2009, so all credit goes to them. Any feedback is welcome.


--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment

Re: [PATCH] Phrase search ported to 9.6

From
Michael Paquier
Date:
<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">On Mon, Feb 1, 2016 at 8:21 PM, Dmitry
Ivanov<span dir="ltr"><<a href="mailto:d.ivanov@postgrespro.ru"
target="_blank">d.ivanov@postgrespro.ru</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0px 0px
0px0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> This patch was originally developed by Teodor Sigaev
andOleg Bartunov in<br /> 2009, so all credit goes to them. Any feedback is welcome.<span class=""></span><br
/></blockquote></div><br/></div><div class="gmail_extra">This is not a small patch:<br />28 files changed, 2441
insertions(+),380 deletions(-)<br /></div><div class="gmail_extra">And the last CF of 9.6 should not contain rather
largepatches.<br /></div><div class="gmail_extra">-- <br /><div class="gmail_signature">Michael<br /></div></div></div> 

Re: [PATCH] Phrase search ported to 9.6

From
Andreas Joseph Krogh
Date:
På tirsdag 02. februar 2016 kl. 04:22:57, skrev Michael Paquier <michael.paquier@gmail.com>:
 
 
On Mon, Feb 1, 2016 at 8:21 PM, Dmitry Ivanov <d.ivanov@postgrespro.ru> wrote:
This patch was originally developed by Teodor Sigaev and Oleg Bartunov in
2009, so all credit goes to them. Any feedback is welcome.
This is not a small patch:
28 files changed, 2441 insertions(+), 380 deletions(-)
And the last CF of 9.6 should not contain rather large patches.
--
Michael
 
 
OTOH; It would be extremely nice to get this into 9.6.
 
 
--
Andreas Joseph Krogh

Re: [PATCH] Phrase search ported to 9.6

From
Oleg Bartunov
Date:


On Tue, Feb 2, 2016 at 10:18 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På tirsdag 02. februar 2016 kl. 04:22:57, skrev Michael Paquier <michael.paquier@gmail.com>:
 
 
On Mon, Feb 1, 2016 at 8:21 PM, Dmitry Ivanov <d.ivanov@postgrespro.ru> wrote:
This patch was originally developed by Teodor Sigaev and Oleg Bartunov in
2009, so all credit goes to them. Any feedback is welcome.
This is not a small patch:
28 files changed, 2441 insertions(+), 380 deletions(-)
And the last CF of 9.6 should not contain rather large patches.
--
Michael
 
 
OTOH; It would be extremely nice to get this into 9.6.

will see how community decided.
anyway, it's already in our distribution.

 
 
 
--
Andreas Joseph Krogh

Re: [PATCH] Phrase search ported to 9.6

From
Andreas Joseph Krogh
Date:
På tirsdag 02. februar 2016 kl. 09:20:06, skrev Oleg Bartunov <obartunov@gmail.com>:
 
 
On Tue, Feb 2, 2016 at 10:18 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På tirsdag 02. februar 2016 kl. 04:22:57, skrev Michael Paquier <michael.paquier@gmail.com>:
 
 
On Mon, Feb 1, 2016 at 8:21 PM, Dmitry Ivanov <d.ivanov@postgrespro.ru> wrote:
This patch was originally developed by Teodor Sigaev and Oleg Bartunov in
2009, so all credit goes to them. Any feedback is welcome.
This is not a small patch:
28 files changed, 2441 insertions(+), 380 deletions(-)
And the last CF of 9.6 should not contain rather large patches.
--
Michael
 
 
OTOH; It would be extremely nice to get this into 9.6.
 
will see how community decided.
anyway, it's already in our distribution.
 
 
Which seems to indicate it has received a fair amount of testing and is quite stable.
Hopefully it integrates into the 9.6 codebase without too much risk.
Thanks for contributing this.
 
--
Andreas Joseph Krogh

Re: [PATCH] Phrase search ported to 9.6

From
Alvaro Herrera
Date:
Andreas Joseph Krogh wrote: 
> Which seems to indicate it has received a fair amount of testing and is quite 
> stable.
> Hopefully it integrates into the 9.6 codebase without too much risk.

Yes, yes, that's all very good, but we're nearing the closure of the 9.6
development cycle and we only have one commitfest left.  If someone had
lots of community brownie points because of doing lots of reviews of
other people's patches, they might push their luck by posting this patch
to the final commitfest.  But if that someone didn't, then it wouldn't
be fair, and if I were the commitfest manager of that commitfest I would
boot their patch to the 9.7-First commitfest.

The current commitfest which I'm trying to close still has 24 patches in
needs-review state and 11 patches ready-for-committer; the next one (not
closed yet) has 40 patches that will need review.  That means a total of
75 patches, and those should all be processed ahead of this one.  The
effort needed to process each of those patches is not trivial, and I'm
sorry I have to say this but I don't see PostgresPro contributing enough
reviews, even though I pinged a number of people there, so putting one
more patch on the rest of the community's shoulders doesn't seem fair to
me.

Everybody has their favorite patch that they want in the next release,
but we only have so much manpower to review and integrate those patches.
All review help is welcome.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Phrase search ported to 9.6

From
Andreas Joseph Krogh
Date:
På tirsdag 02. februar 2016 kl. 12:04:21, skrev Alvaro Herrera <alvherre@2ndquadrant.com>:
Andreas Joseph Krogh wrote:
  
> Which seems to indicate it has received a fair amount of testing and is quite
> stable.
> Hopefully it integrates into the 9.6 codebase without too much risk.

Yes, yes, that's all very good, but we're nearing the closure of the 9.6
development cycle and we only have one commitfest left.  If someone had
lots of community brownie points because of doing lots of reviews of
other people's patches, they might push their luck by posting this patch
to the final commitfest.  But if that someone didn't, then it wouldn't
be fair, and if I were the commitfest manager of that commitfest I would
boot their patch to the 9.7-First commitfest.

The current commitfest which I'm trying to close still has 24 patches in
needs-review state and 11 patches ready-for-committer; the next one (not
closed yet) has 40 patches that will need review.  That means a total of
75 patches, and those should all be processed ahead of this one.  The
effort needed to process each of those patches is not trivial, and I'm
sorry I have to say this but I don't see PostgresPro contributing enough
reviews, even though I pinged a number of people there, so putting one
more patch on the rest of the community's shoulders doesn't seem fair to
me.

Everybody has their favorite patch that they want in the next release,
but we only have so much manpower to review and integrate those patches.
All review help is welcome.
 
I understand completely.
 
--
Andreas Joseph Krogh

Re: [PATCH] Phrase search ported to 9.6

From
Oleg Bartunov
Date:


On Tue, Feb 2, 2016 at 2:04 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Andreas Joseph Krogh wrote:
  
> Which seems to indicate it has received a fair amount of testing and is quite
> stable.
> Hopefully it integrates into the 9.6 codebase without too much risk.

Yes, yes, that's all very good, but we're nearing the closure of the 9.6
development cycle and we only have one commitfest left.  If someone had
lots of community brownie points because of doing lots of reviews of
other people's patches, they might push their luck by posting this patch
to the final commitfest.  But if that someone didn't, then it wouldn't
be fair, and if I were the commitfest manager of that commitfest I would
boot their patch to the 9.7-First commitfest.

The current commitfest which I'm trying to close still has 24 patches in
needs-review state and 11 patches ready-for-committer; the next one (not
closed yet) has 40 patches that will need review.  That means a total of
75 patches, and those should all be processed ahead of this one.  The
effort needed to process each of those patches is not trivial, and I'm
sorry I have to say this but I don't see PostgresPro contributing enough
reviews, even though I pinged a number of people there, so putting one
more patch on the rest of the community's shoulders doesn't seem fair to
me.

I'll talk about this.
 

Everybody has their favorite patch that they want in the next release,
but we only have so much manpower to review and integrate those patches.
All review help is welcome.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PATCH] Phrase search ported to 9.6

From
Robert Haas
Date:
On Tue, Feb 2, 2016 at 6:04 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Andreas Joseph Krogh wrote:
>> Which seems to indicate it has received a fair amount of testing and is quite
>> stable.
>> Hopefully it integrates into the 9.6 codebase without too much risk.
>
> Yes, yes, that's all very good, but we're nearing the closure of the 9.6
> development cycle and we only have one commitfest left.  If someone had
> lots of community brownie points because of doing lots of reviews of
> other people's patches, they might push their luck by posting this patch
> to the final commitfest.  But if that someone didn't, then it wouldn't
> be fair, and if I were the commitfest manager of that commitfest I would
> boot their patch to the 9.7-First commitfest.

Completely agreed.  Also, to be frank, these text search patches are
often in need of quite a LOT of work per line compared to some others.
For example, consider the percentage of this patch that is comments.
It's a pretty low percentage.  And it's going into surrounding code
that is also very low in comments.  Giving this a good review will
take somebody a lot of time, and bringing it up to PostgreSQL's normal
standards will probably take quite a lot of work.  I don't understand
why the community should agree to do that for anyone, and as you say,
it's not like PostgresPro is leading the pack in terms of review
contributions.  We're not going to get this release out on time if
everybody insists that every patch has to go in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Phrase search ported to 9.6

From
Artur Zakirov
Date:
Hello, Dmitry

This is my initial review for you patch. Below are my comments.

Introduction
------------

This patch introduce new operator and new functions.
New operator:
- ??
New functions:
- phraseto_tsquery([ config regconfig , ] query text)
- setweight(tsquery, "char")
- tsquery_phrase(query1 tsquery, query2 tsquery)
- tsquery_phrase(query1 tsquery, query2 tsquery, distance integer)

New regression tests are included in the patch. Information about new 
features is provided in the documentation.

Initial Run
-----------

The patch applies correctly to HEAD. Regression tests pass successfully, 
without errors. It seems that the patch work as you expected.

Performance
-----------

I have not tested possible performance issues yet. Maybe later I will 
prepare some data to test performance.

Coding
------

I agree with others that there is a lack of comments for new functions. 
Most of them without comments.

> ../pg_patches/phrase_search.patch:1086: trailing whitespace.
>      * QI_VALSTOP nodes should be cleaned and
> ../pg_patches/phrase_search.patch:1124: trailing whitespace.
>         if (priority < parentPriority)
> ../pg_patches/phrase_search.patch:1140: trailing whitespace.
>         if (priority < parentPriority)
> ../pg_patches/phrase_search.patch:1591: space before tab in indent.
>                      /*    (a|b) ? c => (a?c) | (b?c) */
> ../pg_patches/phrase_search.patch:1603: space before tab in indent.
>                      /*    !a ? b    => b & !(a?b) */

It is git apply output. There are trailing whitespaces in the code.

> +typedef struct MorphOpaque
> +{
> +    Oid        cfg_id;
> +    int        qoperator;    /* query operator */
> +} MorphOpaque;

Maybe you should move structure definition to the top of the to_tsany.c

> +                    pushValue(state,
> +                              prs.words[count].word,
> +                              prs.words[count].len,
> +                              weight,
> +                              ((prs.words[count].flags & TSL_PREFIX) || prefix) ?
> +                                    true :
> +                                    false);

There is not need in ternary operator here. You can write:

pushValue(state,      prs.words[count].word,      prs.words[count].len,      weight,      (prs.words[count].flags &
TSL_PREFIX)|| prefix);
 

>  /*
> + * Wrapper of check condition function for TS_execute.
> + * We are using the fact GIN_FALSE = 0 and GIN_MAYBE state
> + * should be treated as true, so, any non-zero value should be
> + * converted to boolean true.
> + */
> +static bool
> +checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
> +{
> +    return !!checkcondition_gin_internal((GinChkVal *) checkval, val, data);
> +}

Maybe write like this?

static bool
checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
{return checkcondition_gin_internal((GinChkVal *) checkval, val, data) 
!= GIN_FALSE;
}

Then you don't need in the comment above of the function.

> +#define PRINT_PRIORITY(x)    ( (((QueryOperator*)(x))->oper == OP_NOT) ? 5 : QO_PRIORITY(x) )

What is mean "5" here? You can define a new value near the:

#define OP_OR        1
#define OP_AND        2
#define OP_NOT        3
#define OP_PHRASE    4

> +Datum
> +tsquery_setweight(PG_FUNCTION_ARGS)
> +{
> +    TSQuery        in = PG_GETARG_TSQUERY(0);
> +    char        cw = PG_GETARG_CHAR(1);
> +    TSQuery     out;
> +    QueryItem  *item;
> +    int         w = 0;
> +
> +    switch (cw)
> +    {
> +        case 'A':
> +        case 'a':
> +            w = 3;
> +            break;
> +        case 'B':
> +        case 'b':
> +            w = 2;
> +            break;
> +        case 'C':
> +        case 'c':
> +            w = 1;
> +            break;
> +        case 'D':
> +        case 'd':
> +            w = 0;
> +            break;
> +        default:
> +            /* internal error */
> +            elog(ERROR, "unrecognized weight: %d", cw);
> +    }
> +
> +    out = (TSQuery) palloc(VARSIZE(in));
> +    memcpy(out, in, VARSIZE(in));
> +    item = GETQUERY(out);
> +
> +    while(item - GETQUERY(out) < out->size)    
> +    {
> +        if (item->type == QI_VAL)
> +            item->qoperand.weight |= (1 << w);
> +
> +        item++;
> +    }
> +
> +    PG_FREE_IF_COPY(in, 0);
> +    PG_RETURN_POINTER(out);
> +}

This function has the duplicated piece from the tsvector_setweight() 
from tsvector_op.c. You can define new function for it.

> +/*
> + * Check if datatype is the specified type or equivalent to it.
> + *
> + * Note: we could just do getBaseType() unconditionally, but since that's
> + * a relatively expensive catalog lookup that most users won't need, we
> + * try the straight comparison first.
> + */
> +static bool
> +is_expected_type(Oid typid, Oid expected_type)
> +{
> +    if (typid == expected_type)
> +        return true;
> +    typid = getBaseType(typid);
> +    if (typid == expected_type)
> +        return true;
> +    return false;
> +}
> +
> +/* Check if datatype is TEXT or binary-equivalent to it */
> +static bool
> +is_text_type(Oid typid)
> +{
> +    /* varchar(n) and char(n) are binary-compatible with text */
> +    if (typid == TEXTOID || typid == VARCHAROID || typid == BPCHAROID)
> +        return true;
> +    /* Allow domains over these types, too */
> +    typid = getBaseType(typid);
> +    if (typid == TEXTOID || typid == VARCHAROID || typid == BPCHAROID)
> +        return true;
> +    return false;
> +}

These functions was removed in 9acb9007de30b3daaa9efc16763c3bc6e3e0a92d. 
It seems that tsvector_op.c was not synchronized with the master.

Conclusion
----------

This patch is large and it needs more research. I will be reviewing it 
and will give another notes later.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [PATCH] Phrase search ported to 9.6

From
David Steele
Date:
On 2/29/16 10:33 AM, Artur Zakirov wrote:

> Conclusion
> ----------
> 
> This patch is large and it needs more research. I will be reviewing it
> and will give another notes later.

This thread has not had a response from the authors or new patch in more
than two weeks.  Please post something soon or it will be marked
"returned with feedback".

-- 
-David
david@pgmasters.net



Re: [PATCH] Phrase search ported to 9.6

From
Dmitry Ivanov
Date:
Hi, Artur

I've made an attempt to fix some of the issues you've listed, although there's
still much work to be done. I'll add some comments later.

> This function has the duplicated piece from the tsvector_setweight()
> from tsvector_op.c. You can define new function for it.

I'm not sure it's worth the trouble. IMO these functions are relatively small
and we won't benefit from extracting the duplicate code.

> These functions was removed in 9acb9007de30b3daaa9efc16763c3bc6e3e0a92d.
> It seems that tsvector_op.c was not synchronized with the master.

Got it, thanks!

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

Re: [PATCH] Phrase search ported to 9.6

From
Artur Zakirov
Date:
I tried to find some bugs in the code. I can't find them. But it does 
not mean that there are not bugs.

Still there are a lack of comments and trailing whitespaces.

On 16.03.2016 19:38, Dmitry Ivanov wrote:
> Hi, Artur
>
> I've made an attempt to fix some of the issues you've listed, although there's
> still much work to be done. I'll add some comments later.
>
>> This function has the duplicated piece from the tsvector_setweight()
>> from tsvector_op.c. You can define new function for it.
>
> I'm not sure it's worth the trouble. IMO these functions are relatively small
> and we won't benefit from extracting the duplicate code.

I think that a separate function would be better. These weights are 
occurred in other functions. But I might be wrong and maybe I cavil at 
the code too much.

>
>> These functions was removed in 9acb9007de30b3daaa9efc16763c3bc6e3e0a92d.
>> It seems that tsvector_op.c was not synchronized with the master.
>
> Got it, thanks!
>


-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [PATCH] Phrase search ported to 9.6

From
Alexander Korotkov
Date:
Hi!

I see that patch changes existing regression tests in tsearch2.out.

*** a/contrib/tsearch2/expected/tsearch2.out
--- b/contrib/tsearch2/expected/tsearch2.out
*************** SELECT '(!1|2)&3'::tsquery;
*** 278,292 ****
  (1 row)

  SELECT '1|(2|(4|(5|6)))'::tsquery;
!                  tsquery
! -----------------------------------------
!  '1' | ( '2' | ( '4' | ( '5' | '6' ) ) )
  (1 row)

  SELECT '1|2|4|5|6'::tsquery;
!                  tsquery
! -----------------------------------------
!  ( ( ( '1' | '2' ) | '4' ) | '5' ) | '6'
  (1 row)

  SELECT '1&(2&(4&(5&6)))'::tsquery;
--- 278,292 ----
  (1 row)

  SELECT '1|(2|(4|(5|6)))'::tsquery;
!            tsquery
! -----------------------------
!  '1' | '2' | '4' | '5' | '6'
  (1 row)

  SELECT '1|2|4|5|6'::tsquery;
!            tsquery
! -----------------------------
!  '1' | '2' | '4' | '5' | '6'
  (1 row)


This change looks like improvement, without braces tsquery readability is much better.

*************** select rewrite('moscow & hotel', 'select
*** 461,469 ****
  (1 row)

  select rewrite('bar &  new & qq & foo & york', 'select keyword, sample from test_tsquery'::text );
!                                        rewrite
! -------------------------------------------------------------------------------------
!  'citi' & 'foo' & ( 'bar' | 'qq' ) & ( 'nyc' | ( 'big' & 'appl' | 'new' & 'york' ) )
  (1 row)

  select rewrite( ARRAY['moscow', keyword, sample] ) from test_tsquery;
--- 461,469 ----
  (1 row)

  select rewrite('bar &  new & qq & foo & york', 'select keyword, sample from test_tsquery'::text );
!                                      rewrite
! ---------------------------------------------------------------------------------
!  ( 'nyc' | 'big' & 'appl' | 'new' & 'york' ) & 'citi' & 'foo' & ( 'bar' | 'qq' )
  (1 row)

  select rewrite( ARRAY['moscow', keyword, sample] ) from test_tsquery;

However, such reorderings look unclear and need motivation.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [PATCH] Phrase search ported to 9.6

From
David Steele
Date:
Hi Dmitry,

On 3/16/16 12:38 PM, Dmitry Ivanov wrote:

> I've made an attempt to fix some of the issues you've listed, although there's
> still much work to be done. I'll add some comments later.

Do you know when you'll have a chance to respond to reviews and provide 
a new patch?

Time is short and it's not encouraging that you say there is "still much 
work to be done".  Perhaps it would be best to mark this "returned with 
feedback"?

Thanks,
-- 
-David
david@pgmasters.net



Re: [PATCH] Phrase search ported to 9.6

From
Alexander Korotkov
Date:
On Fri, Mar 25, 2016 at 6:42 PM, David Steele <david@pgmasters.net> wrote:
On 3/16/16 12:38 PM, Dmitry Ivanov wrote:

I've made an attempt to fix some of the issues you've listed, although there's
still much work to be done. I'll add some comments later.

Do you know when you'll have a chance to respond to reviews and provide a new patch?

Time is short and it's not encouraging that you say there is "still much work to be done".  Perhaps it would be best to mark this "returned with feedback"?

Dmitry will provide a new version of patch today.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 

Re: [PATCH] Phrase search ported to 9.6

From
Dmitry Ivanov
Date:
Sorry for the delay, I desperately needed some time to finish a bunch of
dangling tasks.

I've added some new comments and clarified the ones that were obscure.
Moreover, I felt an urge to recheck most parts of the code since apparently
nobody (besides myself) has gone so far yet.

On 25.03.16 18:42 MSK, David Steele wrote:
> Time is short and it's not encouraging that you say there is "still much
work to be done".

Indeed, I was inaccurate. I am more than interested in the positive outcome.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment

Re: [PATCH] Phrase search ported to 9.6

From
David Steele
Date:
On 3/25/16 2:42 PM, Dmitry Ivanov wrote:
> Sorry for the delay, I desperately needed some time to finish a bunch of 
> dangling tasks.
> 
> I've added some new comments and clarified the ones that were obscure. 
> Moreover, I felt an urge to recheck most parts of the code since apparently 
> nobody (besides myself) has gone so far yet.
> 
> On 25.03.16 18:42 MSK, David Steele wrote:
>> Time is short and it's not encouraging that you say there is "still much 
> work to be done".
> 
> Indeed, I was inaccurate. I am more than interested in the positive outcome.

Me, too!  I have set this to "needs review".

-- 
-David
david@pgmasters.net



Re: [PATCH] Phrase search ported to 9.6

From
Artur Zakirov
Date:
On 25.03.2016 21:42, Dmitry Ivanov wrote:
> Sorry for the delay, I desperately needed some time to finish a bunch of
> dangling tasks.
>
> I've added some new comments and clarified the ones that were obscure.
> Moreover, I felt an urge to recheck most parts of the code since apparently
> nobody (besides myself) has gone so far yet.
>
> On 25.03.16 18:42 MSK, David Steele wrote:
>> Time is short and it's not encouraging that you say there is "still much
> work to be done".
>
> Indeed, I was inaccurate. I am more than interested in the positive outcome.
>

Hi,

Thank you for your work!

I tested the patch and take a look on it. All regression tests passed. 
The code looks good and the patch introduce a great functionality.

I think the patch can be marked as "Ready for Commiter". But I do not 
feel the force to do that myself.

Also I agree with you about tsvector_setweight(). There is not a problem 
with it because this weights are immutable and so there is not benefits 
from new function.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [PATCH] Phrase search ported to 9.6

From
David Steele
Date:
On 3/25/16 3:54 PM, Artur Zakirov wrote:
> On 25.03.2016 21:42, Dmitry Ivanov wrote:
>> Sorry for the delay, I desperately needed some time to finish a bunch of
>> dangling tasks.
>>
>> I've added some new comments and clarified the ones that were obscure.
>> Moreover, I felt an urge to recheck most parts of the code since
>> apparently
>> nobody (besides myself) has gone so far yet.
>>
>> On 25.03.16 18:42 MSK, David Steele wrote:
>>> Time is short and it's not encouraging that you say there is "still much
>> work to be done".
>>
>> Indeed, I was inaccurate. I am more than interested in the positive
>> outcome.
>>
> 
> I tested the patch and take a look on it. All regression tests passed.
> The code looks good and the patch introduce a great functionality.
> 
> I think the patch can be marked as "Ready for Commiter". But I do not
> feel the force to do that myself.
> 
> Also I agree with you about tsvector_setweight(). There is not a problem
> with it because this weights are immutable and so there is not benefits
> from new function.

Ideally Alexander can also look at it.  If not, then you should mark it
"ready for committer" since I doubt any more reviewers will sign on this
late in the CF.

-- 
-David
david@pgmasters.net



Re: [PATCH] Phrase search ported to 9.6

From
Alexander Korotkov
Date:
Hi!

On Sat, Mar 26, 2016 at 12:02 AM, David Steele <david@pgmasters.net> wrote:
On 3/25/16 3:54 PM, Artur Zakirov wrote:
> On 25.03.2016 21:42, Dmitry Ivanov wrote:
>> Sorry for the delay, I desperately needed some time to finish a bunch of
>> dangling tasks.
>>
>> I've added some new comments and clarified the ones that were obscure.
>> Moreover, I felt an urge to recheck most parts of the code since
>> apparently
>> nobody (besides myself) has gone so far yet.
>>
>> On 25.03.16 18:42 MSK, David Steele wrote:
>>> Time is short and it's not encouraging that you say there is "still much
>> work to be done".
>>
>> Indeed, I was inaccurate. I am more than interested in the positive
>> outcome.
>>
>
> I tested the patch and take a look on it. All regression tests passed.
> The code looks good and the patch introduce a great functionality.
>
> I think the patch can be marked as "Ready for Commiter". But I do not
> feel the force to do that myself.
>
> Also I agree with you about tsvector_setweight(). There is not a problem
> with it because this weights are immutable and so there is not benefits
> from new function.

Ideally Alexander can also look at it.  If not, then you should mark it
"ready for committer" since I doubt any more reviewers will sign on this
late in the CF.

I've checked the last version of the patch. Patch applies cleanly on master, builds without warnings, regression tests pass.
The issue I reported about tsquery textual representation is fixed.
I'm going to mark this "Ready for Committer".

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [PATCH] Phrase search ported to 9.6

From
Teodor Sigaev
Date:
Looking at patch, I'm inlined to commit it in current commitfest, it looks
workable and too many people desire it. I've did some changes, mostly in
formatting and comments.

Patch makes some user-visible changes:
1 change order for tsquery. Assume, it's acceptable, only a few users store
tsquery in table and have a Btree index. They will need to reindex.
2 less number of parenthesis in tsquery output, and tsquery becomes more readable.

Pls, remove tsquery_setweight from patch because it's not a part of phrase
search and it isn't mentioned in docs. Please, make a separate patch for it.


Dmitry Ivanov wrote:
> Sorry for the delay, I desperately needed some time to finish a bunch of
> dangling tasks.
>
> I've added some new comments and clarified the ones that were obscure.
> Moreover, I felt an urge to recheck most parts of the code since apparently
> nobody (besides myself) has gone so far yet.
>
> On 25.03.16 18:42 MSK, David Steele wrote:
>> Time is short and it's not encouraging that you say there is "still much
> work to be done".
>
> Indeed, I was inaccurate. I am more than interested in the positive outcome.
>
>
>
>

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

Re: [PATCH] Phrase search ported to 9.6

From
Dmitry Ivanov
Date:
Hi Teodor,

I've looked through your version and made a few adjustments.

> Pls, remove tsquery_setweight from patch because it's not a part of phrase
> search and it isn't mentioned in docs. Please, make a separate patch for it.

I've removed tsquery_setweight as you requested. I'm going to add it to the
following commitfest later.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment

Re: [PATCH] Phrase search ported to 9.6

From
Alvaro Herrera
Date:
What led you to choose the ? operator for the FOLLOWED BY semantics?
It doesn't seem a terribly natural choice -- most other things seems to
use ? as some sort of wildcard.  What about something like "...", so you
would do SELECT q @@ to_tsquery('fatal ... error');
and  SELECT q @@ (tsquery 'fatal' ... tsquery 'error');

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Phrase search ported to 9.6

From
Oleg Bartunov
Date:


On Thu, Mar 31, 2016 at 9:14 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
What led you to choose the ? operator for the FOLLOWED BY semantics?
It doesn't seem a terribly natural choice -- most other things seems to
use ? as some sort of wildcard.  What about something like "...", so you
would do
  SELECT q @@ to_tsquery('fatal ... error');
and
  SELECT q @@ (tsquery 'fatal' ... tsquery 'error');


originally was $, but then we change it to ?, we don't remember why. During warming-up this morning we came to other suggestion

SELECT q @@ to_tsquery('fatal <> error');
and
SELECT q @@ to_tsquery('fatal <2> error');

How about this ?

 
--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PATCH] Phrase search ported to 9.6

From
Alvaro Herrera
Date:
Oleg Bartunov wrote:
> On Thu, Mar 31, 2016 at 9:14 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> 
> > What led you to choose the ? operator for the FOLLOWED BY semantics?
> > It doesn't seem a terribly natural choice -- most other things seems to
> > use ? as some sort of wildcard.  What about something like "...", so you
> > would do
> >   SELECT q @@ to_tsquery('fatal ... error');
> > and
> >   SELECT q @@ (tsquery 'fatal' ... tsquery 'error');
> >
> >
> originally was $, but then we change it to ?, we don't remember why. During
> warming-up this morning we came to other suggestion
> 
> SELECT q @@ to_tsquery('fatal <> error');
> and
> SELECT q @@ to_tsquery('fatal <2> error');
> 
> How about this ?

Well, I noticed that the docs talk about an operator that can be used in
SQL (outside the tsquery parser), as well as an operator that can be
used inside tsquery.  Inside tsquery anything would be usable, but
outside that it would be good to keep in mind the rest of SQL operators;
and <> means "different from", so using it for FOLLOWED BY seems odd to
me.

My suggestion of ... (ellipsis) is because that's already known as
related to text, used for omissions of words, where the surrounding
words form a phrase.  It seems much more natural to me.

(I also noticed that you can specify a counter together with the
operator, for "at most N words apart", which is not going to work for
the SQL-level operator no matter what you choose.  I suppose that's not
considered a problem)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Phrase search ported to 9.6

From
Teodor Sigaev
Date:
> Well, I noticed that the docs talk about an operator that can be used in
> SQL (outside the tsquery parser), as well as an operator that can be
Just to join 2 tsquery with operator FOLLOWED BY

> used inside tsquery.  Inside tsquery anything would be usable, but
> outside that it would be good to keep in mind the rest of SQL operators;
> and <> means "different from", so using it for FOLLOWED BY seems odd to
> me.
previous rule was: duplicate operation character to join tsqueries.
Ellipsis is forbidden to use as operator name:
#  create operator ... (procedure = 'int2and');
ERROR:  syntax error at or near ".."
LINE 1: create operator ... (procedure = 'int2and');
What is your suggestion for joining operator name?

> My suggestion of ... (ellipsis) is because that's already known as
> related to text, used for omissions of words, where the surrounding
> words form a phrase.  It seems much more natural to me.
Yes, agree for omission. But for reason above its'n a good name although I don't 
have a strong objections.

may be <=>? it isn't used anywhere yet.

select 'fat'::tsquery <=> 'cat';
select 'fat <=> cat'::tsquery;
select 'fat <3> cat'::tsqyery; -- for non-default distance.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: [PATCH] Phrase search ported to 9.6

From
Alvaro Herrera
Date:
Teodor Sigaev wrote:
> >Well, I noticed that the docs talk about an operator that can be used in
> >SQL (outside the tsquery parser), as well as an operator that can be
> Just to join 2 tsquery with operator FOLLOWED BY

Ok.

> >used inside tsquery.  Inside tsquery anything would be usable, but
> >outside that it would be good to keep in mind the rest of SQL operators;
> >and <> means "different from", so using it for FOLLOWED BY seems odd to
> >me.
> previous rule was: duplicate operation character to join tsqueries.
> Ellipsis is forbidden to use as operator name:
> #  create operator ... (procedure = 'int2and');
> ERROR:  syntax error at or near ".."
> LINE 1: create operator ... (procedure = 'int2and');
> What is your suggestion for joining operator name?

Silly me ...  The dot is not allowed in operator names :-(  Sorry.  If
there was a character that was very similar to dots I would suggest
that.  The closest is * I think, so what do you think of "***"?

> >My suggestion of ... (ellipsis) is because that's already known as
> >related to text, used for omissions of words, where the surrounding
> >words form a phrase.  It seems much more natural to me.
> Yes, agree for omission. But for reason above its'n a good name although I
> don't have a strong objections.
> 
> may be <=>? it isn't used anywhere yet.
> 
> select 'fat'::tsquery <=> 'cat';
> select 'fat <=> cat'::tsquery;
> select 'fat <3> cat'::tsqyery; -- for non-default distance.

Dunno.  That looks pretty "relationalish".

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Phrase search ported to 9.6

From
Teodor Sigaev
Date:
> there was a character that was very similar to dots I would suggest
> that.  The closest is * I think, so what do you think of "***"?

And join opertator for tsqueries is the same :
select 'fat'::tsquery *** 'cat'; ?

Single '*' ?  That's close to regex, any number of tokens. And it saves rules 
about duplicating character.

select 'fat'::tsquery ** 'cat';
select 'fat * cat'::tsquery;
select 'fat * [3] cat'::tsqyery; -- for non-default distance.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: [PATCH] Phrase search ported to 9.6

From
Andreas Joseph Krogh
Date:
På fredag 01. april 2016 kl. 15:22:55, skrev Teodor Sigaev <teodor@sigaev.ru>:
> there was a character that was very similar to dots I would suggest
> that.  The closest is * I think, so what do you think of "***"?

And join opertator for tsqueries is the same :
select 'fat'::tsquery *** 'cat'; ?

Single '*' ?  That's close to regex, any number of tokens. And it saves rules
about duplicating character.

select 'fat'::tsquery ** 'cat';
select 'fat * cat'::tsquery;
select 'fat * [3] cat'::tsqyery; -- for non-default distance.
 
What about ~> ?
 
--
Andreas Joseph Krogh

Re: [PATCH] Phrase search ported to 9.6

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Teodor Sigaev wrote:
>> may be <=>? it isn't used anywhere yet.
>> 
>> select 'fat'::tsquery <=> 'cat';
>> select 'fat <=> cat'::tsquery;
>> select 'fat <3> cat'::tsqyery; -- for non-default distance.

> Dunno.  That looks pretty "relationalish".

The existing precedent in the geometric types is to use <->
as an operator denoting distance.  Perhaps <-> amd <3>
would work here.
        regards, tom lane



Re: [PATCH] Phrase search ported to 9.6

From
Oleg Bartunov
Date:


On Fri, Apr 1, 2016 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Teodor Sigaev wrote:
>> may be <=>? it isn't used anywhere yet.
>>
>> select 'fat'::tsquery <=> 'cat';
>> select 'fat <=> cat'::tsquery;
>> select 'fat <3> cat'::tsqyery; -- for non-default distance.

> Dunno.  That looks pretty "relationalish".

The existing precedent in the geometric types is to use <->
as an operator denoting distance.  Perhaps <-> amd <3>
would work here.


looks like we get consensus about arrows.
 
                        regards, tom lane

Re: [PATCH] Phrase search ported to 9.6

From
Dmitry Ivanov
Date:
It seems that everything is settled now, so here's the patch introducing the
'<->' and '<N>' operators. I've made the necessary changes to docs &
regression tests.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment

Re: [PATCH] Phrase search ported to 9.6

From
Dmitry Ivanov
Date:
> It seems that everything is settled now, so here's the patch introducing the
> '<->' and '<N>' operators. I've made the necessary changes to docs &
> regression tests.

I noticed that I had accidently trimmed whitespaces in docs, this is a better
one.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment

Re: [PATCH] Phrase search ported to 9.6

From
Dmitry Ivanov
Date:
> > It seems that everything is settled now, so here's the patch introducing
> > the '<->' and '<N>' operators. I've made the necessary changes to docs &
> > regression tests.
> 
> I noticed that I had accidently trimmed whitespaces in docs, this is a
> better one.

After a brief but reasonable discussion with Teodor I've come up with a more 
proper piece of code for phrase operator parsing. The patch is included below.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [PATCH] Phrase search ported to 9.6

From
Dmitry Ivanov
Date:
> > > It seems that everything is settled now, so here's the patch introducing
> > > the '<->' and '<N>' operators. I've made the necessary changes to docs &
> > > regression tests.
> >
> > I noticed that I had accidently trimmed whitespaces in docs, this is a
> > better one.
>
> After a brief but reasonable discussion with Teodor I've come up with a more
> proper piece of code for phrase operator parsing. The patch is included
> below.

Attached the patch. Sorry for the inconvenience.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment

Re: [PATCH] Phrase search ported to 9.6

From
Dmitry Ivanov
Date:
There are some previously unnoticed errors in docs (master branch), this patch
fixes them.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment

Re: [PATCH] Phrase search ported to 9.6

From
Teodor Sigaev
Date:
> There are some previously unnoticed errors in docs (master branch), this patch
> fixes them.
Thank you, pushed

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/