Thread: Prefix operator for text and spgist support

Prefix operator for text and spgist support

From
Ildus Kurbangaliev
Date:
Hi,

Attached patch introduces prefix operator ^@ for text type. For 'a ^@ b'
it returns true if 'a' starts with 'b'. Also there is spgist index
support for this operator.

It could be useful as an alternative for LIKE for 'something%'
templates. Or even used in LIKE queries instead of ~>=~ and ~<~ in the
future. But it would require new strategy for btree.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

Re: Prefix operator for text and spgist support

From
Arthur Zakirov
Date:
Hello,

On Fri, Feb 02, 2018 at 06:03:27PM +0300, Ildus Kurbangaliev wrote:
> Hi,
> 
> Attached patch introduces prefix operator ^@ for text type. For 'a ^@ b'
> it returns true if 'a' starts with 'b'. Also there is spgist index
> support for this operator.
> 
> It could be useful as an alternative for LIKE for 'something%'
> templates. Or even used in LIKE queries instead of ~>=~ and ~<~ in the
> future. But it would require new strategy for btree.

I've looked at the patch. It is applied and tests pass.

I have a couple comments:

> +    if (ptype == Pattern_Type_Prefix)
> +    {
> +        char    *s = TextDatumGetCString(constval);
> +        prefix = string_to_const(s, vartype);
> +        pstatus = Pattern_Prefix_Partial;
> +        rest_selec = 1.0;    /* all */
> +        pfree(s);
> +    }
> +    else
> +        pstatus = pattern_fixed_prefix(patt, ptype, collation,
> +                                       &prefix, &rest_selec);

I think it is better to put Pattern_Type_Prefix processing into
pattern_fixed_prefix() as another case entry.

Secondly, it is worth to fix the documentation. At least here [1]. Maybe
there are another places where documentation should be fixed.


1 - https://www.postgresql.org/docs/current/static/spgist-builtin-opclasses.html

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: Prefix operator for text and spgist support

From
Ildus Kurbangaliev
Date:
On Mon, 19 Feb 2018 15:06:51 +0300
Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:

> Hello,
> 
> On Fri, Feb 02, 2018 at 06:03:27PM +0300, Ildus Kurbangaliev wrote:
> > Hi,
> > 
> > Attached patch introduces prefix operator ^@ for text type. For 'a
> > ^@ b' it returns true if 'a' starts with 'b'. Also there is spgist
> > index support for this operator.
> > 
> > It could be useful as an alternative for LIKE for 'something%'
> > templates. Or even used in LIKE queries instead of ~>=~ and ~<~ in
> > the future. But it would require new strategy for btree.  
> 
> I've looked at the patch. It is applied and tests pass.

Hi, thanks for the review.

> 
> I have a couple comments:
> 
> > +    if (ptype == Pattern_Type_Prefix)
> > +    {
> > +        char    *s = TextDatumGetCString(constval);
> > +        prefix = string_to_const(s, vartype);
> > +        pstatus = Pattern_Prefix_Partial;
> > +        rest_selec = 1.0;    /* all */
> > +        pfree(s);
> > +    }
> > +    else
> > +        pstatus = pattern_fixed_prefix(patt, ptype,
> > collation,
> > +
> > &prefix, &rest_selec);  
> 
> I think it is better to put Pattern_Type_Prefix processing into
> pattern_fixed_prefix() as another case entry.

At brief look at this place seems better to move this block into
pattern_fixed_prefix function. But there is also `vartype` variable
which used to in prefix construction, and it would require pass this
variable too. And since pattern_fixed_prefix called in 10 other places
and vartype is required only for this ptype it seems better just keep
this block outside of this function.

> 
> Secondly, it is worth to fix the documentation. At least here [1].
> Maybe there are another places where documentation should be fixed.
> 
> 
> 1 -
> https://www.postgresql.org/docs/current/static/spgist-builtin-opclasses.html
> 

I've added documentation in current version of the patch.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

Re: Prefix operator for text and spgist support

From
Arthur Zakirov
Date:
On Mon, Feb 19, 2018 at 05:19:15PM +0300, Ildus Kurbangaliev wrote:
> At brief look at this place seems better to move this block into
> pattern_fixed_prefix function. But there is also `vartype` variable
> which used to in prefix construction, and it would require pass this
> variable too. And since pattern_fixed_prefix called in 10 other places
> and vartype is required only for this ptype it seems better just keep
> this block outside of this function.

Understood.

> I've added documentation in current version of the patch.

Thank you.

Can you rebase the patch due to changes within pg_proc.h?

Also here

+   <para>
+    There is also the prefix operator <literal>^@</literal> and corresponding
+    <literal>text_startswith</literal> function which covers cases when only
+    searching by beginning of the string is needed.
+   </para>

I think text_startswith should be enclosed with the <function> tag. I'm
not sure, but I think <literal> used for operators, keywords, etc. I
haven't found a manual which describes how to use tags, but after looking
at the documentation where <function> is used, I think that for function
<function> should be used.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: Prefix operator for text and spgist support

From
Ildus Kurbangaliev
Date:
On Tue, 6 Mar 2018 19:27:21 +0300
Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:

> On Mon, Feb 19, 2018 at 05:19:15PM +0300, Ildus Kurbangaliev wrote:
> > At brief look at this place seems better to move this block into
> > pattern_fixed_prefix function. But there is also `vartype` variable
> > which used to in prefix construction, and it would require pass this
> > variable too. And since pattern_fixed_prefix called in 10 other
> > places and vartype is required only for this ptype it seems better
> > just keep this block outside of this function.  
> 
> Understood.
> 
> > I've added documentation in current version of the patch.  
> 
> Thank you.
> 
> Can you rebase the patch due to changes within pg_proc.h?
> 
> Also here
> 
> +   <para>
> +    There is also the prefix operator <literal>^@</literal> and
> corresponding
> +    <literal>text_startswith</literal> function which covers cases
> when only
> +    searching by beginning of the string is needed.
> +   </para>
> 
> I think text_startswith should be enclosed with the <function> tag.
> I'm not sure, but I think <literal> used for operators, keywords,
> etc. I haven't found a manual which describes how to use tags, but
> after looking at the documentation where <function> is used, I think
> that for function <function> should be used.
> 

Hi, thanks for the review. I've fixed documentation as you said and
also rebased to current master.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

Re: Prefix operator for text and spgist support

From
Teodor Sigaev
Date:
Hi!

Patch looks resonable, but I see some place to improvement:
spg_text_leaf_consistent() only needs to check with text_startswith() if 
reconstucted value came to leaf consistent is shorter than given prefix. For 
example, if level >= length of prefix then we guarantee that fully reconstracted 
is matched too. But do not miss that you may need to return value for index only 
scan, consult returnData field

In attachment rebased and minorly edited version of your patch.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

Re: Prefix operator for text and spgist support

From
Alexander Korotkov
Date:
On Thu, Mar 22, 2018 at 7:09 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Patch looks resonable, but I see some place to improvement:
spg_text_leaf_consistent() only needs to check with text_startswith() if reconstucted value came to leaf consistent is shorter than given prefix. For example, if level >= length of prefix then we guarantee that fully reconstracted is matched too. But do not miss that you may need to return value for index only scan, consult returnData field

In attachment rebased and minorly edited version of your patch.

I took a look at this patch.  In addition to Teodor's comments I can note following.

* This line looks strange for me.
-			if (strategy > 10)
+			if (strategy > 10 && strategy != RTPrefixStrategyNumber)
It's not because we added strategy != RTPrefixStrategyNumber condition there.
It's because we already used magic number here and now have a mix of magic
number and macro constant in one line.  Once we anyway touch this place,
could we get rid of magic numbers here?

* I'm a little concern about operator name.  We're going to choose @^ operator for
prefix search without any preliminary discussion.  However, personally I don't
have better ideas :)

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Prefix operator for text and spgist support

From
Ildus Kurbangaliev
Date:
On Fri, 23 Mar 2018 11:45:33 +0300
Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

> On Thu, Mar 22, 2018 at 7:09 PM, Teodor Sigaev <teodor@sigaev.ru>
> wrote:
> 
> > Patch looks resonable, but I see some place to improvement:
> > spg_text_leaf_consistent() only needs to check with
> > text_startswith() if reconstucted value came to leaf consistent is
> > shorter than given prefix. For example, if level >= length of
> > prefix then we guarantee that fully reconstracted is matched too.
> > But do not miss that you may need to return value for index only
> > scan, consult returnData field
> >
> > In attachment rebased and minorly edited version of your patch.  
> 
> 
> I took a look at this patch.  In addition to Teodor's comments I can
> note following.
> 
> * This line looks strange for me.
> 
> -            if (strategy > 10)
> +            if (strategy > 10 && strategy !=
> RTPrefixStrategyNumber)
> 
> It's not because we added strategy != RTPrefixStrategyNumber condition
> there.
> It's because we already used magic number here and now have a mix of
> magic number and macro constant in one line.  Once we anyway touch
> this place, could we get rid of magic numbers here?
> 
> * I'm a little concern about operator name.  We're going to choose @^
> operator for
> prefix search without any preliminary discussion.  However,
> personally I don't
> have better ideas :)

Teodor, Alexander, thanks for review. In new version I have added the
optimization in spgist using level variable and also got rid of magic
numbers.

About the operator it's actually ^@ (not @^ :)), I thought about it and
don't really have any idea what operator can be used instead.

Attached version 5 of the patch.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

Re: Prefix operator for text and spgist support

From
Teodor Sigaev
Date:
Thank you, pushed with some editorization and renaming text_startswith to 
starts_with

Ildus Kurbangaliev wrote:
> On Fri, 23 Mar 2018 11:45:33 +0300
> Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> 
>> On Thu, Mar 22, 2018 at 7:09 PM, Teodor Sigaev <teodor@sigaev.ru>
>> wrote:
>>
>>> Patch looks resonable, but I see some place to improvement:
>>> spg_text_leaf_consistent() only needs to check with
>>> text_startswith() if reconstucted value came to leaf consistent is
>>> shorter than given prefix. For example, if level >= length of
>>> prefix then we guarantee that fully reconstracted is matched too.
>>> But do not miss that you may need to return value for index only
>>> scan, consult returnData field
>>>
>>> In attachment rebased and minorly edited version of your patch.
>>
>>
>> I took a look at this patch.  In addition to Teodor's comments I can
>> note following.
>>
>> * This line looks strange for me.
>>
>> -            if (strategy > 10)
>> +            if (strategy > 10 && strategy !=
>> RTPrefixStrategyNumber)
>>
>> It's not because we added strategy != RTPrefixStrategyNumber condition
>> there.
>> It's because we already used magic number here and now have a mix of
>> magic number and macro constant in one line.  Once we anyway touch
>> this place, could we get rid of magic numbers here?
>>
>> * I'm a little concern about operator name.  We're going to choose @^
>> operator for
>> prefix search without any preliminary discussion.  However,
>> personally I don't
>> have better ideas :)
> 
> Teodor, Alexander, thanks for review. In new version I have added the
> optimization in spgist using level variable and also got rid of magic
> numbers.
> 
> About the operator it's actually ^@ (not @^ :)), I thought about it and
> don't really have any idea what operator can be used instead.
> 
> Attached version 5 of the patch.
> 

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: Prefix operator for text and spgist support

From
Emre Hasegeli
Date:
> Thank you, pushed with some editorization and renaming text_startswith to
> starts_with

I am sorry for not noticing this before, but what is the point of this
operator?  It seems to me we are only making the prefix searching
business, which is already complicated, more complicated.

Also, the new operator is not documented on SQL String Functions and
Operators table.  It is not supported by btree text_pattern_ops or
btree indexes with COLLATE "C".  It is not defined for "citext", so
people would get wrong results.  It doesn't use pg_trgm indexes
whereas LIKE can.


Re: Prefix operator for text and spgist support

From
Ildus Kurbangaliev
Date:
On Mon, 16 Apr 2018 12:45:23 +0200
Emre Hasegeli <emre@hasegeli.com> wrote:

> > Thank you, pushed with some editorization and renaming
> > text_startswith to starts_with  
> 
> I am sorry for not noticing this before, but what is the point of this
> operator?  It seems to me we are only making the prefix searching
> business, which is already complicated, more complicated.

Hi.

> 
> Also, the new operator is not documented on SQL String Functions and
> Operators table.  It is not supported by btree text_pattern_ops or
> btree indexes with COLLATE "C".  It is not defined for "citext", so
> people would get wrong results.  It doesn't use pg_trgm indexes
> whereas LIKE can.

It is mentioned in documentation, look for "starts_with" function.
Currently it's working with spgist indexes which fact is pointed out in
the documentation too. I was going to add btree support but it would
require a new strategy so it will be matter of another patch. I think
this operator could be used in LIKE instead of current weird comparison
operators.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company