Thread: Prefix operator for text and spgist support
Hi, Attached patch introduces prefix operator ^@ for text type. For 'a ^@ b' it returns true if 'a' starts with 'b'. Also there is spgist index support for this operator. It could be useful as an alternative for LIKE for 'something%' templates. Or even used in LIKE queries instead of ~>=~ and ~<~ in the future. But it would require new strategy for btree. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
Hello, On Fri, Feb 02, 2018 at 06:03:27PM +0300, Ildus Kurbangaliev wrote: > Hi, > > Attached patch introduces prefix operator ^@ for text type. For 'a ^@ b' > it returns true if 'a' starts with 'b'. Also there is spgist index > support for this operator. > > It could be useful as an alternative for LIKE for 'something%' > templates. Or even used in LIKE queries instead of ~>=~ and ~<~ in the > future. But it would require new strategy for btree. I've looked at the patch. It is applied and tests pass. I have a couple comments: > + if (ptype == Pattern_Type_Prefix) > + { > + char *s = TextDatumGetCString(constval); > + prefix = string_to_const(s, vartype); > + pstatus = Pattern_Prefix_Partial; > + rest_selec = 1.0; /* all */ > + pfree(s); > + } > + else > + pstatus = pattern_fixed_prefix(patt, ptype, collation, > + &prefix, &rest_selec); I think it is better to put Pattern_Type_Prefix processing into pattern_fixed_prefix() as another case entry. Secondly, it is worth to fix the documentation. At least here [1]. Maybe there are another places where documentation should be fixed. 1 - https://www.postgresql.org/docs/current/static/spgist-builtin-opclasses.html -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Mon, 19 Feb 2018 15:06:51 +0300 Arthur Zakirov <a.zakirov@postgrespro.ru> wrote: > Hello, > > On Fri, Feb 02, 2018 at 06:03:27PM +0300, Ildus Kurbangaliev wrote: > > Hi, > > > > Attached patch introduces prefix operator ^@ for text type. For 'a > > ^@ b' it returns true if 'a' starts with 'b'. Also there is spgist > > index support for this operator. > > > > It could be useful as an alternative for LIKE for 'something%' > > templates. Or even used in LIKE queries instead of ~>=~ and ~<~ in > > the future. But it would require new strategy for btree. > > I've looked at the patch. It is applied and tests pass. Hi, thanks for the review. > > I have a couple comments: > > > + if (ptype == Pattern_Type_Prefix) > > + { > > + char *s = TextDatumGetCString(constval); > > + prefix = string_to_const(s, vartype); > > + pstatus = Pattern_Prefix_Partial; > > + rest_selec = 1.0; /* all */ > > + pfree(s); > > + } > > + else > > + pstatus = pattern_fixed_prefix(patt, ptype, > > collation, > > + > > &prefix, &rest_selec); > > I think it is better to put Pattern_Type_Prefix processing into > pattern_fixed_prefix() as another case entry. At brief look at this place seems better to move this block into pattern_fixed_prefix function. But there is also `vartype` variable which used to in prefix construction, and it would require pass this variable too. And since pattern_fixed_prefix called in 10 other places and vartype is required only for this ptype it seems better just keep this block outside of this function. > > Secondly, it is worth to fix the documentation. At least here [1]. > Maybe there are another places where documentation should be fixed. > > > 1 - > https://www.postgresql.org/docs/current/static/spgist-builtin-opclasses.html > I've added documentation in current version of the patch. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
On Mon, Feb 19, 2018 at 05:19:15PM +0300, Ildus Kurbangaliev wrote: > At brief look at this place seems better to move this block into > pattern_fixed_prefix function. But there is also `vartype` variable > which used to in prefix construction, and it would require pass this > variable too. And since pattern_fixed_prefix called in 10 other places > and vartype is required only for this ptype it seems better just keep > this block outside of this function. Understood. > I've added documentation in current version of the patch. Thank you. Can you rebase the patch due to changes within pg_proc.h? Also here + <para> + There is also the prefix operator <literal>^@</literal> and corresponding + <literal>text_startswith</literal> function which covers cases when only + searching by beginning of the string is needed. + </para> I think text_startswith should be enclosed with the <function> tag. I'm not sure, but I think <literal> used for operators, keywords, etc. I haven't found a manual which describes how to use tags, but after looking at the documentation where <function> is used, I think that for function <function> should be used. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Tue, 6 Mar 2018 19:27:21 +0300 Arthur Zakirov <a.zakirov@postgrespro.ru> wrote: > On Mon, Feb 19, 2018 at 05:19:15PM +0300, Ildus Kurbangaliev wrote: > > At brief look at this place seems better to move this block into > > pattern_fixed_prefix function. But there is also `vartype` variable > > which used to in prefix construction, and it would require pass this > > variable too. And since pattern_fixed_prefix called in 10 other > > places and vartype is required only for this ptype it seems better > > just keep this block outside of this function. > > Understood. > > > I've added documentation in current version of the patch. > > Thank you. > > Can you rebase the patch due to changes within pg_proc.h? > > Also here > > + <para> > + There is also the prefix operator <literal>^@</literal> and > corresponding > + <literal>text_startswith</literal> function which covers cases > when only > + searching by beginning of the string is needed. > + </para> > > I think text_startswith should be enclosed with the <function> tag. > I'm not sure, but I think <literal> used for operators, keywords, > etc. I haven't found a manual which describes how to use tags, but > after looking at the documentation where <function> is used, I think > that for function <function> should be used. > Hi, thanks for the review. I've fixed documentation as you said and also rebased to current master. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
Hi! Patch looks resonable, but I see some place to improvement: spg_text_leaf_consistent() only needs to check with text_startswith() if reconstucted value came to leaf consistent is shorter than given prefix. For example, if level >= length of prefix then we guarantee that fully reconstracted is matched too. But do not miss that you may need to return value for index only scan, consult returnData field In attachment rebased and minorly edited version of your patch. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
On Thu, Mar 22, 2018 at 7:09 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Patch looks resonable, but I see some place to improvement:
spg_text_leaf_consistent() only needs to check with text_startswith() if reconstucted value came to leaf consistent is shorter than given prefix. For example, if level >= length of prefix then we guarantee that fully reconstracted is matched too. But do not miss that you may need to return value for index only scan, consult returnData field
In attachment rebased and minorly edited version of your patch.
I took a look at this patch. In addition to Teodor's comments I can note following.
* This line looks strange for me.
- if (strategy > 10) + if (strategy > 10 && strategy != RTPrefixStrategyNumber)It's not because we added strategy != RTPrefixStrategyNumber condition there.
It's because we already used magic number here and now have a mix of magic
number and macro constant in one line. Once we anyway touch this place,
could we get rid of magic numbers here?
* I'm a little concern about operator name. We're going to choose @^ operator for
* I'm a little concern about operator name. We're going to choose @^ operator for
prefix search without any preliminary discussion. However, personally I don't
have better ideas :)
------
------
On Fri, 23 Mar 2018 11:45:33 +0300 Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Thu, Mar 22, 2018 at 7:09 PM, Teodor Sigaev <teodor@sigaev.ru> > wrote: > > > Patch looks resonable, but I see some place to improvement: > > spg_text_leaf_consistent() only needs to check with > > text_startswith() if reconstucted value came to leaf consistent is > > shorter than given prefix. For example, if level >= length of > > prefix then we guarantee that fully reconstracted is matched too. > > But do not miss that you may need to return value for index only > > scan, consult returnData field > > > > In attachment rebased and minorly edited version of your patch. > > > I took a look at this patch. In addition to Teodor's comments I can > note following. > > * This line looks strange for me. > > - if (strategy > 10) > + if (strategy > 10 && strategy != > RTPrefixStrategyNumber) > > It's not because we added strategy != RTPrefixStrategyNumber condition > there. > It's because we already used magic number here and now have a mix of > magic number and macro constant in one line. Once we anyway touch > this place, could we get rid of magic numbers here? > > * I'm a little concern about operator name. We're going to choose @^ > operator for > prefix search without any preliminary discussion. However, > personally I don't > have better ideas :) Teodor, Alexander, thanks for review. In new version I have added the optimization in spgist using level variable and also got rid of magic numbers. About the operator it's actually ^@ (not @^ :)), I thought about it and don't really have any idea what operator can be used instead. Attached version 5 of the patch. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
Thank you, pushed with some editorization and renaming text_startswith to starts_with Ildus Kurbangaliev wrote: > On Fri, 23 Mar 2018 11:45:33 +0300 > Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > >> On Thu, Mar 22, 2018 at 7:09 PM, Teodor Sigaev <teodor@sigaev.ru> >> wrote: >> >>> Patch looks resonable, but I see some place to improvement: >>> spg_text_leaf_consistent() only needs to check with >>> text_startswith() if reconstucted value came to leaf consistent is >>> shorter than given prefix. For example, if level >= length of >>> prefix then we guarantee that fully reconstracted is matched too. >>> But do not miss that you may need to return value for index only >>> scan, consult returnData field >>> >>> In attachment rebased and minorly edited version of your patch. >> >> >> I took a look at this patch. In addition to Teodor's comments I can >> note following. >> >> * This line looks strange for me. >> >> - if (strategy > 10) >> + if (strategy > 10 && strategy != >> RTPrefixStrategyNumber) >> >> It's not because we added strategy != RTPrefixStrategyNumber condition >> there. >> It's because we already used magic number here and now have a mix of >> magic number and macro constant in one line. Once we anyway touch >> this place, could we get rid of magic numbers here? >> >> * I'm a little concern about operator name. We're going to choose @^ >> operator for >> prefix search without any preliminary discussion. However, >> personally I don't >> have better ideas :) > > Teodor, Alexander, thanks for review. In new version I have added the > optimization in spgist using level variable and also got rid of magic > numbers. > > About the operator it's actually ^@ (not @^ :)), I thought about it and > don't really have any idea what operator can be used instead. > > Attached version 5 of the patch. > -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> Thank you, pushed with some editorization and renaming text_startswith to > starts_with I am sorry for not noticing this before, but what is the point of this operator? It seems to me we are only making the prefix searching business, which is already complicated, more complicated. Also, the new operator is not documented on SQL String Functions and Operators table. It is not supported by btree text_pattern_ops or btree indexes with COLLATE "C". It is not defined for "citext", so people would get wrong results. It doesn't use pg_trgm indexes whereas LIKE can.
On Mon, 16 Apr 2018 12:45:23 +0200 Emre Hasegeli <emre@hasegeli.com> wrote: > > Thank you, pushed with some editorization and renaming > > text_startswith to starts_with > > I am sorry for not noticing this before, but what is the point of this > operator? It seems to me we are only making the prefix searching > business, which is already complicated, more complicated. Hi. > > Also, the new operator is not documented on SQL String Functions and > Operators table. It is not supported by btree text_pattern_ops or > btree indexes with COLLATE "C". It is not defined for "citext", so > people would get wrong results. It doesn't use pg_trgm indexes > whereas LIKE can. It is mentioned in documentation, look for "starts_with" function. Currently it's working with spgist indexes which fact is pointed out in the documentation too. I was going to add btree support but it would require a new strategy so it will be matter of another patch. I think this operator could be used in LIKE instead of current weird comparison operators. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company