Thread: [HACKERS] Index usage for elem-contained-by-const-range clauses

[HACKERS] Index usage for elem-contained-by-const-range clauses

From
Pritam Baral
Date:
The topic has been previously discussed[0] on the -performance mailing list,
about four years ago.

In that thread, Tom suggested[0] the planner could be made to "expand
"intcol <@
'x,y'::int4range" into "intcol between x and y", using something similar
to the
index LIKE optimization (ie, the "special operator" stuff in indxpath.c)".

This patch tries to do exactly that. It's not tied to any specific datatype,
and has been tested with both builtin types and custom range types. Most
of the
checking for proper datatypes, operators, and btree index happens before
this
code, so I haven't run into any issues yet in my testing. But I'm not
familiar
enough with the internals to be able to confidently say it can handle
all cases
just yet.

[0]:
https://www.postgresql.org/message-id/flat/9860.1364013108%40sss.pgh.pa.us#9860.1364013108@sss.pgh.pa.us

-- 
#!/usr/bin/env regards
Chhatoi Pritam Baral

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

From
Robert Haas
Date:
On Thu, Feb 23, 2017 at 4:47 AM, Pritam Baral <pritam@pritambaral.com> wrote:
> The topic has been previously discussed[0] on the -performance mailing list,
> about four years ago.
>
> In that thread, Tom suggested[0] the planner could be made to "expand
> "intcol <@
> 'x,y'::int4range" into "intcol between x and y", using something similar
> to the
> index LIKE optimization (ie, the "special operator" stuff in indxpath.c)".
>
> This patch tries to do exactly that.

Please add your patch to https://commitfest.postgresql.org/ so it
doesn't get overlooked.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

From
Alexander Korotkov
Date:
Hi, Pritam!

I've assigned to review this patch.

On Thu, Feb 23, 2017 at 2:17 AM, Pritam Baral <pritam@pritambaral.com> wrote:
The topic has been previously discussed[0] on the -performance mailing list,
about four years ago.

In that thread, Tom suggested[0] the planner could be made to "expand
"intcol <@
'x,y'::int4range" into "intcol between x and y", using something similar
to the
index LIKE optimization (ie, the "special operator" stuff in indxpath.c)".

That's cool idea.  But I would say more.  Sometimes it's useful to transform "intcol between x and y" into "intcol <@ 'x,y'::int4range".  btree_gin supports "intcol between x and y" as overlap of "intcol >= x" and "intcol <= y".  That is very inefficient.  But it this clause would be transformed into "intcol <@ 'x,y'::int4range", btree_gin could handle this very efficient.
 

This patch tries to do exactly that. It's not tied to any specific datatype,
and has been tested with both builtin types and custom range types. Most
of the
checking for proper datatypes, operators, and btree index happens before
this
code, so I haven't run into any issues yet in my testing. But I'm not
familiar
enough with the internals to be able to confidently say it can handle
all cases
just yet.

I've tried this patch.  It applies cleanly, but doesn't compile.

indxpath.c:4252:1: error: conflicting types for 'range_elem_contained_quals'
range_elem_contained_quals(Node *leftop, Datum rightop)
^
indxpath.c:192:14: note: previous declaration is here
static List *range_elem_contained_quals(Node *leftop, Oid expr_op, Oid opfamily,
             ^
Could you please recheck that you published right version of patch?
Also, I noticed that patch haven't regression tests.  Some mention of this optimization in docs is also nice to have.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

From
Jim Nasby
Date:
On 3/10/17 8:29 AM, Alexander Korotkov wrote:
> That's cool idea.  But I would say more.  Sometimes it's useful to
> transform "intcol between x and y" into "intcol <@ 'x,y'::int4range".
>  btree_gin supports "intcol between x and y" as overlap of "intcol >= x"
> and "intcol <= y".  That is very inefficient.  But it this clause would
> be transformed into "intcol <@ 'x,y'::int4range", btree_gin could handle
> this very efficient.

That's certainly be nice as well, but IMHO it's outside the scope of 
this patch to accomplish that.

BTW, while we're wishing for things... Something else that would be nice 
is if there was a way to do these kind of transforms without hacking the 
backend...

> Also, I noticed that patch haven't regression tests.

BTW, those tests need to pay special attention to inclusive vs exclusive 
bounds.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

From
Tom Lane
Date:
Pritam Baral <pritam@pritambaral.com> writes:
> The topic has been previously discussed[0] on the -performance mailing list,
> about four years ago.
> In that thread, Tom suggested[0] the planner could be made to "expand
> "intcol <@
> 'x,y'::int4range" into "intcol between x and y", using something similar
> to the
> index LIKE optimization (ie, the "special operator" stuff in indxpath.c)".

> This patch tries to do exactly that.

I took a quick look through this, and have some thoughts ---

* In match_special_index_operator, there are two switch statements and
you've added a clause to only one of them.  In the second one, you need to
add a check that you're working with a btree index.  I'd expect the patch
as-is to fall over if an "indexkey <@ range" clause were matched to a hash
index.

* You're failing to account for the case of "range @> indexkey", which
really this ought to work for.  That would take a bit of fooling around
with the structure of match_special_index_operator to allow indexkey on
the right, but we've foreseen since the beginning that that would someday
be necessary.  Looks like today is that day.

* The first bit in range_elem_contained_quals will fall over for an
indexkey that is an expression rather than a simple Var.  Probably
you should just be using exprType() instead.  (Not sure about whether
that's sufficient in domain cases, though.)  Or actually, why look at
that at all?  Seems like what you want is to look at the RHS input,
ie the range value, and get the relevant element datatype from it.
That would correspond to what will happen if the <@ operator executes
normally: elem_contained_by_range does not consult the type of its LHS.

* The "return NIL" for an empty range looks pretty dubious.  Even if it
fails to fail altogether, it's not doing what we really want, which is to
signal that the condition cannot succeed so we needn't search the index.
Maybe what we should do in that case is generate an "indexkey = NULL"
qual.

* Likewise, if the range is infinite, you're just returning NIL and that's
leaving something on the table.  Probably worth generating "indexkey IS
NOT NULL" in that case; it's not going to help much in typical usage, but
it would prevent scanning nulls if there are a lot of them in the index.

* elog(ERROR) doesn't return, so stuff like this is not idiomatic:

+            if (opr2oid == InvalidOid)
+            {
+                elog(ERROR, "no <= operator for opfamily %u", opfamily);
+                return NIL;
+            }

It'd be sufficient to do

+            if (opr2oid == InvalidOid)
+                elog(ERROR, "no <= operator for opfamily %u", opfamily);

* You're not bothering to insert any inputcollid into the generated
comparison operator nodes.  I'm not sure why that fails to fall over
for text comparisons (if indeed it does fail ...) but it's wrong.
Use the range type's collation there.

* It's sort of annoying that the whole thing only works for a Const range
value.  A different approach you might think about is to make this work
more like ScalarArrayOp, ie we pass the qual through to btree as-is and
teach relevant functions in access/nbtree/ how to extract index bound
conditions from the range datum at runtime.  That would likely end up
being a significantly larger patch, though, so you might reasonably
conclude it's not worth the effort.
        regards, tom lane



Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

From
Tom Lane
Date:
I wrote:
> * You're not bothering to insert any inputcollid into the generated
> comparison operator nodes.  I'm not sure why that fails to fall over
> for text comparisons (if indeed it does fail ...) but it's wrong.
> Use the range type's collation there.

Oh ... looking at this again, I realize that there's an additional
validity check missing: if the range type's collation doesn't match
the index column's collation, we can't do this optimization at all.
That check probably belongs in match_special_index_operator.
        regards, tom lane



Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

From
Pritam Baral
Date:
On Sunday 12 March 2017 01:58 AM, Jim Nasby wrote:
> On 3/10/17 8:29 AM, Alexander Korotkov wrote:  >> That's cool idea.  But I would say more.  Sometimes it's useful to
>>transform "intcol between x and y" into "intcol <@ 'x,y'::int4range". >>  btree_gin supports "intcol between x and y"
asoverlap of "intcol >= x" >> and "intcol <= y".  That is very inefficient.  But it this clause would >> be transformed
into"intcol <@ 'x,y'::int4range", btree_gin could handle >> this very efficient. > > That's certainly be nice as well,
butIMHO it's outside the scope of this patch to accomplish that.
 

Also, I think btree indexes are more common than btree_gin. The motivation for
this originally came from trying to use the primary key of a large table in a
range search, and the primary key index was the default btree.

Also, this is my first deep dive into Postgres's source code, so I took a few
easy ways out, just to get started. If it's not too complex to get btree_gin to
handle between queries as contained-in-range, I can give it a try.

>  > BTW, while we're wishing for things... Something else that would be nice is if there was a way to do these kind of
transformswithout hacking the backend...
 

Indeed. And this was one of the things Tom said back when a similar discussion
had happened (on the -performance mailing list). But seeing as how it's been
almost four years since then, I decided to go ahead with the backend hacking
anyway.

>  >> Also, I noticed that patch haven't regression tests. > > BTW, those tests need to pay special attention to
inclusivevs exclusive bounds.
 

I will add regression tests, though I do have to get through all of Tom's
suggestions elsewhere in this thread first.

-- 
#!/usr/bin/env regards
Chhatoi Pritam Baral




Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

From
Pritam Baral
Date:
On Friday 10 March 2017 07:59 PM, Alexander Korotkov wrote:
> Hi, Pritam!  > > I've assigned to review this patch. > > On Thu, Feb 23, 2017 at 2:17 AM, Pritam Baral
<pritam@pritambaral.com>wrote: > >     The topic has been previously discussed[0] on the -performance mailing list, >
 about four years ago. > >     In that thread, Tom suggested[0] the planner could be made to "expand >     "intcol <@ >
   'x,y'::int4range" into "intcol between x and y", using something similar >     to the >     index LIKE optimization
(ie,the "special operator" stuff in indxpath.c)". > > > That's cool idea.  But I would say more.  Sometimes it's useful
totransform "intcol between x and y" into "intcol <@ 'x,y'::int4range".  btree_gin supports "intcol between x and y" as
overlapof "intcol >= x" and "intcol <= y".  That is very inefficient.  But it this clause would be transformed into
"intcol<@ 'x,y'::int4range", btree_gin could handle this very efficient. > > > >     This patch tries to do exactly
that.It's not tied to any specific datatype, >     and has  
been tested with both builtin types and custom range types. Most >     of the >     checking for proper datatypes,
operators,and btree index happens before >     this >     code, so I haven't run into any issues yet in my testing. But
I'mnot >     familiar >     enough with the internals to be able to confidently say it can handle >     all cases >
justyet. > > > I've tried this patch.  It applies cleanly, but doesn't compile. > > indxpath.c:4252:1: error:
conflictingtypes for 'range_elem_contained_quals' > range_elem_contained_quals(Node *leftop, Datum rightop) > ^ >
indxpath.c:192:14:note: previous declaration is here > static List *range_elem_contained_quals(Node *leftop, Oid
expr_op,Oid opfamily, >              ^ > Could you please recheck that you published right version of patch? 

So sorry. I'm attaching the correct version of the original with this,
in case you want to test the limited implementation, because I still
have to go through Tom's list of suggestions.

BTW, the patch is for applying on top of REL9_6_2, and while I
suspect it may work on master too, I haven't tested it since the
original submission (Feb 23).

> Also, I noticed that patch haven't regression tests.  Some mention of this optimization in docs is also nice to have.
> > ------ > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company 

--
#!/usr/bin/env regards
Chhatoi Pritam Baral


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: Index usage for elem-contained-by-const-range clauses

From
David Steele
Date:
Hi Pritam,

On 3/17/17 5:41 PM, Pritam Baral wrote:
>
> So sorry. I'm attaching the correct version of the original with this,
> in case you want to test the limited implementation, because I still
> have to go through Tom's list of suggestions.
>
> BTW, the patch is for applying on top of REL9_6_2, and while I
> suspect it may work on master too, I haven't tested it since the
> original submission (Feb 23).
>
>> Also, I noticed that patch haven't regression tests.  Some mention of
>> this optimization in docs is also nice to have.  > > ------ >
>> Alexander Korotkov > Postgres Professional: http://www.postgrespro.com
>> > The Russian Postgres Company

This thread has been idle for a week.  Please respond and/or post a new 
patch by 2017-03-28 00:00 AoE (UTC-12) or this submission will be marked 
"Returned with Feedback".

Thanks,
-- 
-David
david@pgmasters.net



Re: Index usage for elem-contained-by-const-range clauses

From
Alexander Korotkov
Date:
On Sat, Mar 18, 2017 at 12:41 AM, Pritam Baral <pritam@pritambaral.com> wrote:
On Friday 10 March 2017 07:59 PM, Alexander Korotkov wrote:
Hi, Pritam!  > > I've assigned to review this patch. > > On Thu, Feb 23, 2017 at 2:17 AM, Pritam Baral <pritam@pritambaral.com> wrote: > >     The topic has been previously discussed[0] on the -performance mailing list, >     about four years ago. > >     In that thread, Tom suggested[0] the planner could be made to "expand >     "intcol <@ >     'x,y'::int4range" into "intcol between x and y", using something similar >     to the >     index LIKE optimization (ie, the "special operator" stuff in indxpath.c)". > > > That's cool idea.  But I would say more.  Sometimes it's useful to transform "intcol between x and y" into "intcol <@ 'x,y'::int4range".  btree_gin supports "intcol between x and y" as overlap of "intcol >= x" and "intcol <= y".  That is very inefficient.  But it this clause would be transformed into "intcol <@ 'x,y'::int4range", btree_gin could handle this very efficient. > > > >     This patch tries to do exactly that. It's not tied to any specific datatype, >     and has
been tested with both builtin types and custom range types. Most >     of the >     checking for proper datatypes, operators, and btree index happens before >     this >     code, so I haven't run into any issues yet in my testing. But I'm not >     familiar >     enough with the internals to be able to confidently say it can handle >     all cases >     just yet. > > > I've tried this patch.  It applies cleanly, but doesn't compile. > > indxpath.c:4252:1: error: conflicting types for 'range_elem_contained_quals' > range_elem_contained_quals(Node *leftop, Datum rightop) > ^ > indxpath.c:192:14: note: previous declaration is here > static List *range_elem_contained_quals(Node *leftop, Oid expr_op, Oid opfamily, >              ^ > Could you please recheck that you published right version of patch?

So sorry. I'm attaching the correct version of the original with this,
in case you want to test the limited implementation, because I still
have to go through Tom's list of suggestions.

BTW, the patch is for applying on top of REL9_6_2, and while I
suspect it may work on master too, I haven't tested it since the
original submission (Feb 23).

What is idea behind basing patch on the REL9_6_2?
This patch implements new functionality and it's definitely not going to be considered to be committed to stable release branches.
If you are interesting in committing this patch to master, please rebase it on master branch.  If not, please clarify the purpose of this submission.

Also, please include some numbering to the patch name, so that we could distinguish one version of patch from another.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Index usage for elem-contained-by-const-range clauses

From
David Steele
Date:
On 3/24/17 10:50 AM, David Steele wrote:
> Hi Pritam,
>
> On 3/17/17 5:41 PM, Pritam Baral wrote:
>>
>> So sorry. I'm attaching the correct version of the original with this,
>> in case you want to test the limited implementation, because I still
>> have to go through Tom's list of suggestions.
>>
>> BTW, the patch is for applying on top of REL9_6_2, and while I
>> suspect it may work on master too, I haven't tested it since the
>> original submission (Feb 23).
>>
>>> Also, I noticed that patch haven't regression tests.  Some mention of
>>> this optimization in docs is also nice to have.  > > ------ >
>>> Alexander Korotkov > Postgres Professional: http://www.postgrespro.com
>>> > The Russian Postgres Company
>
> This thread has been idle for a week.  Please respond and/or post a new
> patch by 2017-03-28 00:00 AoE (UTC-12) or this submission will be marked
> "Returned with Feedback".

This submission has been marked "Returned with Feedback".  Please feel 
free to resubmit to a future commitfest.

Regards,
-- 
-David
david@pgmasters.net