Thread: RTREE on points

RTREE on points

From
"Julian Scarfe"
Date:
Am I missing the point (no pun intended ;-) of RTREE indices?

I was expecting a "point_ops" opclass or similar...

[7.1 on RedHat 6.2]

SELECT am.amname AS acc_name,       opc.opcname AS ops_name,       COUNT(*)    FROM pg_am am, pg_amop amop,
pg_opclassopc    WHERE amop.amopid = am.oid AND          amop.amopclaid = opc.oid AND  am.amname = 'rtree'    GROUP BY
am.amname,opc.opcname    ORDER BY acc_name, ops_name;
 
acc_name |  ops_name  | count
----------+------------+-------rtree    | bigbox_ops |     8rtree    | box_ops    |     8rtree    | poly_ops   |     8
(3 rows)

Surely the most natural application of an RTREE is to index points, as well
as boxes and polygons. E.g.


CREATE TABLE "nodes" (       "node" point,       "node_name" character varying(30)
);
CREATE
INSERT INTO nodes VALUES ('(1,1)', 'a');
INSERT 207372 1
INSERT INTO nodes VALUES ('(1,2)', 'b');
INSERT 207373 1
INSERT INTO nodes VALUES ('(3,2)', 'c');
INSERT 207374 1
INSERT INTO nodes VALUES ('(5,4)', 'd');
INSERT 207375 1
INSERT INTO nodes VALUES ('(7,8)', 'e');
INSERT 207376 1
INSERT INTO nodes VALUES ('(11,10)', 'f');
INSERT 207377 1
INSERT INTO nodes VALUES ('(101,11)', 'g');
INSERT 207378 1

explain select * from nodes where node @ '((1,1),(3,3))'::box;
NOTICE:  QUERY PLAN:
Seq Scan on nodes  (cost=0.00..22.50 rows=500 width=28)

So create an RTREE index to help...but predictably:

CREATE INDEX test_rtree ON nodes USING RTREE (node);
ERROR:  DefineIndex: type point has no default operator class

I can do something like:

CREATE INDEX test_rtree ON nodes USING RTREE (box(node,node));
CREATE

but then:

explain select * from nodes where node @ '((1,1),(3,3))'::box;
NOTICE:  QUERY PLAN:
Seq Scan on nodes  (cost=0.00..1.09 rows=4 width=28)

and even:

explain select * from nodes where box(node,node) @ '((1,1),(3,3))'::box;
NOTICE:  QUERY PLAN:
Seq Scan on nodes  (cost=0.00..1.10 rows=1 width=28)

Thanks for any help

Julian Scarfe




Re: RTREE on points

From
Jeff Hoffmann
Date:
Julian Scarfe wrote:
> 
> explain select * from nodes where box(node,node) @ '((1,1),(3,3))'::box;
> NOTICE:  QUERY PLAN:
> Seq Scan on nodes  (cost=0.00..1.10 rows=1 width=28)
> 

this should work, assuming you have enough points to make a difference
(in the optimizer's mind, at least).  the optimizer still doesn't do a
great job of knowing when it's best to use an index, although, in your
sample, there's no way it would ever be cheaper to use an index. 
there's simply not enough data there.  you can test to see if an index
can be used by a query by shutting off the sequential scans (set
enable_seqscan=off) and retrying the query.  essentially, this forces it
to use an index scan if at all possible.

-- 

Jeff Hoffmann
PropertyKey.com


Re: RTREE on points

From
Jeff Hoffmann
Date:
Julian Scarfe wrote:
> 
> It hadn't occured to me that the index would simply not be used and I'm
> grateful for the pointer to the appropriate variable.

i wouldn't recommend turning off sequential scans for day-to-day usage,
but it certainly can be useful for debugging and  testing.  if you have
specific queries that you need optimized, you can do that, but the whole
point of cost estimates is to give a good estimate of what a normal
query would return, so if you have a lot of ad-hoc queries, it's
probably better to just trust the system.     
> Nevertheless, wouldn't...
> 
> CREATE INDEX test_rtree ON nodes USING RTREE (node);
> (which fails)
> 
> ...be a lot simpler than...
> 
> CREATE INDEX test_rtree ON nodes USING RTREE (box(node,node));
> (which succeeds, as above)
> 
> ?

yes, it does seem like a little more work, but there doesn't seem to be
a lot of usage of the geometric functions by the developers to look at
missing features -- they're mostly just reactive to problems.  i really
have never dug into tweaking access methods to get this to work, but i
would imagine it's not that hard to implement.

-- 

Jeff Hoffmann
PropertyKey.com


Re: RTREE on points

From
Tom Lane
Date:
Jeff Hoffmann <jeff@propertykey.com> writes:
> yes, it does seem like a little more work, but there doesn't seem to be
> a lot of usage of the geometric functions by the developers to look at
> missing features -- they're mostly just reactive to problems.

Jeff's correct, none of the core developers have much time to spend on
adding features to the geometric datatypes.  If someone else wants to
step up to the plate, though, contributions are welcome ;-)

The procedure for adding a new index opclass is somewhat documented for
btree opclasses.  To add an rtree opclass you'd be adding a different
set of operators and support functions, which set isn't documented
anywhere that I know of; you'd have to look at the existing examples
to figure out what is needed.

BTW, you should also look at the GIST stuff and figure out whether
it might not be better to develop a GIST opclass instead of rtree.
In the long run I suspect GIST will be better supported than rtree,
since it's more general.
        regards, tom lane


Re: RTREE on points

From
"Julian Scarfe"
Date:
Julian Scarfe wrote:
> >
> > explain select * from nodes where box(node,node) @ '((1,1),(3,3))'::box;
> > NOTICE:  QUERY PLAN:
> > Seq Scan on nodes  (cost=0.00..1.10 rows=1 width=28)

From: "Jeff Hoffmann" <jeff@propertykey.com>

> this should work, assuming you have enough points to make a difference
> (in the optimizer's mind, at least).  the optimizer still doesn't do a
> great job of knowing when it's best to use an index, although, in your
> sample, there's no way it would ever be cheaper to use an index.
> there's simply not enough data there.  you can test to see if an index
> can be used by a query by shutting off the sequential scans (set
> enable_seqscan=off) and retrying the query.  essentially, this forces it
> to use an index scan if at all possible.

And indeed it does, thank you, Jeff:

# set enable_seqscan=off;
SET VARIABLE
# explain select * from nodes where box(node,node) @ '((1,1),(3,3))'::box;
NOTICE:  QUERY PLAN:
Index Scan using test_rtree on nodes  (cost=0.00..2.02 rows=1 width=28)

It hadn't occured to me that the index would simply not be used and I'm
grateful for the pointer to the appropriate variable.

Nevertheless, wouldn't...

CREATE INDEX test_rtree ON nodes USING RTREE (node);
(which fails)

...be a lot simpler than...

CREATE INDEX test_rtree ON nodes USING RTREE (box(node,node));
(which succeeds, as above)

?

The latter feels contorted and possibly inefficient.  After all, I don't
do...:
CREATE TABLE "nodes" (       "node" point,       "node_name" character varying(30)
);

INSERT INTO nodes VALUES ('(1,1)', 'a');
INSERT INTO nodes VALUES ('(1,2)', 'b');
INSERT INTO nodes VALUES ('(3,2)', 'c');
INSERT INTO nodes VALUES ('(5,4)', 'd');
INSERT INTO nodes VALUES ('(7,8)', 'e');
INSERT INTO nodes VALUES ('(11,10)', 'f');
INSERT INTO nodes VALUES ('(101,11)', 'g');

CREATE INDEX test_btree ON nodes USING BTREE (textcat(node_name,node_name));

...if I want to index by name? (even though in principle it would work)

Thanks for any guidance.

Julian Scarfe




Re: RTREE on points

From
Jeff Hoffmann
Date:
Tom Lane wrote:

> BTW, you should also look at the GIST stuff and figure out whether
> it might not be better to develop a GIST opclass instead of rtree.
> In the long run I suspect GIST will be better supported than rtree,
> since it's more general.
> 
>                         regards, tom lane

are there any built-in GIST opclasses?  i didn't see any when looking
through the system tables.  weren't there things like gist_box_ops &
gist_poly_ops at one time?  i know there are a couple of GiST examples
in contrib (seg, cube & intarray), but i thought there used to be at
least a gist_box_ops.  or was that another contrib item that got
dropped? 

-- 

Jeff Hoffmann
PropertyKey.com


Re: RTREE on points

From
Jeff Hoffmann
Date:
Tom Lane wrote:
> 
> I don't recall any such thing having been removed, but it does seem
> peculiar that there are no GIST opclasses in the standard distribution.
> How the heck did the GIST index code get developed/tested without some
> opclasses?

doing some digging at berkeley, i found the original pggist patch file
that created the gist access method & gist_box_ops opclass (among
others).  i'm assuming that patch was the basis for what was originally
introduced, so i don't know why it didn't get included with everything
else.  it looks like there are a lot of calls to internal postgresql box
comparison functions that would need to get converted to the new calling
convention, but it should be pretty straightforward to get it to work
with a recent version of postgresql.   it does seem pretty silly to have
it in there if you don't have any built-in way of using it, if for no
other reason than to be able to test if the feature even works. 

-- 

Jeff Hoffmann
PropertyKey.com


Re: RTREE on points

From
Tom Lane
Date:
Jeff Hoffmann <jeff@propertykey.com> writes:
> I know there are a couple of GiST examples in contrib (seg, cube &
> intarray), but i thought there used to be at least a gist_box_ops.

I don't recall any such thing having been removed, but it does seem
peculiar that there are no GIST opclasses in the standard distribution.
How the heck did the GIST index code get developed/tested without some
opclasses?

Anyone remember the history?  AFAICT from the CVS logs, GIST was added
to the tree in mid 1996, but no opclasses for it were added at the time.
I'd go digging in the maillist archives, but www.postgresql.org is too
friggin' slow at the moment ...
        regards, tom lane


Re: RTREE on points

From
Tom Lane
Date:
Jeff Hoffmann <jeff@propertykey.com> writes:
> Tom Lane wrote:
>> How the heck did the GIST index code get developed/tested without some
>> opclasses?

> doing some digging at berkeley, i found the original pggist patch file
> that created the gist access method & gist_box_ops opclass (among
> others).  i'm assuming that patch was the basis for what was originally
> introduced, so i don't know why it didn't get included with everything
> else.

The CVS logs show there was some confusion about applying the whole of
the pggist patch, so it's possible that this was an unintentional
omission.

> it looks like there are a lot of calls to internal postgresql box
> comparison functions that would need to get converted to the new calling
> convention, but it should be pretty straightforward to get it to work
> with a recent version of postgresql.  it does seem pretty silly to have
> it in there if you don't have any built-in way of using it, if for no
> other reason than to be able to test if the feature even works. 

Yes.  I had been thinking of migrating one or more of the new contrib
GIST opclasses into the main distribution for 7.2, if only to be able to
create a regress test for GIST.  Updating this old Berkeley code would
be another path to create/extend a GIST regress test.  If you (or
someone else) have time to do that, it'd be great.
        regards, tom lane


Re: RTREE on points

From
Oleg Bartunov
Date:
GiST is great !

You may look at http://www.sai.msu.su/~megera/postgres/gist/
for GiST implementation of RTree - it could be not compiled with 7.1
release due to some api changes, but it's not difficult to do.
If somebody want it I could contribute it to contrib area.

Tom,
what we need to do to improve GiST support is to provide selectivity
information of gist indices to optimizer. I'll post separate message
about this problem
Oleg
On Tue, 17 Apr 2001, Jeff Hoffmann wrote:

> Tom Lane wrote:
> >
> > I don't recall any such thing having been removed, but it does seem
> > peculiar that there are no GIST opclasses in the standard distribution.
> > How the heck did the GIST index code get developed/tested without some
> > opclasses?
>
> doing some digging at berkeley, i found the original pggist patch file
> that created the gist access method & gist_box_ops opclass (among
> others).  i'm assuming that patch was the basis for what was originally
> introduced, so i don't know why it didn't get included with everything
> else.  it looks like there are a lot of calls to internal postgresql box
> comparison functions that would need to get converted to the new calling
> convention, but it should be pretty straightforward to get it to work
> with a recent version of postgresql.   it does seem pretty silly to have
> it in there if you don't have any built-in way of using it, if for no
> other reason than to be able to test if the feature even works.
>
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: RTREE on points

From
Jeff Hoffmann
Date:
Oleg Bartunov wrote:
> 
> GiST is great !
> 
> You may look at http://www.sai.msu.su/~megera/postgres/gist/
> for GiST implementation of RTree - it could be not compiled with 7.1
> release due to some api changes, but it's not difficult to do.

it looks like i just wasted a good couple of hours trying to convert the
gist_box_ops.  it did help find the pointer problem i was having because
i'm still not up to speed on the new function calling conventions,
though...

> If somebody want it I could contribute it to contrib area. 

i'm definitely interested.  i'm going to play with it & if oleg's claim
holds about index insertion time holds, i can definitely see myself
moving to it over the built in rtree.  anything that can cut down the
hours of index creation time would be great.  also, it seems that it'd
be a good choice for inclusion in the standard distribution because it'd
be easy to test -- you already have to run rtree tests anyway, you can
just duplicate them with gist & gist_box_ops.

-- 

Jeff Hoffmann
PropertyKey.com


Re: RTREE on points

From
Oleg Bartunov
Date:
On Tue, 17 Apr 2001, Jeff Hoffmann wrote:

> Oleg Bartunov wrote:
> >
> > GiST is great !
> >
> > You may look at http://www.sai.msu.su/~megera/postgres/gist/
> > for GiST implementation of RTree - it could be not compiled with 7.1
> > release due to some api changes, but it's not difficult to do.
>
> it looks like i just wasted a good couple of hours trying to convert the
> gist_box_ops.  it did help find the pointer problem i was having because
> i'm still not up to speed on the new function calling conventions,
> though...
>

:-)

> > If somebody want it I could contribute it to contrib area.
>
> i'm definitely interested.  i'm going to play with it & if oleg's claim
> holds about index insertion time holds, i can definitely see myself
> moving to it over the built in rtree.  anything that can cut down the
> hours of index creation time would be great.  also, it seems that it'd
> be a good choice for inclusion in the standard distribution because it'd
> be easy to test -- you already have to run rtree tests anyway, you can
> just duplicate them with gist & gist_box_ops.
>

nice you noticed that ! We'll update contrib-rtree_box_gist for 7.1 release
in a few days.

>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: RTREE on points

From
Oleg Bartunov
Date:
Jeff,

I checked the archive and it works with 7.1 release
we've implemented only several functions for box, so you may use
them as example for remaining types.
Regards,    Oleg
On Wed, 18 Apr 2001, Oleg Bartunov wrote:

> On Tue, 17 Apr 2001, Jeff Hoffmann wrote:
>
> > Oleg Bartunov wrote:
> > >
> > > GiST is great !
> > >
> > > You may look at http://www.sai.msu.su/~megera/postgres/gist/
> > > for GiST implementation of RTree - it could be not compiled with 7.1
> > > release due to some api changes, but it's not difficult to do.
> >
> > it looks like i just wasted a good couple of hours trying to convert the
> > gist_box_ops.  it did help find the pointer problem i was having because
> > i'm still not up to speed on the new function calling conventions,
> > though...
> >
>
> :-)
>
> > > If somebody want it I could contribute it to contrib area.
> >
> > i'm definitely interested.  i'm going to play with it & if oleg's claim
> > holds about index insertion time holds, i can definitely see myself
> > moving to it over the built in rtree.  anything that can cut down the
> > hours of index creation time would be great.  also, it seems that it'd
> > be a good choice for inclusion in the standard distribution because it'd
> > be easy to test -- you already have to run rtree tests anyway, you can
> > just duplicate them with gist & gist_box_ops.
> >
>
> nice you noticed that ! We'll update contrib-rtree_box_gist for 7.1 release
> in a few days.
>
> >
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://www.postgresql.org/search.mpl
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: RTREE on pointsy

From
Bruce Momjian
Date:
I believe Marc install the GIST code into the tree long ago.

> Jeff Hoffmann <jeff@propertykey.com> writes:
> > I know there are a couple of GiST examples in contrib (seg, cube &
> > intarray), but i thought there used to be at least a gist_box_ops.
> 
> I don't recall any such thing having been removed, but it does seem
> peculiar that there are no GIST opclasses in the standard distribution.
> How the heck did the GIST index code get developed/tested without some
> opclasses?
> 
> Anyone remember the history?  AFAICT from the CVS logs, GIST was added
> to the tree in mid 1996, but no opclasses for it were added at the time.
> I'd go digging in the maillist archives, but www.postgresql.org is too
> friggin' slow at the moment ...
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026