Thread: contrib/rtree_gist into core system?

contrib/rtree_gist into core system?

From
Tom Lane
Date:
I've mentioned this a couple times now, but here is an official
proposal:

I would like to move the contrib/rtree_gist functionality into the core
system before feature freeze.  There are a couple arguments in favor:

* It's way past time GiST indexes got some regression testing in the
core build.  rtree_gist is the only very plausible candidate for
near-term integration: I don't think tsearch2 is ready for that, and
the other GiST-using contrib modules are not of sufficient general
usefulness.

* With the recent WAL-ization and hoped-for concurrency fixes, GiST
is definitely superior to R-tree for practical use.  I don't see the
percentage in continuing to maintain the R-tree code indefinitely.
By integrating the opclasses needed to replace R-tree, we can start
down the path to deprecating and eventually removing R-tree.

Any objections?
        regards, tom lane


Re: contrib/rtree_gist into core system?

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> * With the recent WAL-ization and hoped-for concurrency fixes, GiST
> is definitely superior to R-tree for practical use.  I don't see the
> percentage in continuing to maintain the R-tree code indefinitely.
> By integrating the opclasses needed to replace R-tree, we can start
> down the path to deprecating and eventually removing R-tree.

I think we still have a serious problem with multicolumn indexes. As they
stand they're basically only indexes on the first column. The later columns
are not used to determine page splits.

Also, isn't rtree still substantially faster than gist? Like I think rtree is
on the order of 2x as fast as rtree_gist. Perhaps the concurrency fixes will
improve this.


-- 
greg



Re: contrib/rtree_gist into core system?

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> * With the recent WAL-ization and hoped-for concurrency fixes, GiST
>> is definitely superior to R-tree for practical use.  I don't see the
>> percentage in continuing to maintain the R-tree code indefinitely.
>> By integrating the opclasses needed to replace R-tree, we can start
>> down the path to deprecating and eventually removing R-tree.

> I think we still have a serious problem with multicolumn indexes. As they
> stand they're basically only indexes on the first column. The later columns
> are not used to determine page splits.

R-tree doesn't do multicolumn at all, so this is is hardly an argument
for keeping it, is it?

> Also, isn't rtree still substantially faster than gist?

Not according to contrib/rtree_gist/bench/, though I admit I have not
bothered to reproduce the experiment.
        regards, tom lane


Re: contrib/rtree_gist into core system?

From
Christopher Kings-Lynne
Date:
>>Also, isn't rtree still substantially faster than gist?
> 
> Not according to contrib/rtree_gist/bench/, though I admit I have not
> bothered to reproduce the experiment.

Will you just remove rtree and make rtree indexes use rtree_gist instead?

Chris



Re: contrib/rtree_gist into core system?

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Will you just remove rtree and make rtree indexes use rtree_gist instead?

That is in the back of my mind for 8.2 or so; I thought it'd be a bit
premature to propose it for 8.1, though.

The real bottom line here is that any devel effort we might put into
rtree would probably be better spent on gist.  We have limited resources
and we can't afford to waste 'em on code that doesn't have a clear claim
to fame.  As a contrasting example, I am surely not proposing that we
drop btree in favor of btree_gist ;-) ... but rtree has always been
marginal, and it's very hard to see where it can win over gist.
        regards, tom lane


Re: contrib/rtree_gist into core system?

From
"John Hansen"
Date:
Tom Lane Wrote:

> ... but rtree has always
> been marginal, and it's very hard to see where it can win over gist.

Simplicity!

Implementing rtree operators and support functions is FAR simpler than
implementing the GiST equivalents.

For example, suppose all you want to implement is the ~ operator for a
custom type, then technically all you need is 4 functions (well, 5
including the stub operators)

bool contains(type,type);
type intersect(type,type);
type union(type,type);
void size(type,*float);

And the 6 other operators simply defined as:
bool false(type) { return false; }

For GiST you still need 7 support functions + the operator function,
some of which aren't exactly simple to implement, the picksplit for
instance.

So I'd not recommend getting rid of rtree just yet. At least not until
someone has written an extensive howto on the subject of GiST
implementation.

... John


Re: contrib/rtree_gist into core system?

From
Tom Lane
Date:
"John Hansen" <john@geeknet.com.au> writes:
> Tom Lane Wrote:
>> ... but rtree has always 
>> been marginal, and it's very hard to see where it can win over gist.

> Simplicity!
> Implementing rtree operators and support functions is FAR simpler than
> implementing the GiST equivalents.

Mmm ... not really.  It does seem that we could offer some sort of
generic set of gist support routines that understand rtree-like
semantics (all the picksplit routines seem suspiciously similar,
for instance).  But seeing that every known instance of rtree support
has been broken since day one, partly for lack of any documentation
about what was needed, it's a bit hard to claim that implementing rtree
is transparently trivial.

> So I'd not recommend getting rid of rtree just yet. At least not until
> someone has written an extensive howto on the subject of GiST
> implementation.

There's no HOWTO for rtree either.  Again, my point is not that one
couldn't be written; it's that we would probably be better off spending
the effort on a HOWTO for gist.
        regards, tom lane


Re: contrib/rtree_gist into core system?

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> "John Hansen" <john@geeknet.com.au> writes:
>
> > Simplicity!
> > Implementing rtree operators and support functions is FAR simpler than
> > implementing the GiST equivalents.
> 
> Mmm ... not really.  It does seem that we could offer some sort of
> generic set of gist support routines that understand rtree-like
> semantics (all the picksplit routines seem suspiciously similar,
> for instance).  

I think the picksplit part of the API is the strangest part. Normally when you
design data types you think in terms of your data type and the operations on
it. You shouldn't suddenly have to immerse yourself in some nitty gritty ADTs
for index pages.

I believe all the picksplit functions are based on (apparently via copy/paste)
a single algorithm that depends on a single operator: a kind of "distance"
function. Usually it's the same function underlying the penalty gist api
function. That's a natural operator to implement for data types and one that
doesn't involve learning about any extraneous implementation details of gist
indexing.

If gist index operator classes could be activated based entirely on data type
operators like "distance" and "union" and "penalty" instead of this strange
"picksplit" function then it would make implementing them *much* easier.

Moreover it would solve the multicolumn index issue. There are a probably
other options but the simplest would be to simply take the inner product of
the results of the various distance functions.

-- 
greg



Re: contrib/rtree_gist into core system?

From
Andrew - Supernews
Date:
On 2005-06-27, Greg Stark <gsstark@mit.edu> wrote:
> I believe all the picksplit functions are based on (apparently via
> copy/paste) a single algorithm that depends on a single operator: a kind
> of "distance" function. Usually it's the same function underlying the
> penalty gist api function.

That's not quite true. There are at least two quite different picksplit
algorithms in those of the contrib/* modules that I've studied, and in
general I do not think it is possible to provide a single generic
picksplit that will work efficiently for _all_ data types. (And it is of
course important not to constrain the types of data that are allowed...)

It might be reasonable to implement a "default" picksplit based on a
user-supplied metric function (_not_ the same metric as "penalty"). But
I think there always needs to be scope for the user to provide their own
split function.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: contrib/rtree_gist into core system?

From
Teodor Sigaev
Date:
We can make r-tree as contrib module and then we will have example of index in 
contrib...

> By integrating the opclasses needed to replace R-tree, we can start
> down the path to deprecating and eventually removing R-tree.
> 

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: contrib/rtree_gist into core system?

From
Teodor Sigaev
Date:
> I think we still have a serious problem with multicolumn indexes. As they
> stand they're basically only indexes on the first column. The later columns
> are not used to determine page splits.

It's not a fully truth, second keys can be used in split, if first columns has 
non-unique values and second, the later columns uses in gistchoose method (wrap 
for user-defined penalty methods).

But I am agreed, that split in multicolumn GiST indexes isn't very optimal, the 
solution was suggested by Aoki, but it's require to change interface to user 
function.

Look:
"Generalizing ''Search'' in Generalized Search Trees", 1997, Paul M.
Aoki,http://www.sai.msu.su/~megera/postgres/gist/papers/csd-97-950.pdf



-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: contrib/rtree_gist into core system?

From
Teodor Sigaev
Date:
> I believe all the picksplit functions are based on (apparently via copy/paste)
> a single algorithm that depends on a single operator: a kind of "distance"
> function. Usually it's the same function underlying the penalty gist api

You are wrong, at least now in contrib it used three basic picksplit algoritm
1 simple sorting for ordered domain( btree_gist, ltree )
2 several variations of Guttmans algorithm (tsearch2, intarray, seg, cube)
3 linear picksplit for rtree_gist 
(http://www.sai.msu.su/~megera/postgres/gist/papers/nsplitLN.ps.gz).



-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: contrib/rtree_gist into core system?

From
Teodor Sigaev
Date:
FYI, compress and decompress methods may be trivial.


> 
> For GiST you still need 7 support functions + the operator function,
> some of which aren't exactly simple to implement, the picksplit for
> instance.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: contrib/rtree_gist into core system?

From
"John Hansen"
Date:
Tom Lane [mailto:tgl@sss.pgh.pa.us] Wrote:

> There's no HOWTO for rtree either.  Again, my point is not
> that one couldn't be written; it's that we would probably be
> better off spending the effort on a HOWTO for gist.

No, but the _current_ implementation of the rtree operators are ver much
self explaining and need no howto.

Union(x,y) = x + y
Intersect(x,y) = the values that are present in both x and y, or
_overlapping_region_
Size(x) = the size of the area/length of the line, number of elements,
etc...

Now, how simple is that compared to gist?

I for one, is yet to produce a working example of something as simple as
indexing an array of 2 elements [x y] represented by a custom type as
'[x y]' in string format (returned by type_out) internally stored as a
char[2], so that I can fetch all rows where [x y] = ':y' (:y meaning 2nd
element in array, x: meaning first element in array.

I chose this as something simple to play with, having no practical
application for me, but to get an understanding of gist,.... For now,. I
have put it in the too hard basket.

I did however in about half a day implement rtree support for inet/cidr
(ipv4 only) as you might recall.

Kind Regards,

John



Re: contrib/rtree_gist into core system?

From
Tom Lane
Date:
"John Hansen" <john@geeknet.com.au> writes:
> No, but the _current_ implementation of the rtree operators are ver much
> self explaining and need no howto.

That reasoning no doubt explains why we don't have *any* rtree-like
opclasses that got the left/overleft/right/overright semantics right
the first time :-(.

Greg Stark is right that the GIST API could probably be simpler ---
in particular it would be interesting to see if we could offer a
default picksplit function that most opclasses could use.  But that
doesn't mean that the rtree API is exactly trivial.
        regards, tom lane


Re: contrib/rtree_gist into core system?

From
Teodor Sigaev
Date:
> I think we still have a serious problem with multicolumn indexes. As they
> stand they're basically only indexes on the first column. The later columns
> are not used to determine page splits.

1. In your meaning, btree has bad split algorithm too. Look at _bt_compare, if 
first keys on page are unique the the later keys will not be compared ;)

2. About rtree interface: it's possible to write GiST-RTree layer compatibility 
interface. User's interface will just copied from RTree, and layer will 
translate it to GiST interface.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: contrib/rtree_gist into core system?

From
Greg Stark
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:

> 1. In your meaning, btree has bad split algorithm too. Look at _bt_compare, if
> first keys on page are unique the the later keys will not be compared ;)

I'm confused now. I was under the impression that gist didn't look at
subsequent columns if the first column was identical.

Maybe I got it backwards and it's insert that fails to look at subsequent
columns but page split handles it ok? So if you insert lots of tuples with
identical first columns the page will be split intelligently but then
subsequent inserts will be distributed essentially randomly between the two
resulting pages. 

-- 
greg



Re: contrib/rtree_gist into core system?

From
falcon
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:

> 1. In your meaning, btree has bad split algorithm too. Look at _bt_compare, if
> first keys on page are unique the the later keys will not be compared ;)

Please look at BUG 1614/1616.
Pleeaaasssseee.
There are also troubles with intarray, may be it can touch tsearch2. I don't know.
But the bug exists.

-- 
falcon                          mailto:falcon@intercable.ru




Re: contrib/rtree_gist into core system?

From
Oleg Bartunov
Date:
On Wed, 29 Jun 2005, falcon wrote:

> Teodor Sigaev <teodor@sigaev.ru> writes:
>
>> 1. In your meaning, btree has bad split algorithm too. Look at _bt_compare, if
>> first keys on page are unique the the later keys will not be compared ;)
>
> Please look at BUG 1614/1616.
> Pleeaaasssseee.
> There are also troubles with intarray, may be it can touch tsearch2. I don't know.
> But the bug exists.

Yura,

could you please refresh our memory what's the bug about ?

>
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: contrib/rtree_gist into core system?

From
Oleg Bartunov
Date:
Yura,

I found your message 
http://archives.postgresql.org/pgsql-bugs/2005-04/msg00213.php

So, what's the problem ? Could you reproduce your problem without
silly plpgsql functions ? Just plain create table, inserts and selects.
Also, have you tried CVS HEAD before crying too much ?

    Oleg
On Wed, 29 Jun 2005, falcon wrote:

> Teodor Sigaev <teodor@sigaev.ru> writes:
>
>> 1. In your meaning, btree has bad split algorithm too. Look at _bt_compare, if
>> first keys on page are unique the the later keys will not be compared ;)
>
> Please look at BUG 1614/1616.
> Pleeaaasssseee.
> There are also troubles with intarray, may be it can touch tsearch2. I don't know.
> But the bug exists.
>
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: contrib/rtree_gist into core system?

From
falcon
Date:
Hello, pgsql-hackers.
Hello, Oleg Bartunov.

Here are my first messages. Bug was found on these real data in a real table.
My hairs raised. Excuse my emotionality.

http://archives.postgresql.org/pgsql-bugs/2005-04/msg00160.php
http://archives.postgresql.org/pgsql-bugs/2005-04/msg00162.php

-- 
С уважением,falcon                          mailto:falcon@intercable.ru




Re: contrib/rtree_gist into core system?

From
Oleg Bartunov
Date:
On Thu, 30 Jun 2005, falcon wrote:

> Hello, pgsql-hackers.
> Hello, Oleg Bartunov.
>
> Here are my first messages. Bug was found on these real data in a real table.
> My hairs raised. Excuse my emotionality.
>
> http://archives.postgresql.org/pgsql-bugs/2005-04/msg00160.php
> http://archives.postgresql.org/pgsql-bugs/2005-04/msg00162.php

What we need is a test suite which demonstrates GiST problem, not 
possible problems in your plpgsql programming. Ideally, we need
sql for creating objects, input data and test selects. 
We're rather busy and if you want somebody spent his time you need to 
spent your time.

>
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: contrib/rtree_gist into core system?

From
falcon
Date:
Hello, pgsql-hackers.

Sorry for fludding. Bug 1614 was fixed in 8.0.3. I just tests.

-- falcon                          mailto:falcon@intercable.ru