Thread: contrib/rtree_gist into core system?
I've mentioned this a couple times now, but here is an official proposal: I would like to move the contrib/rtree_gist functionality into the core system before feature freeze. There are a couple arguments in favor: * It's way past time GiST indexes got some regression testing in the core build. rtree_gist is the only very plausible candidate for near-term integration: I don't think tsearch2 is ready for that, and the other GiST-using contrib modules are not of sufficient general usefulness. * With the recent WAL-ization and hoped-for concurrency fixes, GiST is definitely superior to R-tree for practical use. I don't see the percentage in continuing to maintain the R-tree code indefinitely. By integrating the opclasses needed to replace R-tree, we can start down the path to deprecating and eventually removing R-tree. Any objections? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > * With the recent WAL-ization and hoped-for concurrency fixes, GiST > is definitely superior to R-tree for practical use. I don't see the > percentage in continuing to maintain the R-tree code indefinitely. > By integrating the opclasses needed to replace R-tree, we can start > down the path to deprecating and eventually removing R-tree. I think we still have a serious problem with multicolumn indexes. As they stand they're basically only indexes on the first column. The later columns are not used to determine page splits. Also, isn't rtree still substantially faster than gist? Like I think rtree is on the order of 2x as fast as rtree_gist. Perhaps the concurrency fixes will improve this. -- greg
Greg Stark <gsstark@mit.edu> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> * With the recent WAL-ization and hoped-for concurrency fixes, GiST >> is definitely superior to R-tree for practical use. I don't see the >> percentage in continuing to maintain the R-tree code indefinitely. >> By integrating the opclasses needed to replace R-tree, we can start >> down the path to deprecating and eventually removing R-tree. > I think we still have a serious problem with multicolumn indexes. As they > stand they're basically only indexes on the first column. The later columns > are not used to determine page splits. R-tree doesn't do multicolumn at all, so this is is hardly an argument for keeping it, is it? > Also, isn't rtree still substantially faster than gist? Not according to contrib/rtree_gist/bench/, though I admit I have not bothered to reproduce the experiment. regards, tom lane
>>Also, isn't rtree still substantially faster than gist? > > Not according to contrib/rtree_gist/bench/, though I admit I have not > bothered to reproduce the experiment. Will you just remove rtree and make rtree indexes use rtree_gist instead? Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > Will you just remove rtree and make rtree indexes use rtree_gist instead? That is in the back of my mind for 8.2 or so; I thought it'd be a bit premature to propose it for 8.1, though. The real bottom line here is that any devel effort we might put into rtree would probably be better spent on gist. We have limited resources and we can't afford to waste 'em on code that doesn't have a clear claim to fame. As a contrasting example, I am surely not proposing that we drop btree in favor of btree_gist ;-) ... but rtree has always been marginal, and it's very hard to see where it can win over gist. regards, tom lane
Tom Lane Wrote: > ... but rtree has always > been marginal, and it's very hard to see where it can win over gist. Simplicity! Implementing rtree operators and support functions is FAR simpler than implementing the GiST equivalents. For example, suppose all you want to implement is the ~ operator for a custom type, then technically all you need is 4 functions (well, 5 including the stub operators) bool contains(type,type); type intersect(type,type); type union(type,type); void size(type,*float); And the 6 other operators simply defined as: bool false(type) { return false; } For GiST you still need 7 support functions + the operator function, some of which aren't exactly simple to implement, the picksplit for instance. So I'd not recommend getting rid of rtree just yet. At least not until someone has written an extensive howto on the subject of GiST implementation. ... John
"John Hansen" <john@geeknet.com.au> writes: > Tom Lane Wrote: >> ... but rtree has always >> been marginal, and it's very hard to see where it can win over gist. > Simplicity! > Implementing rtree operators and support functions is FAR simpler than > implementing the GiST equivalents. Mmm ... not really. It does seem that we could offer some sort of generic set of gist support routines that understand rtree-like semantics (all the picksplit routines seem suspiciously similar, for instance). But seeing that every known instance of rtree support has been broken since day one, partly for lack of any documentation about what was needed, it's a bit hard to claim that implementing rtree is transparently trivial. > So I'd not recommend getting rid of rtree just yet. At least not until > someone has written an extensive howto on the subject of GiST > implementation. There's no HOWTO for rtree either. Again, my point is not that one couldn't be written; it's that we would probably be better off spending the effort on a HOWTO for gist. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > "John Hansen" <john@geeknet.com.au> writes: > > > Simplicity! > > Implementing rtree operators and support functions is FAR simpler than > > implementing the GiST equivalents. > > Mmm ... not really. It does seem that we could offer some sort of > generic set of gist support routines that understand rtree-like > semantics (all the picksplit routines seem suspiciously similar, > for instance). I think the picksplit part of the API is the strangest part. Normally when you design data types you think in terms of your data type and the operations on it. You shouldn't suddenly have to immerse yourself in some nitty gritty ADTs for index pages. I believe all the picksplit functions are based on (apparently via copy/paste) a single algorithm that depends on a single operator: a kind of "distance" function. Usually it's the same function underlying the penalty gist api function. That's a natural operator to implement for data types and one that doesn't involve learning about any extraneous implementation details of gist indexing. If gist index operator classes could be activated based entirely on data type operators like "distance" and "union" and "penalty" instead of this strange "picksplit" function then it would make implementing them *much* easier. Moreover it would solve the multicolumn index issue. There are a probably other options but the simplest would be to simply take the inner product of the results of the various distance functions. -- greg
On 2005-06-27, Greg Stark <gsstark@mit.edu> wrote: > I believe all the picksplit functions are based on (apparently via > copy/paste) a single algorithm that depends on a single operator: a kind > of "distance" function. Usually it's the same function underlying the > penalty gist api function. That's not quite true. There are at least two quite different picksplit algorithms in those of the contrib/* modules that I've studied, and in general I do not think it is possible to provide a single generic picksplit that will work efficiently for _all_ data types. (And it is of course important not to constrain the types of data that are allowed...) It might be reasonable to implement a "default" picksplit based on a user-supplied metric function (_not_ the same metric as "penalty"). But I think there always needs to be scope for the user to provide their own split function. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
We can make r-tree as contrib module and then we will have example of index in contrib... > By integrating the opclasses needed to replace R-tree, we can start > down the path to deprecating and eventually removing R-tree. > -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> I think we still have a serious problem with multicolumn indexes. As they > stand they're basically only indexes on the first column. The later columns > are not used to determine page splits. It's not a fully truth, second keys can be used in split, if first columns has non-unique values and second, the later columns uses in gistchoose method (wrap for user-defined penalty methods). But I am agreed, that split in multicolumn GiST indexes isn't very optimal, the solution was suggested by Aoki, but it's require to change interface to user function. Look: "Generalizing ''Search'' in Generalized Search Trees", 1997, Paul M. Aoki,http://www.sai.msu.su/~megera/postgres/gist/papers/csd-97-950.pdf -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> I believe all the picksplit functions are based on (apparently via copy/paste) > a single algorithm that depends on a single operator: a kind of "distance" > function. Usually it's the same function underlying the penalty gist api You are wrong, at least now in contrib it used three basic picksplit algoritm 1 simple sorting for ordered domain( btree_gist, ltree ) 2 several variations of Guttmans algorithm (tsearch2, intarray, seg, cube) 3 linear picksplit for rtree_gist (http://www.sai.msu.su/~megera/postgres/gist/papers/nsplitLN.ps.gz). -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
FYI, compress and decompress methods may be trivial. > > For GiST you still need 7 support functions + the operator function, > some of which aren't exactly simple to implement, the picksplit for > instance. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Tom Lane [mailto:tgl@sss.pgh.pa.us] Wrote: > There's no HOWTO for rtree either. Again, my point is not > that one couldn't be written; it's that we would probably be > better off spending the effort on a HOWTO for gist. No, but the _current_ implementation of the rtree operators are ver much self explaining and need no howto. Union(x,y) = x + y Intersect(x,y) = the values that are present in both x and y, or _overlapping_region_ Size(x) = the size of the area/length of the line, number of elements, etc... Now, how simple is that compared to gist? I for one, is yet to produce a working example of something as simple as indexing an array of 2 elements [x y] represented by a custom type as '[x y]' in string format (returned by type_out) internally stored as a char[2], so that I can fetch all rows where [x y] = ':y' (:y meaning 2nd element in array, x: meaning first element in array. I chose this as something simple to play with, having no practical application for me, but to get an understanding of gist,.... For now,. I have put it in the too hard basket. I did however in about half a day implement rtree support for inet/cidr (ipv4 only) as you might recall. Kind Regards, John
"John Hansen" <john@geeknet.com.au> writes: > No, but the _current_ implementation of the rtree operators are ver much > self explaining and need no howto. That reasoning no doubt explains why we don't have *any* rtree-like opclasses that got the left/overleft/right/overright semantics right the first time :-(. Greg Stark is right that the GIST API could probably be simpler --- in particular it would be interesting to see if we could offer a default picksplit function that most opclasses could use. But that doesn't mean that the rtree API is exactly trivial. regards, tom lane
> I think we still have a serious problem with multicolumn indexes. As they > stand they're basically only indexes on the first column. The later columns > are not used to determine page splits. 1. In your meaning, btree has bad split algorithm too. Look at _bt_compare, if first keys on page are unique the the later keys will not be compared ;) 2. About rtree interface: it's possible to write GiST-RTree layer compatibility interface. User's interface will just copied from RTree, and layer will translate it to GiST interface. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Teodor Sigaev <teodor@sigaev.ru> writes: > 1. In your meaning, btree has bad split algorithm too. Look at _bt_compare, if > first keys on page are unique the the later keys will not be compared ;) I'm confused now. I was under the impression that gist didn't look at subsequent columns if the first column was identical. Maybe I got it backwards and it's insert that fails to look at subsequent columns but page split handles it ok? So if you insert lots of tuples with identical first columns the page will be split intelligently but then subsequent inserts will be distributed essentially randomly between the two resulting pages. -- greg
Teodor Sigaev <teodor@sigaev.ru> writes: > 1. In your meaning, btree has bad split algorithm too. Look at _bt_compare, if > first keys on page are unique the the later keys will not be compared ;) Please look at BUG 1614/1616. Pleeaaasssseee. There are also troubles with intarray, may be it can touch tsearch2. I don't know. But the bug exists. -- falcon mailto:falcon@intercable.ru
On Wed, 29 Jun 2005, falcon wrote: > Teodor Sigaev <teodor@sigaev.ru> writes: > >> 1. In your meaning, btree has bad split algorithm too. Look at _bt_compare, if >> first keys on page are unique the the later keys will not be compared ;) > > Please look at BUG 1614/1616. > Pleeaaasssseee. > There are also troubles with intarray, may be it can touch tsearch2. I don't know. > But the bug exists. Yura, could you please refresh our memory what's the bug about ? > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Yura, I found your message http://archives.postgresql.org/pgsql-bugs/2005-04/msg00213.php So, what's the problem ? Could you reproduce your problem without silly plpgsql functions ? Just plain create table, inserts and selects. Also, have you tried CVS HEAD before crying too much ? Oleg On Wed, 29 Jun 2005, falcon wrote: > Teodor Sigaev <teodor@sigaev.ru> writes: > >> 1. In your meaning, btree has bad split algorithm too. Look at _bt_compare, if >> first keys on page are unique the the later keys will not be compared ;) > > Please look at BUG 1614/1616. > Pleeaaasssseee. > There are also troubles with intarray, may be it can touch tsearch2. I don't know. > But the bug exists. > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Hello, pgsql-hackers. Hello, Oleg Bartunov. Here are my first messages. Bug was found on these real data in a real table. My hairs raised. Excuse my emotionality. http://archives.postgresql.org/pgsql-bugs/2005-04/msg00160.php http://archives.postgresql.org/pgsql-bugs/2005-04/msg00162.php -- С уважением,falcon mailto:falcon@intercable.ru
On Thu, 30 Jun 2005, falcon wrote: > Hello, pgsql-hackers. > Hello, Oleg Bartunov. > > Here are my first messages. Bug was found on these real data in a real table. > My hairs raised. Excuse my emotionality. > > http://archives.postgresql.org/pgsql-bugs/2005-04/msg00160.php > http://archives.postgresql.org/pgsql-bugs/2005-04/msg00162.php What we need is a test suite which demonstrates GiST problem, not possible problems in your plpgsql programming. Ideally, we need sql for creating objects, input data and test selects. We're rather busy and if you want somebody spent his time you need to spent your time. > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Hello, pgsql-hackers. Sorry for fludding. Bug 1614 was fixed in 8.0.3. I just tests. -- falcon mailto:falcon@intercable.ru