Thread: has anybody else used r-tree indexes in 6.5?
has something changed with r-tree indexes in 6.5? i had a database in 6.4.2 that had a box type and would do queries on it with the box overlap operator ("&&"). the field i'm querying is indexed with an r-tree index. all this worked fine in 6.4.2. then i did a dump & restore from 6.4.2 to 6.5 and now the queries that were running ok in 6.4.2 aren't in 6.5. if i drop the index, everything is fine (albeit slow). this is the error i'm getting: ERROR: Operator 500 must have a restriction selectivity estimator to be used in a btree index which doesn't make sense because it's an r-tree index, first of all. and second, i don't know what to look for (especially since this was working great with 6.4.2). i'm running on a linux system, compiled with a pretty recent egcs. the diffs in the regression tests didn't seem to be anything major. the only configuration option i selected was --with-perl when i compiled it, so so everything should be pretty vanilla. (i haven't even added in any of my custom types yet, which have been known to mess with operators in the past.) it's really easy to reproduce the error: create table test_table (area1 box); insert into test_table values ( '(100,100,200,200)'::box ); create index test_table_index on test_table using rtree ( area1 box_ops ); select * from test_table where area1 && '(0,0),(100,100)'::box; drop index test_table_index; select * from test_table where area1 && '(0,0),(100,100)'::box; any ideas? jeff ps, is it just me, or do things seem a little dead around here today?
Jeff Hoffmann <jeff@remapcorp.com> writes: > ERROR: Operator 500 must have a restriction selectivity estimator to be > used in a btree index I put in that error check, so I must be guilty :-(. I didn't think it should affect r-trees either. I'll take a look. regards, tom lane
Jeff Hoffmann <jeff@remapcorp.com> writes: > has something changed with r-tree indexes in 6.5? > ERROR: Operator 500 must have a restriction selectivity estimator to be > used in a btree index What we have here is a big OOOPS. The reference to "btree" is coming from btreesel, which it turns out is called by rtreesel as well --- I missed that the first time around. But that's just a cosmetic problem in the error message. The real problem is that none of the operators that are used for rtree indexes have restriction selectivity estimators. They *used* to have selectivity estimators in 6.4.2 --- they all pointed at intltsel, which is pretty much completely inappropriate for an area- based comparison. I believe I must have removed those links from the pg_operator table during one of my cleanup-and-make-consistent passes. The right fix would be to put in an appropriate selectivity estimator, but we can't do that as a 6.5.* patch because changing pg_operator requires an initdb. It will have to wait for 6.6. (One of my to-do items for 6.6 was to rewrite the selectivity estimators anyway, so I'll see what I can do.) In the meantime, I think the only possible patch is to disable the error check in btreesel and have it return a default selectivity estimate instead of complaining. Drat. Apparently, none of the regression tests exercise rtree indexes at all, else we'd have known there was a problem. Adding an rtree regression test seems to be strongly indicated as well... regards, tom lane
> Jeff Hoffmann <jeff@remapcorp.com> writes: > > has something changed with r-tree indexes in 6.5? > > ERROR: Operator 500 must have a restriction selectivity estimator to be > > used in a btree index > > ... > > The right fix would be to put in an appropriate selectivity estimator, > but we can't do that as a 6.5.* patch because changing pg_operator > requires an initdb. It will have to wait for 6.6. Would it possible to write an sql script to put in the changes? Seems it should be possible to do this for any non-code changes that affect the system tables. Darren
> The right fix would be to put in an appropriate selectivity estimator, > but we can't do that as a 6.5.* patch because changing pg_operator > requires an initdb. It will have to wait for 6.6. (One of my to-do > items for 6.6 was to rewrite the selectivity estimators anyway, so I'll > see what I can do.) In the meantime, I think the only possible patch is > to disable the error check in btreesel and have it return a default > selectivity estimate instead of complaining. Drat. > > Apparently, none of the regression tests exercise rtree indexes at all, > else we'd have known there was a problem. Adding an rtree regression test > seems to be strongly indicated as well... Sounds like a good fix. Bypass the system tables, since we can't change them, and hard-wire a selectivity. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
[Charset iso-8859-1 unsupported, filtering to ASCII...] > > Jeff Hoffmann <jeff@remapcorp.com> writes: > > > has something changed with r-tree indexes in 6.5? > > > ERROR: Operator 500 must have a restriction selectivity estimator to be > > > used in a btree index > > > > ... > > > > The right fix would be to put in an appropriate selectivity estimator, > > but we can't do that as a 6.5.* patch because changing pg_operator > > requires an initdb. It will have to wait for 6.6. > > Would it possible to write an sql script to put in the changes? > > Seems it should be possible to do this for any non-code changes > that affect the system tables. Yes, we could, and have in the past had SQL scripts that are run as part of the upgrade, but this fix is easier to do in C because it doesn't require that difficult step for upgraders. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> has something changed with r-tree indexes in 6.5? > any ideas? revision 1.29 date: 1999/05/31 19:32:47; author: tgl; state: Exp; lines: +61 -5 Generate a more specific error message when an operator used in an index doesn't have a restriction selectivity estimator. Tom, was there anything more here than the new elog error exit itself? It used to ignore the missing estimator, or fail farther in to the code? > ps, is it just me, or do things seem a little dead around here today? At your office, or on the list? Most of us were waiting for your message ;) - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
> What we have here is a big OOOPS. > The right fix would be to put in an appropriate selectivity estimator, > but we can't do that as a 6.5.* patch because changing pg_operator > requires an initdb. It will have to wait for 6.6. (One of my to-do > items for 6.6 was to rewrite the selectivity estimators anyway, so I'll > see what I can do.) Uh, I think we *should* do it as a patch, just not one applied to the cvs tree for the v.6.5.x branch. Let's apply it to the main cvs branch once we do the split, and Jeff can use a snapshot at that time (since it will strongly resemble v6.5 and since he wants the capability). In the meantime, can you/we develop a set of patches for Jeff to use? Once we have them, we can post them into ftp://postgresql.org/pub/patches, which probably needs to be cleaned out from the v6.4.x period. Let me know if I can help with any of this... > In the meantime, I think the only possible patch is > to disable the error check in btreesel and have it return a default > selectivity estimate instead of complaining. Drat. ... and let's use this solution for the v6.5.x branch, once it comes into being. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> In the meantime, I think the only possible patch is >> to disable the error check in btreesel and have it return a default >> selectivity estimate instead of complaining. Drat. > ... and let's use this solution for the v6.5.x branch, once it comes > into being. I've already done that, committed it, and posted the patch on pgsql-patches. We can reverse out the patch after something's been done to provide reasonable selectivity estimates for rtrees. Applying intltsel, as 6.4 did, was so bogus that it's difficult to argue that the resulting numbers were better than the 0.5 default estimate I just put into btreesel ;-) ... so I feel no special desire to return to the status quo ante. I have a to-do list item to look at the whole selectivity estimation business, and I will try to figure out something reasonable for rtrees while I'm at it. It may be a while before that gets to the top of the to-do list (unless someone else gets to it before I do), but I think this patch will do fine until then. Mostly I'm embarrassed that we didn't notice the problem during beta testing :-(. No regression test, and no users of rtrees in the beta population either, it would seem. regards, tom lane
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > date: 1999/05/31 19:32:47; author: tgl; state: Exp; lines: +61 -5 > Generate a more specific error message when an operator used > in an index doesn't have a restriction selectivity estimator. > Tom, was there anything more here than the new elog error exit itself? > It used to ignore the missing estimator, or fail farther in to the > code? That code useta look something like fmgr(get_oprrest(operatorOID), ...) so that if get_oprrest returned 0 you'd get an error message along the lines of "fmgr: no function cache entry for OID 0". This was pretty unhelpful, of course, and someone complained about it a few weeks ago; so I added a test for missing oprrest. That wasn't what broke things ... what broke things was my removal of seemingly bogus oprrest links from pg_operator, which I think I did on 4/10: revision 1.56 date: 1999/04/10 23:53:00; author: tgl; state: Exp; lines: +99 -99 Fix another batch of bogosities in pg_operator table. These were bogus selectivity-estimator links, like a '>' operator pointing to intltsel when it should use intgtsel. regards, tom lane
> > In the meantime, I think the only possible patch is > > to disable the error check in btreesel and have it return a default > > selectivity estimate instead of complaining. Drat. > > ... and let's use this solution for the v6.5.x branch, once it comes > into being. He already did this, Thomas. The 6.5.x branch is currently our only active branch. Any patches now applied appear in 6.5.x and 6.6. We will split it when someone needs to start on 6.6-only features. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> > Applying intltsel, as 6.4 did, was so bogus that it's difficult > to argue that the resulting numbers were better than the 0.5 > default estimate I just put into btreesel ;-) ... so I feel no > special desire to return to the status quo ante. I have a to-do > list item to look at the whole selectivity estimation business, > and I will try to figure out something reasonable for rtrees > while I'm at it. It may be a while before that gets to the top > of the to-do list (unless someone else gets to it before I do), > but I think this patch will do fine until then. > > Mostly I'm embarrassed that we didn't notice the problem during > beta testing :-(. No regression test, and no users of rtrees > in the beta population either, it would seem. > No reason to be emarrassed. This 6.5 release is our smoothest yet. Sometimes, we had some pretty major initial problems, and at this stage, we would be figuring out when we needed to get the next subrelease out. We are sitting around at this point, just tweeking things, and have no major need to rush into a minor release to fix problems because we don't have a flood of identical bug reports that have users screaming. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
At 11:11 PM 6/18/99 -0400, Bruce Momjian wrote: >We are sitting around at this point, just tweeking things, and have no >major need to rush into a minor release to fix problems because we don't >have a flood of identical bug reports that have users screaming. Actually, I noticed this also. I don't have experience with your earlier release attempts, but there were enough "wink, wink - we'll be busy chasing initial release bugs" type comments to make me wonder. This release is ... somewhat mature. And you seem to be capturing areas that regression testing doesn't touch. I don't have to tell you that the writing of tests is of crucial importance (though of course entirely inadequate!). I'm really impressed with this release... - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, and other goodies at http://donb.photo.net
I read through some of the papers about R-trees and GIST about a year ago, and it seems that estimating costs for R-tree searches (and GIST searches) is not so straightforward as B-Trees. Hellerstein et al. 1995 write "...currently such estimates are reasonably accurate for B+ trees and less so for R-Trees. Recently, some work on R-tree cost estimation has been done by [FK94], but more work is required to bring this to bear on GISTs in general...." The reference that they give is [FK94] Christos Faloutsos and Ibrahim Kamel. "Beyond Uniformity and Independence: Analysis of R-trees using the concept of fractal dimension. Proc. 13th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp 4--13, Minneapolis, May 1994 I don't have the Faloustos paper. The R-tree code authors, and the GIST authors just used the B-Tree code as an expedient solution. Bernie Frankpitt
On Sat, Jun 19, 1999 at 09:12:11PM +0000, Bernard Frankpitt wrote: > I read through some of the papers about R-trees and GIST about a year > ago, > and it seems that estimating costs for R-tree searches (and GIST > searches) is > not so straightforward as B-Trees. > > Hellerstein et al. 1995 write > "...currently such estimates are reasonably accurate for B+ trees > and less so for R-Trees. Recently, some work on R-tree cost > estimation has been done by [FK94], but more work is required to bring > this to bear on GISTs in general...." > > The reference that they give is > > [FK94] Christos Faloutsos and Ibrahim Kamel. "Beyond Uniformity and > Independence: Analysis of R-trees using the concept of fractal > dimension. > Proc. 13th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database > Systems, pp 4--13, Minneapolis, May 1994 > Hmm, a quick Google search for these two authors hit on a great index server in Germany: http://www.informatik.uni-trier.de/~ley/db/index.html http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/f/Faloutsos:Christos.htm And that paper in particular: http://www.informatik.uni-trier.de/~ley/db/conf/pods/pods94-4.html Which gives an abstract, access to an electronic version (ACS membership required) and a cite for a more recent (1997) journal paper. HTH, Ross -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
Tom Lane wrote: > > Jeff Hoffmann <jeff@remapcorp.com> writes: > > has something changed with r-tree indexes in 6.5? > > ERROR: Operator 500 must have a restriction selectivity estimator to be > > used in a btree index > > What we have here is a big OOOPS. i guess so. the patch works fine, though, so no big deal. i thought it was weird that i hadn't heard it come up before since it didn't seem like something i could have caused, but you never know. > Apparently, none of the regression tests exercise rtree indexes at all, > else we'd have known there was a problem. Adding an rtree regression test > seems to be strongly indicated as well... i noticed this when i ran the regression tests and everything came out ok, but forgot to mention it. if i recall correctly, what's actually in the geometry regression test is pretty weak. i think it only really tests some of the common cases, not all of the functions. it's probably not a high priority item, though, since, judging by how long it took for this bug to surface, there aren't a lot of people using the geometry functions/types.
Jeff Hoffmann <jeff@remapcorp.com> writes: > Tom Lane wrote: >> Apparently, none of the regression tests exercise rtree indexes at all, >> else we'd have known there was a problem. Adding an rtree regression test >> seems to be strongly indicated as well... > i noticed this when i ran the regression tests and everything came out > ok, but forgot to mention it. if i recall correctly, what's actually in > the geometry regression test is pretty weak. i think it only really > tests some of the common cases, not all of the functions. it's probably > not a high priority item, though, since, judging by how long it took for > this bug to surface, there aren't a lot of people using the geometry > functions/types. That's exactly why we need a more thorough regression test. The core developers aren't doing much with the geometry operations, and evidently neither are any of the frontline beta testers. So, if the regression tests don't cover the material either, we stand a good chance of breaking things and not even knowing it --- which is exactly what happened here. It seems that you do make use of the geometry operations; perhaps you would be willing to work up some more-thorough regression tests? You're certainly better qualified to do it than I am... regards, tom lane
Tom Lane wrote: > That's exactly why we need a more thorough regression test. The core > developers aren't doing much with the geometry operations, and evidently > neither are any of the frontline beta testers. So, if the regression > tests don't cover the material either, we stand a good chance of > breaking things and not even knowing it --- which is exactly what > happened here. > > It seems that you do make use of the geometry operations; perhaps > you would be willing to work up some more-thorough regression tests? > You're certainly better qualified to do it than I am... > well, depending on how complete you want the regression tests, this could be fairly easy. after a quick look at the tests, it seems like the only type that is really left out is line (which i don't know if there are any native operators for it anyway, all i know about are the ones for lsegs). just a simple select for the forgotten operators in with the test of other operators for each type would be an improvement. i think all of the functions are covered in at least one place. again, though, i think everything that i use on a regular basis is covered in the regression test. so overall it's really not that bad. except, of course, for the bug i uncovered. there actually is a place where an rtree index is created, but nothing is every selected against it, which is what caused this error to go unnoticed. i haven't looked closely enough at parts of the other regression tests to see if there are any selects where indexes come in to play, but it'd be a good idea to make sure indexes are actually used in the tests for all access methods (and op classes? - i can't really imagine when this would be a problem, but who knows). i'll try updating some of the dedicated tests (box.sql, circle.sql, geometry.sql, lseg.sql, path.sql, polygon.sql), but i'm not sure where testing the rtree indexes should go. i think other index types are tested in select.sql, but i'd probably put them in geometry.sql. does anybody care? is there someone that oversees the methods and organization of the regression tests or do people just throw in new tests when there's something new?
> i'll try updating some of the dedicated tests (box.sql, circle.sql, > geometry.sql, lseg.sql, path.sql, polygon.sql), but i'm not sure where > testing the rtree indexes should go. i think other index types are > tested in select.sql, but i'd probably put them in geometry.sql. does > anybody care? is there someone that oversees the methods and > organization of the regression tests or do people just throw in new > tests when there's something new? Well, Marc and I had reorganized the regression tests a couple of years ago, and most of the organizational changes since then have been done by us too (Marc handling the platform-specific stuff, and I the tests themselves). But new test areas have been added by others, and we certainly could use more contributions to existing tests, reorganizing them if that seems advisable. I agree with your suggestion to put rtree testing in geometry.sql, at least until the size of the new tests would suggest separating it into a new "rtree.sql" test. Go ahead and do something. We'll apply it to the tree, and if there is something which provokes someone else into modifying it, we'll do it then. But I'm sure whatever you do will be fine, since you have clearly already given some thought to it. Thanks... - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Jeff Hoffmann <jeff@remapcorp.com> writes: > i'll try updating some of the dedicated tests (box.sql, circle.sql, > geometry.sql, lseg.sql, path.sql, polygon.sql), but i'm not sure where > testing the rtree indexes should go. i think other index types are > tested in select.sql, but i'd probably put them in geometry.sql. does > anybody care? is there someone that oversees the methods and > organization of the regression tests or do people just throw in new > tests when there's something new? AFAIK we have no regression-test-meister (though we should). Do what seems reasonable. About the only stylistic thing I'd suggest is to try to avoid machine dependent results. For example, the existing geometry.sql test causes a lot of uninteresting comparison failures on many machines because of small variations in roundoff error. So, if you can exercise a feature using only exact-integral inputs and results, do it that way rather than making up "realistic" test data. (A lot of people would be very happy if you could revise this problem away in geometry.sql ... but if that seems like more work than you bargained for, don't worry about it. Extending the test coverage is the high-priority task, I think.) I'd be inclined to agree that rtree indexes should be tested in one of the existing geometry-related tests, or perhaps in a brand new regression test, rather than sticking them into the generic select.sql test. Thanks for taking a shot at it! regards, tom lane