Thread: RE: Indexing for geographic objects?
Hi,
Remember also that the GiST library has been integrated into PG, (my brother is doing some thesis workon that at the moment), and you can create new index types relatively quickly (assuming that you understand the indexing theory ;-) using this mechanism. Run a web search on GiST for more info.
Currently GiST has support for btree and rtree indexes, and possibly r+ or * trees, I can't remember which, if any, and IIRC, at least a couple more. However, if you have a requirement or 3d indexing, and you have the knowledge available, you should be able to hack a few 3d indexes together quite quickly.
Cheers...
-----Original Message-----
From: Tom Lane
To: Franck Martin
Cc: pgsql-general; pgsql-hackers
Sent: 11-26-00 4:35 AM
Subject: Re: [HACKERS] Indexing for geographic objects?
Franck Martin <franck@sopac.org> writes:
> I would greatly appreciate if someone could guide me through the
> methodology to build an index for a custom type or point me to some
> readings where the algorithm is explained (web, book, etc...).
The Programmer's Guide chapter "Interfacing Extensions To Indices"
outlines the procedure for making a new datatype indexable. It
only discusses the case of adding btree support for a new type,
though. For other index classes such as R-tree there are different
sets of required operators, which are not as well documented but
you can find out by looking at code for the already-supported
datatypes.
> I plan to use 3D geographical objects...
That's a bit hard since we don't have any indexes suitable for 3-D
coordinates --- the existing R-tree type is for 2-D objects. What's
more it assumes that coordinates are Euclidean, which is probably
not the model you want for geographical coordinates.
In theory you could build a new index type suitable for indexing
3-D points, using the R-tree code as a starting point. I wouldn't
class it as a project suitable for a newbie however :-(.
Depending on what your needs are, you might be able to get by with
projecting your objects into a flat 2-D coordinate system and using
an R-tree index in that space. It'd just be approximate but that
might be close enough for index purposes.
regards, tom lane
**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.
This footnote also confirms that this email message has been swept by
MIMEsweeper for the presence of computer viruses.
www.mimesweeper.com
**********************************************************************
I'm also interested in GiST and would be happy if somebody could provide workable example. I have an idea to use GiST indices for our fulltextsearch system. Regards, Oleg On Sun, 26 Nov 2000, Michael Ansley wrote: > Date: Sun, 26 Nov 2000 11:34:16 -0000 > From: Michael Ansley <Michael.Ansley@intec-telecom-systems.com> > To: 'Tom Lane ' <tgl@sss.pgh.pa.us>, 'Franck Martin ' <franck@sopac.org> > Cc: 'pgsql-general ' <pgsql-general@postgresql.org>, > 'pgsql-hackers ' <pgsql-hackers@postgresql.org>, > "'t.h.p.ansley@durham.co.uk'" <t.h.p.ansley@durham.co.uk> > Subject: RE: [HACKERS] Indexing for geographic objects? > > Hi, > > Remember also that the GiST library has been integrated into PG, (my brother > is doing some thesis workon that at the moment), and you can create new > index types relatively quickly (assuming that you understand the indexing > theory ;-) using this mechanism. Run a web search on GiST for more info. > > Currently GiST has support for btree and rtree indexes, and possibly r+ or * > trees, I can't remember which, if any, and IIRC, at least a couple more. > However, if you have a requirement or 3d indexing, and you have the > knowledge available, you should be able to hack a few 3d indexes together > quite quickly. > > > Cheers... > > > -----Original Message----- > From: Tom Lane > To: Franck Martin > Cc: pgsql-general; pgsql-hackers > Sent: 11-26-00 4:35 AM > Subject: Re: [HACKERS] Indexing for geographic objects? > > Franck Martin <franck@sopac.org> writes: > > I would greatly appreciate if someone could guide me through the > > methodology to build an index for a custom type or point me to some > > readings where the algorithm is explained (web, book, etc...). > > The Programmer's Guide chapter "Interfacing Extensions To Indices" > outlines the procedure for making a new datatype indexable. It > only discusses the case of adding btree support for a new type, > though. For other index classes such as R-tree there are different > sets of required operators, which are not as well documented but > you can find out by looking at code for the already-supported > datatypes. > > > I plan to use 3D geographical objects... > > That's a bit hard since we don't have any indexes suitable for 3-D > coordinates --- the existing R-tree type is for 2-D objects. What's > more it assumes that coordinates are Euclidean, which is probably > not the model you want for geographical coordinates. > > In theory you could build a new index type suitable for indexing > 3-D points, using the R-tree code as a starting point. I wouldn't > class it as a project suitable for a newbie however :-(. > > Depending on what your needs are, you might be able to get by with > projecting your objects into a flat 2-D coordinate system and using > an R-tree index in that space. It'd just be approximate but that > might be close enough for index purposes. > > regards, tom lane > > > ********************************************************************** > This email and any files transmitted with it are confidential and > intended solely for the use of the individual or entity to whom they > are addressed. If you have received this email in error please notify > the system manager. > > This footnote also confirms that this email message has been swept by > MIMEsweeper for the presence of computer viruses. > > www.mimesweeper.com > ********************************************************************** > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> wrote: > > I'm also interested in GiST and would be happy if somebody could provide > workable example. I have an idea to use GiST indices for our fulltextsearch > system. > I have recently replied to Franck Martin in regards to this indexing question, but I didn't think the subject was popular enough for me to contaminate the list(s). You prove me wrong. Here goes: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To: Franck Martin <franck@sopac.org> From: selkovjr@mcs.anl.gov Reply-to: selkovjr@mcs.anl.gov Subject: Re: [HACKERS] Indexing for geographic objects? In-reply-to: <3A1EE0F4.3DC4161B@sopac.org> Comments: In-reply-to Franck Martin <franck@sopac.org> message dated "Sat, 25 Nov 2000 10:43:16 +1300." Mime-Version: 1.0 (generated by tm-edit 7.108) Date: Sat, 25 Nov 2000 02:56:03 -0600 It is probably possible to hook up an extension directly with the R-tree methods available in postgres -- if you stare at the code long enough and figure how to use the correct strategies. I chose an easier path years ago and I am still satisfied with the results. Check out the GiST -- a general access method built on top of R-tree to provide a user-friendly interface to it and to allow indexing of more abstract types, for which straight R-tree is not directly applicable. I have a small set of complete data types, of which a couple illustrate the use of GiST indexing with the geometrical objects, in: http://wit.mcs.anl.gov/~selkovjr/pg_extensions/ If you are using a pre-7.0 postrgres, grab the file contrib.tgz, otherwise take contrib-7.0.tgz. The difference is insignificant, but the pre-7.0 version will not fit the current schema. Unpack the source into postgresql-*/contrib and follow instructions in the README files. The types of interest for you will be seg and cube. You will find pointers to the original sources and docs in the CREDITS section of the README file. I also have a version of the original example code in pggist-patched.tgz, but I did not check if it works with current postgres. It should not be difficult to fix it if it doesn't -- the recent development in the optimizer area made certain things unnecessary. You might want to check out a working example of the segment data type at: http://wit.mcs.anl.gov/EMP/indexing.html (search the page for 'KM') I will be glad to help, but I would also recommend to send more sophisticated questions to Joe Hellerstein, the leader of the original postgres team that developed GiST. He was very helpful whenever I turned to him during the early stages of my data type project. --Gene
It seems that your code is exactly what I want. I have already created geographical objects which contains MBR(Minimum Bounding Rectangle) in their structure, so it is a question of rewriting your code to change the access to the cube structure to the MBR structure inside my geoobject. (cf http://fmaps.sourceforge.net/) Look in the CVS for latest. I have been slack lately on the project, but I'm not forgetting it. Quickly I ran through the code, and I think your cube is strictly speaking a box, which also a MBR. However I didn't see the case of intersection, which is the main question when you want to display object that are visible inside a box. I suppose your code is under GPL, and you have no problem for me to use it, providing I put your name and credits somewhere. Cheers. Franck Martin Database Development Officer SOPAC South Pacific Applied Geoscience Commission Fiji E-mail: franck@sopac.org Web site: http://www.sopac.org/ This e-mail is intended for its recipients only. Do not forward this e-mail without approval. The views expressed in this e-mail may not be necessarily the views of SOPAC. -----Original Message----- From: selkovjr@mcs.anl.gov [mailto:selkovjr@mcs.anl.gov] Sent: Saturday, 25 November 2000 8:56 To: Franck Martin Subject: Re: [HACKERS] Indexing for geographic objects? It is probably possible to hook up an extension directly with the R-tree methods available in postgres -- if you stare at the code long enough and figure how to use the correct strategies. I chose an easier path years ago and I am still satisfied with the results. Check out the GiST -- a general access method built on top of R-tree to provide a user-friendly interface to it and to allow indexing of more abstract types, for which straight R-tree is not directly applicable. I have a small set of complete data types, of which a couple illustrate the use of GiST indexing with the geometrical objects, in: http://wit.mcs.anl.gov/~selkovjr/pg_extensions/ If you are using a pre-7.0 postrgres, grab the file contrib.tgz, otherwise take contrib-7.0.tgz. The difference is insignificant, but the pre-7.0 version will not fit the current schema. Unpack the source into postgresql-*/contrib and follow instructions in the README files. The types of interest for you will be seg and cube. You will find pointers to the original sources and docs in the CREDITS section of the README file. I also have a version of the original example code in pggist-patched.tgz, but I did not check if it works with current postgres. It should not be difficult to fix it if it doesn't -- the recent development in the optimizer area made certain things unnecessary. You might want to check out a working example of the segment data type at: http://wit.mcs.anl.gov/EMP/indexing.html (search the page for 'KM') I will be glad to help, but I would also recommend to send more sophisticated questions to Joe Hellerstein, the leader of the original postgres team that developed GiST. He was very helpful whenever I turned to him during the early stages of my data type project. --Gene
Michael Ansley <Michael.Ansley@intec-telecom-systems.com> writes: > Remember also that the GiST library has been integrated into PG, (my brother > is doing some thesis workon that at the moment), Yeah? Does it still work? Since the GIST code is not tested by any standard regress test, and is so poorly documented that hardly anyone can be using it, I've always assumed that it is probably suffering from a severe case of bit-rot. I'd love to see someone contribute documentation and regression test cases for it --- it's a great feature, if it works. regards, tom lane
Franck Martin wrote: > > It seems that your code is exactly what I want. > > I have already created geographical objects which contains MBR(Minimum > Bounding Rectangle) in their structure, so it is a question of rewriting > your code to change the access to the cube structure to the MBR structure > inside my geoobject. (cf http://fmaps.sourceforge.net/) Look in the CVS for > latest. I have been slack lately on the project, but I'm not forgetting it. > > Quickly I ran through the code, and I think your cube is strictly speaking a > box, which also a MBR. > > However I didn't see the case of intersection, which is the main question > when you want to display object that are visible inside a box. > > I suppose your code is under GPL, and you have no problem for me to use it, > providing I put your name and credits somewhere. It would be much better if it were under the standard PostgreSQL license and if it is included in the standard distribution. As Tom said, working Gist would be a great feature. Now if only someone would write the regression tests ;) BTW, the regression tests for pl/pgsql seem to be somewhat sparse as well, missing at least some types of loops, possibly more. > Franck Martin > Database Development Officer > SOPAC South Pacific Applied Geoscience Commission > Fiji > E-mail: franck@sopac.org > Web site: http://www.sopac.org/ > > This e-mail is intended for its recipients only. Do not forward this e-mail > without approval. The views expressed in this e-mail may not be necessarily > the views of SOPAC. > > -----Original Message----- > From: selkovjr@mcs.anl.gov [mailto:selkovjr@mcs.anl.gov] > Sent: Saturday, 25 November 2000 8:56 > To: Franck Martin > Subject: Re: [HACKERS] Indexing for geographic objects? > > It is probably possible to hook up an extension directly with the > R-tree methods available in postgres -- if you stare at the code long > enough and figure how to use the correct strategies. I chose an easier > path years ago and I am still satisfied with the results. Check out > the GiST -- a general access method built on top of R-tree to provide > a user-friendly interface to it and to allow indexing of more abstract > types, for which straight R-tree is not directly applicable. > > I have a small set of complete data types, of which a couple > illustrate the use of GiST indexing with the geometrical objects, in: > > http://wit.mcs.anl.gov/~selkovjr/pg_extensions/ > > If you are using a pre-7.0 postrgres, grab the file contrib.tgz, > otherwise take contrib-7.0.tgz. The difference is insignificant, but > the pre-7.0 version will not fit the current schema. Unpack the source > into postgresql-*/contrib and follow instructions in the README > files. The types of interest for you will be seg and cube. You will > find pointers to the original sources and docs in the CREDITS section > of the README file. I also have a version of the original example code > in pggist-patched.tgz, but I did not check if it works with current > postgres. It should not be difficult to fix it if it doesn't -- the > recent development in the optimizer area made certain things > unnecessary. > > You might want to check out a working example of the segment data type at: > > http://wit.mcs.anl.gov/EMP/indexing.html > > (search the page for 'KM') > > I will be glad to help, but I would also recommend to send more > sophisticated questions to Joe Hellerstein, the leader of the original > postgres team that developed GiST. He was very helpful whenever I > turned to him during the early stages of my data type project. > > --Gene
To be honest, Tom, I've always seen GiST not just as a great feature, but as an essential feature. Using Stonebraker's definition of an object-relational database (which I tend to do, as it's the only one that I've read about in depth), we really need to be able to properly index complex data, and using GiST, we can. Besides, it's just plain useful ;-)
MikeA
-----Original Message-----
From: Tom Lane
To: Michael Ansley
Cc: 'Franck Martin '; 'pgsql-general '; 'pgsql-hackers '; 't.h.p.ansley@durham.co.uk'
Sent: 11-27-00 3:32 AM
Subject: Re: [HACKERS] Indexing for geographic objects?
Michael Ansley <Michael.Ansley@intec-telecom-systems.com> writes:
> Remember also that the GiST library has been integrated into PG, (my
brother
> is doing some thesis workon that at the moment),
Yeah? Does it still work?
Since the GIST code is not tested by any standard regress test, and is
so poorly documented that hardly anyone can be using it, I've always
assumed that it is probably suffering from a severe case of bit-rot.
I'd love to see someone contribute documentation and regression test
cases for it --- it's a great feature, if it works.
regards, tom lane
**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.
This footnote also confirms that this email message has been swept by
MIMEsweeper for the presence of computer viruses.
www.mimesweeper.com
**********************************************************************
Tom Lane wrote: > Michael Ansley <Michael.Ansley@intec-telecom-systems.com> writes: > > Remember also that the GiST library has been integrated into PG, (my brother > > is doing some thesis workon that at the moment), > > Yeah? Does it still work? You bet. One would otherwise be hearing from me. I depend on it quite heavily and I am checking with almost every release. I am now current with 7.0.2 -- this time it required some change, although not in the c code. And that's pretty amazing: I was only screwed once since postgres95 -- by a beta version I don't remember now; then I complained and the problem was fixed. I don't even know whom I owe thanks for that. > Since the GIST code is not tested by any standard regress test, and is > so poorly documented that hardly anyone can be using it, I've always > assumed that it is probably suffering from a severe case of bit-rot. > > I'd love to see someone contribute documentation and regression test > cases for it --- it's a great feature, if it works. The bit rot fortunately did not happen, but the documentation I promised Bruce many months ago is still "in the works" -- meaning, something interfered and I haven't had a chance to start. Like a friend of mine muses all the time, "Promise doesn't mean marriage". Boy, do I feel guilty. It's a bit better with the testing. I am not sure how to test the GiST directly, but I have adapted the current version of regression tests for the data types that depend on it. One can find them in my contrib directory, under test/ (again, it's http://wit.mcs.anl.gov/~selkovjr/pg_extensions/contrib.tgz) It would be nice if at least one of the GiST types became a built-in (that would provide for a more intensive testing), but I can also think of the contrib code being (optionally) included into the main build and regression test trees. The top-level makefile can have a couple of special targets to build and test the contribs. I believe my version of the tests can be a useful example to other contributors whose code is already in the source tree. --Gene
selkovjr@mcs.anl.gov writes: > Tom Lane wrote: >> Michael Ansley <Michael.Ansley@intec-telecom-systems.com> writes: >>>> Remember also that the GiST library has been integrated into PG, (my brother >>>> is doing some thesis workon that at the moment), >> >> Yeah? Does it still work? > You bet. One would otherwise be hearing from me. I depend on it quite > heavily and I am checking with almost every release. That's very good to hear! I was not aware that anyone was banging on it. It seems like it would be a fine idea to adopt your stuff at least into the contrib part of the distribution, maybe even (or eventually) into the main release. I think we could probably make it part of the regress tests even if it's contrib --- there's precedent, as regress already uses some contrib stuff. Do you have any problem with releasing your stuff under the Postgres distribution terms (BSD license)? regards, tom lane
Franck Martin wrote: > I have already created geographical objects which contains MBR(Minimum > Bounding Rectangle) in their structure, so it is a question of rewriting > your code to change the access to the cube structure to the MBR structure > inside my geoobject. (cf http://fmaps.sourceforge.net/) Look in the CVS for > latest. I have been slack lately on the project, but I'm not forgetting it. I see where you are aiming. I definitely want to be around when it starts working. > Quickly I ran through the code, and I think your cube is strictly speaking a > box, which also a MBR. Yes, cube is definitely a misnomer -- it suggests things are equihedral, which they aren't. I am still looking for a short name or an acronym that would indicate it is a box with an arbitrary number of dimensions. With your application, you will surely benefit from a smaller and faster code geared specifically for 3D. > However I didn't see the case of intersection, which is the main question > when you want to display object that are visible inside a box. The procedure is there, it is called cube_inter, but there is no operator for it. > I suppose your code is under GPL, and you have no problem for me to use it, > providing I put your name and credits somewhere. No problem at all -- I will be honored if you use it. Was I careless enough not to include a license? It's not exactly a GPL -- it's completely unrestricted. I should have said that somewhere. Good luck, --Gene
Tom Lane wrote: > Do you have any problem with releasing your stuff under the Postgres > distribution terms (BSD license)? No, I don't see any problem with the BSD license, or any other license, for that matter. I just had some reservations about releasing stuff that was far from perfect, and it took me a while to realize it could be useful as it is for some, and serve as a good starting point for others. Now I wonder, what does it take to be in contrib? > there's precedent, as regress already > uses some contrib stuff. I'd be curious to find out what that stuff is and how it's done. --Gene
On Mon, Nov 27, 2000 at 06:03:33PM -0600, selkovjr@mcs.anl.gov wrote: > Franck Martin wrote: > > I suppose your code is under GPL, and you have no problem for me to > > use it, providing I put your name and credits somewhere. > > No problem at all -- I will be honored if you use it. Was I careless > enough not to include a license? It's not exactly a GPL -- it's > completely unrestricted. I should have said that somewhere. Note that (AIUI) placing code in the public domain leaves you liable for damages from somebody misusing it. You have to retain copyright just to be able to disclaim liability, in the license -- but then you need to actually have a license. That's why you don't see much public domain software. (I am not a lawyer.) Nathan Myers ncm@zembu.com
selkovjr@mcs.anl.gov writes: > Tom Lane wrote: >> Do you have any problem with releasing your stuff under the Postgres >> distribution terms (BSD license)? > No, I don't see any problem with the BSD license, or any other > license, for that matter. I just had some reservations about releasing > stuff that was far from perfect, and it took me a while to realize > it could be useful as it is for some, and serve as a good starting > point for others. Now I wonder, what does it take to be in contrib? Just contributing it ;-), which I take the above as permission to do. When I come up for air from the IPC-hacking I'm doing, I'll grab your tarball and see about adding it as a contrib module. Many thanks! regards, tom lane
Hi, We've done some work with GiST indices and found a little problem with optimizer. The problem could be reproduced with Gene's code (link is in original message below). test data and sql I could send - it's just 52Kb gzipped file. What is a reason for optimizer to decide that sequential scan is better (look below for a numbers). Implicite disabling of seq scan gave much better timings. Regards, Oleg test=# explain select * from test where s @ '1.05 .. 3.95'; NOTICE: QUERY PLAN: Seq Scan on test (cost=0.00..184.01 rows=5000 width=12) EXPLAIN % ./bench.pl -d test -b 100 total: 3.19 sec; number: 100; for one: 0.032 sec; found 18 docs test=# set enable_seqscan = off; SET VARIABLE test=# explain select * from test where s @ '1.05 .. 3.95'; NOTICE: QUERY PLAN: Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) EXPLAIN % ./bench.pl -d test -b 100 -i total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs On Mon, 27 Nov 2000 selkovjr@mcs.anl.gov wrote: > Date: Mon, 27 Nov 2000 12:36:42 -0600 > From: selkovjr@mcs.anl.gov > To: Tom Lane <tgl@sss.pgh.pa.us> > Cc: 'pgsql-general ' <pgsql-general@postgresql.org>, > 'pgsql-hackers ' <pgsql-hackers@postgresql.org> > Subject: Re: [HACKERS] Indexing for geographic objects? > > Tom Lane wrote: > > Michael Ansley <Michael.Ansley@intec-telecom-systems.com> writes: > > > Remember also that the GiST library has been integrated into PG, (my brother > > > is doing some thesis workon that at the moment), > > > > Yeah? Does it still work? > > You bet. One would otherwise be hearing from me. I depend on it quite > heavily and I am checking with almost every release. I am now current > with 7.0.2 -- this time it required some change, although not in the c > code. And that's pretty amazing: I was only screwed once since > postgres95 -- by a beta version I don't remember now; then I > complained and the problem was fixed. I don't even know whom I owe > thanks for that. > > > Since the GIST code is not tested by any standard regress test, and is > > so poorly documented that hardly anyone can be using it, > I've always > > assumed that it is probably suffering from a severe case of bit-rot. > > > > I'd love to see someone contribute documentation and regression test > > cases for it --- it's a great feature, if it works. > > The bit rot fortunately did not happen, but the documentation I > promised Bruce many months ago is still "in the works" -- meaning, > something interfered and I haven't had a chance to start. Like a > friend of mine muses all the time, "Promise doesn't mean > marriage". Boy, do I feel guilty. > > It's a bit better with the testing. I am not sure how to test the > GiST directly, but I have adapted the current version of regression > tests for the data types that depend on it. One can find them in my > contrib directory, under test/ (again, it's > http://wit.mcs.anl.gov/~selkovjr/pg_extensions/contrib.tgz) > > It would be nice if at least one of the GiST types became a built-in > (that would provide for a more intensive testing), but I can also > think of the contrib code being (optionally) included into the main > build and regression test trees. The top-level makefile can have a > couple of special targets to build and test the contribs. I believe my > version of the tests can be a useful example to other contributors > whose code is already in the source tree. > > --Gene > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > We've done some work with GiST indices and found a little problem > with optimizer. > test=# set enable_seqscan = off; > SET VARIABLE > test=# explain select * from test where s @ '1.05 .. 3.95'; > NOTICE: QUERY PLAN: > Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) > EXPLAIN > % ./bench.pl -d test -b 100 -i > total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs I'd venture that the major problem here is bogus estimated selectivities for rtree/gist operators. Note the discrepancy between the estimated row count and the actual (I assume the "found 18 docs" is the true number of rows output by the query). With an estimated row count even half that (ie, merely two orders of magnitude away from reality ;-)) the thing would've correctly chosen the index scan over sequential. 5000 looks like a suspiciously round number ... how many rows are in the table? Have you done a vacuum analyze on it? regards, tom lane
just a note here ... recently, we had a client with similar problems with using index scan, where turning off seqscan did the trick ... we took his tables, loaded them into a v7.1beta1 server and it correctly comes up with the index scan ... Oleg, have you tried this with v7.1 yet? On Fri, 8 Dec 2000, Tom Lane wrote: > Oleg Bartunov <oleg@sai.msu.su> writes: > > We've done some work with GiST indices and found a little problem > > with optimizer. > > > test=# set enable_seqscan = off; > > SET VARIABLE > > test=# explain select * from test where s @ '1.05 .. 3.95'; > > NOTICE: QUERY PLAN: > > > Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) > > > EXPLAIN > > % ./bench.pl -d test -b 100 -i > > total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs > > I'd venture that the major problem here is bogus estimated selectivities > for rtree/gist operators. Note the discrepancy between the estimated > row count and the actual (I assume the "found 18 docs" is the true > number of rows output by the query). With an estimated row count even > half that (ie, merely two orders of magnitude away from reality ;-)) > the thing would've correctly chosen the index scan over sequential. > > 5000 looks like a suspiciously round number ... how many rows are in > the table? Have you done a vacuum analyze on it? > > regards, tom lane > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Fri, 8 Dec 2000, Tom Lane wrote: > Date: Fri, 08 Dec 2000 10:47:37 -0500 > From: Tom Lane <tgl@sss.pgh.pa.us> > To: Oleg Bartunov <oleg@sai.msu.su> > Cc: selkovjr@mcs.anl.gov, 'pgsql-hackers ' <pgsql-hackers@postgresql.org> > Subject: Re: [HACKERS] Indexing for geographic objects? > > Oleg Bartunov <oleg@sai.msu.su> writes: > > We've done some work with GiST indices and found a little problem > > with optimizer. > > > test=# set enable_seqscan = off; > > SET VARIABLE > > test=# explain select * from test where s @ '1.05 .. 3.95'; > > NOTICE: QUERY PLAN: > > > Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) > > > EXPLAIN > > % ./bench.pl -d test -b 100 -i > > total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs > > I'd venture that the major problem here is bogus estimated selectivities > for rtree/gist operators. Note the discrepancy between the estimated > row count and the actual (I assume the "found 18 docs" is the true > number of rows output by the query). With an estimated row count even yes, 18 docs is the true number > half that (ie, merely two orders of magnitude away from reality ;-)) > the thing would've correctly chosen the index scan over sequential. > > 5000 looks like a suspiciously round number ... how many rows are in > the table? Have you done a vacuum analyze on it? park-lane:~/app/pgsql/gist_problem$ wc SQL 10009 10049 157987 SQL about 10,000 rows, relevant part of script is: .....skipped... 1.9039...3.5139 1.8716...3.9317 \. CREATE INDEX test_seg_ix ON test USING gist (s); vacuum analyze; ^^^^^^^^^^^^^^ explain select * from test where s @ '1.05 .. 3.95'; set enable_seqscan = off; explain select * from test where s @ '1.05 .. 3.95'; Regards, Oleg > > regards, tom lane > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
On Fri, 8 Dec 2000, The Hermit Hacker wrote: > Date: Fri, 8 Dec 2000 12:19:56 -0400 (AST) > From: The Hermit Hacker <scrappy@hub.org> > To: Tom Lane <tgl@sss.pgh.pa.us> > Cc: Oleg Bartunov <oleg@sai.msu.su>, selkovjr@mcs.anl.gov, > 'pgsql-hackers ' <pgsql-hackers@postgresql.org> > Subject: Re: [HACKERS] Indexing for geographic objects? > > > just a note here ... recently, we had a client with similar problems with > using index scan, where turning off seqscan did the trick ... we took his > tables, loaded them into a v7.1beta1 server and it correctly comes up with > the index scan ... > > Oleg, have you tried this with v7.1 yet? Not yet. Just a plain 7.0.3 release. Will play with 7.1beta. But we're working in real life and need things to work in production :-) regards, Oleg > > On Fri, 8 Dec 2000, Tom Lane wrote: > > > Oleg Bartunov <oleg@sai.msu.su> writes: > > > We've done some work with GiST indices and found a little problem > > > with optimizer. > > > > > test=# set enable_seqscan = off; > > > SET VARIABLE > > > test=# explain select * from test where s @ '1.05 .. 3.95'; > > > NOTICE: QUERY PLAN: > > > > > Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) > > > > > EXPLAIN > > > % ./bench.pl -d test -b 100 -i > > > total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs > > > > I'd venture that the major problem here is bogus estimated selectivities > > for rtree/gist operators. Note the discrepancy between the estimated > > row count and the actual (I assume the "found 18 docs" is the true > > number of rows output by the query). With an estimated row count even > > half that (ie, merely two orders of magnitude away from reality ;-)) > > the thing would've correctly chosen the index scan over sequential. > > > > 5000 looks like a suspiciously round number ... how many rows are in > > the table? Have you done a vacuum analyze on it? > > > > regards, tom lane > > > > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy > Systems Administrator @ hub.org > primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: >> 5000 looks like a suspiciously round number ... how many rows are in >> the table? Have you done a vacuum analyze on it? > about 10,000 rows, So the thing is estimating 0.5 selectivity, which is a fallback for operators it knows nothing whatever about. [ ... digs in Selkov's scripts ... ] CREATE OPERATOR @ ( LEFTARG = seg, RIGHTARG = seg, PROCEDURE = seg_contains, COMMUTATOR = '~' ); CREATE OPERATOR ~ ( LEFTARG = seg, RIGHTARG = seg, PROCEDURE = seg_contained, COMMUTATOR = '@' ); Sure 'nuff, no selectivity info attached to these declarations. Try adding RESTRICT = contsel, JOIN = contjoinsel to them. That's still an entirely bogus estimate, but at least it's a smaller bogus estimate ... small enough to select an indexscan, one hopes (see utils/adt/geo_selfuncs.c). I have not dug through Gene's stuff to see which other indexable operators might be missing selectivity estimates, but I'll bet there are others. If you have the time to look through it and submit a patch, I can incorporate it into the version that will go into contrib. regards, tom lane
On Fri, 8 Dec 2000, Oleg Bartunov wrote: > On Fri, 8 Dec 2000, The Hermit Hacker wrote: > > > Date: Fri, 8 Dec 2000 12:19:56 -0400 (AST) > > From: The Hermit Hacker <scrappy@hub.org> > > To: Tom Lane <tgl@sss.pgh.pa.us> > > Cc: Oleg Bartunov <oleg@sai.msu.su>, selkovjr@mcs.anl.gov, > > 'pgsql-hackers ' <pgsql-hackers@postgresql.org> > > Subject: Re: [HACKERS] Indexing for geographic objects? > > > > > > just a note here ... recently, we had a client with similar problems with > > using index scan, where turning off seqscan did the trick ... we took his > > tables, loaded them into a v7.1beta1 server and it correctly comes up with > > the index scan ... > > > > Oleg, have you tried this with v7.1 yet? > > Not yet. Just a plain 7.0.3 release. Will play with 7.1beta. > But we're working in real life and need things to work in production :-) Okay, then I believe that what you are experience wiht v7.0.3 is already fixed in v7.1beta, based on similar results I got with some queries and then tested uver v7.1 ... > > regards, > Oleg > > > > > On Fri, 8 Dec 2000, Tom Lane wrote: > > > > > Oleg Bartunov <oleg@sai.msu.su> writes: > > > > We've done some work with GiST indices and found a little problem > > > > with optimizer. > > > > > > > test=# set enable_seqscan = off; > > > > SET VARIABLE > > > > test=# explain select * from test where s @ '1.05 .. 3.95'; > > > > NOTICE: QUERY PLAN: > > > > > > > Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) > > > > > > > EXPLAIN > > > > % ./bench.pl -d test -b 100 -i > > > > total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs > > > > > > I'd venture that the major problem here is bogus estimated selectivities > > > for rtree/gist operators. Note the discrepancy between the estimated > > > row count and the actual (I assume the "found 18 docs" is the true > > > number of rows output by the query). With an estimated row count even > > > half that (ie, merely two orders of magnitude away from reality ;-)) > > > the thing would've correctly chosen the index scan over sequential. > > > > > > 5000 looks like a suspiciously round number ... how many rows are in > > > the table? Have you done a vacuum analyze on it? > > > > > > regards, tom lane > > > > > > > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy > > Systems Administrator @ hub.org > > primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org > > > > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Fri, 8 Dec 2000, Tom Lane wrote: > Date: Fri, 08 Dec 2000 12:59:27 -0500 > From: Tom Lane <tgl@sss.pgh.pa.us> > To: Oleg Bartunov <oleg@sai.msu.su> > Cc: selkovjr@mcs.anl.gov, 'pgsql-hackers ' <pgsql-hackers@postgresql.org> > Subject: Re: [HACKERS] Indexing for geographic objects? > > Oleg Bartunov <oleg@sai.msu.su> writes: > >> 5000 looks like a suspiciously round number ... how many rows are in > >> the table? Have you done a vacuum analyze on it? > > > about 10,000 rows, > > So the thing is estimating 0.5 selectivity, which is a fallback for > operators it knows nothing whatever about. > > [ ... digs in Selkov's scripts ... ] > > CREATE OPERATOR @ ( > LEFTARG = seg, RIGHTARG = seg, PROCEDURE = seg_contains, > COMMUTATOR = '~' > ); > > CREATE OPERATOR ~ ( > LEFTARG = seg, RIGHTARG = seg, PROCEDURE = seg_contained, > COMMUTATOR = '@' > ); > > Sure 'nuff, no selectivity info attached to these declarations. > Try adding > > RESTRICT = contsel, JOIN = contjoinsel > > to them. That's still an entirely bogus estimate, but at least > it's a smaller bogus estimate ... small enough to select an indexscan, > one hopes (see utils/adt/geo_selfuncs.c). Great ! Now we have better plan: test=# explain select * from test where s @ '1.05 .. 3.95'; NOTICE: QUERY PLAN: Index Scan using test_seg_ix on test (cost=0.00..61.56 rows=100 width=12) EXPLAIN > > I have not dug through Gene's stuff to see which other indexable > operators might be missing selectivity estimates, but I'll bet there > are others. If you have the time to look through it and submit a > patch, I can incorporate it into the version that will go into contrib. > We didn't look at Gene's stuff yet. Maybe Gene could find a time to check his code. > regards, tom lane > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Tom Lane wrote: > Oleg Bartunov <oleg@sai.msu.su> writes: > > We've done some work with GiST indices and found a little problem > > with optimizer. > > > test=# set enable_seqscan = off; > > SET VARIABLE > > test=# explain select * from test where s @ '1.05 .. 3.95'; > > NOTICE: QUERY PLAN: > > > Index Scan using test_seg_ix on test (cost=0.00..369.42 rows=5000 width=12) > > > EXPLAIN > > % ./bench.pl -d test -b 100 -i > > total: 1.71 sec; number: 100; for one: 0.017 sec; found 18 docs > > I'd venture that the major problem here is bogus estimated selectivities > for rtree/gist operators. Yes, the problem is, I didn't have the foggiest idea how to estimate selectivity, nor I had any stats when I developed the type. Before 7.0, I had some success using selectivity estimators of another datatype (I think that was int, but I am not sure). In 7.0, most of those estimators were gone and I have probably chosen the wrong ones or none at all, just so I could get it to work again. The performance was good enough for my taste, so I have even forgotten that was an issue. I know, I know: 'good enough' is never good. I apoligize. --Gene
It seems that R-trees become inefficient when the number of dimensions increase. Has anyone thoght of a transparent way to use Peano codes (hhcode in Oracle lingo), and use B-tree indexes instead? Also, I've read that R-trees sometimes suffer a lot when an update overflows a node in the index. The only initial problem I see with Peano codes is that the index is made on real cubes (all dimensions are equal, due to the recursive decomposition of space). To overcome that, people have talked about using multiple-entry-indexes. That is, an object is decomposed in a number of cubes (not too many), which are then indexed. In this case, there should be a way to make intersection searches be transparent. Oracle does it using tables and merge-joins. I have thought of using GiST to do that, but it seemed too advanced for me yet. So I thought of using the Oracle technique (make tables and use joins). Problem: I would need a C function to make the cubes describing an spatial object, but currently C functions cannot return more than one value (have of thoght of returning an array, but have not tried it). And making inserts directly from a C function has been described as magic stuff in the documentation. Yours sincerely, Edmar Wiggers BRASMAP Information Systems +55 48 9960 2752
Hi, we are getting a bit close to add index support for int arrays using GiST interface. This will really drive up performance of our full text search fully based on postgresql. We have a problem with broken index and couldn't find a reason. I attached archive with sources for GiST functions and test suite to show a problem - vacuum analyze at end end of TESTSQL should complain about broken index. Here is a short description: 1. untar in contrib 7.0.* 2. cd _intarray 3. edit Makefile for TESTDB (name of db for test) 4. createdb TESTDB 5. gmake 6. gmake install 7. psql TESTDB < TESTSQL Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Well, we found an answer ourserlves. Patch for 7.0.3 is included below. Credits to Teodor Sigaev (teodor@stack.net) Some comments: From src/backend/access/gist/gist.c /* ** Take a compressed entry, and install it on a page. Since we now know ** where the entry will live, we decompress it and recompress it using ** that knowledge (some compression routines may want to fish around ** on the page, for example, or do something special for leaf nodes.) */ After compressing of index it's written to disk decompressed (!) which is the reason we have the problem with broken index ! It looks like other people just didn't use index decompression function (at least in Gene's code decompression function just do return ) and that's why this bug was not discovered. We could make a patch for upcoming 7.1 if hackers desired. I consider this patch as a bugfix not a new feature or improvement. We got a very promising results. Another question to this code is - why gistPageAddItem does compress - decompress - compress. It's not clear from the comment. Best regards, Oleg ------------------------------------------------------------------------- maze% diff -c backend/access/gist/gist.c backend/access/gist/gist.c.orig *** backend/access/gist/gist.c Fri Dec 15 13:03:40 2000 --- backend/access/gist/gist.c.orig Fri Dec 15 13:00:50 2000 *************** *** 374,380 **** { GISTENTRY tmpcentry; IndexTuple itup = (IndexTuple) item; - OffsetNumber retval; /* * recompress the item given that we now know the exact page and --- 374,379 ---- *************** *** 386,400 **** IndexTupleSize(itup) - sizeof(IndexTupleData), FALSE); gistcentryinit(giststate, &tmpcentry, dentry->pred, r, page, offsetNumber, dentry->bytes, FALSE); ! *newtup = gist_tuple_replacekey(r, tmpcentry, itup); ! retval = PageAddItem(page, (Item) *newtup, IndexTupleSize(*newtup), ! offsetNumber, flags); /* be tidy */ if (tmpcentry.pred != dentry->pred && tmpcentry.pred != (((char *) itup) + sizeof(IndexTupleData))) pfree(tmpcentry.pred); ! return (retval); } --- 385,398 ---- IndexTupleSize(itup) - sizeof(IndexTupleData), FALSE); gistcentryinit(giststate, &tmpcentry, dentry->pred, r, page, offsetNumber, dentry->bytes, FALSE); ! *newtup = gist_tuple_replacekey(r, *dentry, itup); /* be tidy */ if (tmpcentry.pred != dentry->pred && tmpcentry.pred != (((char *) itup) + sizeof(IndexTupleData))) pfree(tmpcentry.pred); ! return (PageAddItem(page, (Item) *newtup, IndexTupleSize(*newtup), ! offsetNumber, flags)); } ----------------------------------------------------------------------- On Wed, 13 Dec 2000, Oleg Bartunov wrote: > Date: Wed, 13 Dec 2000 18:48:40 +0300 (GMT) > From: Oleg Bartunov <oleg@sai.msu.su> > To: selkovjr@mcs.anl.gov > Cc: Tom Lane <tgl@sss.pgh.pa.us>, vmikheev@SECTORBASE.COM, > 'pgsql-hackers ' <pgsql-hackers@postgresql.org> > Subject: [HACKERS] index support for arrays (GiST) > > Hi, > > we are getting a bit close to add index support for int arrays using > GiST interface. This will really drive up performance of our full text > search fully based on postgresql. We have a problem with broken index > and couldn't find a reason. I attached archive with sources > for GiST functions and test suite to show a problem - vacuum analyze > at end end of TESTSQL should complain about broken index. > Here is a short description: > 1. untar in contrib 7.0.* > 2. cd _intarray > 3. edit Makefile for TESTDB (name of db for test) > 4. createdb TESTDB > 5. gmake > 6. gmake install > 7. psql TESTDB < TESTSQL > > Regards, > > Oleg > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
> It looks like other people just didn't use index decompression function > (at least in Gene's code decompression function just do return ) and > that's why this bug was not discovered. We could make a patch for > upcoming 7.1 if hackers desired. I consider this patch as a bugfix > not a new feature or improvement. We got a very promising results. Yes, send patches! Thanks to you and Gene for getting GiST back into view; it seems like a great feature which was neglected for too long. - Thomas
On Fri, 15 Dec 2000, Thomas Lockhart wrote: > Date: Fri, 15 Dec 2000 15:47:01 +0000 > From: Thomas Lockhart <lockhart@alumni.caltech.edu> > To: Oleg Bartunov <oleg@sai.msu.su> > Cc: selkovjr@mcs.anl.gov, Tom Lane <tgl@sss.pgh.pa.us>, > vmikheev@SECTORBASE.COM, > 'pgsql-hackers ' <pgsql-hackers@postgresql.org> > Subject: Re: [HACKERS] index support for arrays (GiST) > > > It looks like other people just didn't use index decompression function > > (at least in Gene's code decompression function just do return ) and > > that's why this bug was not discovered. We could make a patch for > > upcoming 7.1 if hackers desired. I consider this patch as a bugfix > > not a new feature or improvement. We got a very promising results. > > Yes, send patches! Thanks to you and Gene for getting GiST back into > view; it seems like a great feature which was neglected for too long. > We found one more bug with handling NULL values, so continue digging :-) Олег > - Thomas > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83