Thread: R-tree, order by, limit
Hello, I am implementing a map application. There are towns with altitude, longitude and population. One of the tasks is to be able to query N biggest (by population) towns within a rectangle. Something like (maybe the syntax in not quite right, but the idea is obvious): SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <= long2 ORDER BY population LIMIT 10; If I create an R-tree index on coordinates (alt, long) this will speed up the query significantly. But it is still far from optimal: Despite we need only 10 biggest towns, all towns in the rectangle specified will be examined. What if we include population into R-tree index? This index will handle a 3D space with coordinates (alt, long, population). Will this 3D index perform better than that 2D index? In fact, I lack some details on how Postges handles ORDER_BY and LIMIT inside R-tree indexes. Extensive answers and links are appreciated. Thanks. Anton.
On Sun, 21 Sep 2008, "Anton Belyaev" <anton.belyaev@gmail.com> writes: > SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <= > long2 ORDER BY population LIMIT 10; You're absolutely on the wrong path. Don't try to implement a logic, that has been implemented by PostgreSQL in the most possibly efficient way in its bounds. See geographic data types[1] (e.g. box) and geographic functions[2] (e.g. @> a.k.a contains). Regards. [1] http://www.postgresql.org/docs/current/interactive/datatype-geometric.html [2] http://www.postgresql.org/docs/current/interactive/functions-geometry.html
2008/9/21 Volkan YAZICI <yazicivo@ttmail.com>: > On Sun, 21 Sep 2008, "Anton Belyaev" <anton.belyaev@gmail.com> writes: >> SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <= >> long2 ORDER BY population LIMIT 10; > > You're absolutely on the wrong path. Don't try to implement a logic, > that has been implemented by PostgreSQL in the most possibly efficient > way in its bounds. See geographic data types[1] (e.g. box) and > geographic functions[2] (e.g. @> a.k.a contains). > > > Regards. > > [1] http://www.postgresql.org/docs/current/interactive/datatype-geometric.html > [2] http://www.postgresql.org/docs/current/interactive/functions-geometry.html > Volkan, Thanks you for your reply. Geometry types and functions use R-tree indexes anyways. I can rephrase the query using geometry language of Postgres: SELECT * FROM towns WHERE towns.coordinates <@ box(alt1, long1, alt2, long2) ORDER BY population LIMIT 10; And the questions about population remain the same: How to avoid examination of all the towns in the rectangle knowing that we need only 10 biggest? Does population worth including into a (3D) point (In order to create a 3D R-tree)? Does Postgres perform ODRER/LIMIT efficiently in this case? Thanks. Anton.
2008/9/21 Anton Belyaev <anton.belyaev@gmail.com>: > Hello, > > I am implementing a map application. There are towns with altitude, > longitude and population. > One of the tasks is to be able to query N biggest (by population) > towns within a rectangle. > > Something like (maybe the syntax in not quite right, but the idea is obvious): > SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <= > long2 ORDER BY population LIMIT 10; > > If I create an R-tree index on coordinates (alt, long) this will speed > up the query significantly. But it is still far from optimal: Despite > we need only 10 biggest towns, all towns in the rectangle specified > will be examined. > > What if we include population into R-tree index? This index will > handle a 3D space with coordinates (alt, long, population). > Will this 3D index perform better than that 2D index? > > In fact, I lack some details on how Postges handles ORDER_BY and LIMIT > inside R-tree indexes. > Extensive answers and links are appreciated. > > Thanks. > Anton. > Sorry, I meant latitude (lat) instead of altitude (alt).
On Sun, Sep 21, 2008 at 06:17:39PM +0400, Anton Belyaev wrote: > Geometry types and functions use R-tree indexes anyways. > > I can rephrase the query using geometry language of Postgres: > SELECT * FROM towns WHERE towns.coordinates <@ box(alt1, long1, alt2, > long2) ORDER BY population LIMIT 10; > > And the questions about population remain the same: > How to avoid examination of all the towns in the rectangle knowing > that we need only 10 biggest? I don't know if it solves your problem, but you should be able to do a multi-column GiST index with both the position data and the population data in it. However, I'm unsure if postgresql will correctly use the index to solve the order by... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
Anton Belyaev wrote: >> I am implementing a map application. There are towns with altitude, >> longitude and population. >> One of the tasks is to be able to query N biggest (by population) >> towns within a rectangle. Hi Anton, Have you considered using PostGIS? (http://postgis.refractions.net). It implements the OGC SFS for geometries and is compatible with a large number of open source viewers/tools such as Mapserver, Geoserver, QGIS, OGR etc... ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063
2008/9/21 Martijn van Oosterhout <kleptog@svana.org>: > On Sun, Sep 21, 2008 at 06:17:39PM +0400, Anton Belyaev wrote: >> Geometry types and functions use R-tree indexes anyways. >> >> I can rephrase the query using geometry language of Postgres: >> SELECT * FROM towns WHERE towns.coordinates <@ box(alt1, long1, alt2, >> long2) ORDER BY population LIMIT 10; >> >> And the questions about population remain the same: >> How to avoid examination of all the towns in the rectangle knowing >> that we need only 10 biggest? > > I don't know if it solves your problem, but you should be able to do a > multi-column GiST index with both the position data and the population > data in it. However, I'm unsure if postgresql will correctly use the > index to solve the order by... Martijn, thanks for you reply. Implementing a 3D R-tree index in Postgres is only possible via implementation of GiST interface. At least, this is the only approach I consider, because implementing a brand new index access method requires much more than just classic R-tree implementation. So, yes, question remains the same, but a bit updated: How efficiently Postgres handles ORDER BY + LIMIT when using GiST? (Particularly, when an R-tree is implemented via GiST). Anton.
2008/9/22 Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk>: >>> I am implementing a map application. There are towns with altitude, >>> longitude and population. >>> One of the tasks is to be able to query N biggest (by population) >>> towns within a rectangle. > Have you considered using PostGIS? (http://postgis.refractions.net). It > implements the OGC SFS for geometries and is compatible with a large number > of open source viewers/tools such as Mapserver, Geoserver, QGIS, OGR etc... Mark, thanks for the suggestion. I examined PostGIS some time ago. It is too complex for my simple task and it gives no advantages for me: For spatial indexing it uses the same GiST-based R-tree. And PostGIS does not offer that "population" or "priority" queries I need. Anton.
On Sun, 21 Sep 2008, "Anton Belyaev" <anton.belyaev@gmail.com> writes: > And the questions about population remain the same: > How to avoid examination of all the towns in the rectangle knowing > that we need only 10 biggest? > Does population worth including into a (3D) point (In order to create > a 3D R-tree)? Does Postgres perform ODRER/LIMIT efficiently in this > case? Can we see the EXPLAIN ANALYZE for both situations -- with and without geographic data types/functions? Regards.
Anton Belyaev wrote: > Mark, thanks for the suggestion. > I examined PostGIS some time ago. It is too complex for my simple task > and it gives no advantages for me: Well okay but bear in mind the PostGIS is the de-facto standard for most open source GIS tools. Programs like QGIS et al can visualise the content of PostGIS tables just by pointing it towards the relevant database - the in-built PostgreSQL geometry types aren't supported by anything as far as I know. And don't forget coordinate re-projection - PostGIS also allows you to re-project between latitude/longitude and local map spatial reference systems on the fly. > For spatial indexing it uses the same GiST-based R-tree. Not quite. The PostGIS indexes have been improved to include selectivity functions to allow the planner to determine when it should use the spatial index. AFAIK the in-built PostgreSQL types use fixed values, so the choice of index usage will be incredibly naive and often wrong on larger datasets mixing spatial and non-spatial columns as part of the search query. > And PostGIS does not offer that "population" or "priority" queries I need. Maybe. But you may find the wiki at http://postgis.refractions.net/support/wiki/ is a good starting point for code examples. ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063