Thread: R-tree, order by, limit

R-tree, order by, limit

From
"Anton Belyaev"
Date:
Hello,

I am implementing a map application. There are towns with altitude,
longitude and population.
One of the tasks is to be able to query N biggest (by population)
towns within a rectangle.

Something like (maybe the syntax in not quite right, but the idea is obvious):
SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <=
long2 ORDER BY population LIMIT 10;

If I create an R-tree index on coordinates (alt, long) this will speed
up the query significantly. But it is still far from optimal: Despite
we need only 10 biggest towns, all towns in the rectangle specified
will be examined.

What if we include population into R-tree index? This index will
handle a 3D space with coordinates (alt, long, population).
Will this 3D index perform better than that 2D index?

In fact, I lack some details on how Postges handles ORDER_BY and LIMIT
inside R-tree indexes.
Extensive answers and links are appreciated.

Thanks.
Anton.

Re: R-tree, order by, limit

From
Volkan YAZICI
Date:
On Sun, 21 Sep 2008, "Anton Belyaev" <anton.belyaev@gmail.com> writes:
> SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <=
> long2 ORDER BY population LIMIT 10;

You're absolutely on the wrong path. Don't try to implement a logic,
that has been implemented by PostgreSQL in the most possibly efficient
way in its bounds. See geographic data types[1] (e.g. box) and
geographic functions[2] (e.g. @> a.k.a contains).


Regards.

[1] http://www.postgresql.org/docs/current/interactive/datatype-geometric.html
[2] http://www.postgresql.org/docs/current/interactive/functions-geometry.html

Re: R-tree, order by, limit

From
"Anton Belyaev"
Date:
2008/9/21 Volkan YAZICI <yazicivo@ttmail.com>:
> On Sun, 21 Sep 2008, "Anton Belyaev" <anton.belyaev@gmail.com> writes:
>> SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <=
>> long2 ORDER BY population LIMIT 10;
>
> You're absolutely on the wrong path. Don't try to implement a logic,
> that has been implemented by PostgreSQL in the most possibly efficient
> way in its bounds. See geographic data types[1] (e.g. box) and
> geographic functions[2] (e.g. @> a.k.a contains).
>
>
> Regards.
>
> [1] http://www.postgresql.org/docs/current/interactive/datatype-geometric.html
> [2] http://www.postgresql.org/docs/current/interactive/functions-geometry.html
>

Volkan,
Thanks you for your reply.
Geometry types and functions use R-tree indexes anyways.

I can rephrase the query using geometry language of Postgres:
SELECT * FROM towns WHERE towns.coordinates <@ box(alt1, long1, alt2,
long2) ORDER BY population LIMIT 10;

And the questions about population remain the same:
How to avoid examination of all the towns in the rectangle knowing
that we need only 10 biggest?
Does population worth including into a (3D) point (In order to create
a 3D R-tree)? Does Postgres perform ODRER/LIMIT efficiently in this
case?

Thanks.
Anton.

Re: R-tree, order by, limit

From
"Anton Belyaev"
Date:
2008/9/21 Anton Belyaev <anton.belyaev@gmail.com>:
> Hello,
>
> I am implementing a map application. There are towns with altitude,
> longitude and population.
> One of the tasks is to be able to query N biggest (by population)
> towns within a rectangle.
>
> Something like (maybe the syntax in not quite right, but the idea is obvious):
> SELECT * FROM towns where alt1 <= alt <= alt2 AND long1 <= long <=
> long2 ORDER BY population LIMIT 10;
>
> If I create an R-tree index on coordinates (alt, long) this will speed
> up the query significantly. But it is still far from optimal: Despite
> we need only 10 biggest towns, all towns in the rectangle specified
> will be examined.
>
> What if we include population into R-tree index? This index will
> handle a 3D space with coordinates (alt, long, population).
> Will this 3D index perform better than that 2D index?
>
> In fact, I lack some details on how Postges handles ORDER_BY and LIMIT
> inside R-tree indexes.
> Extensive answers and links are appreciated.
>
> Thanks.
> Anton.
>

Sorry, I meant latitude (lat) instead of altitude (alt).

Re: R-tree, order by, limit

From
Martijn van Oosterhout
Date:
On Sun, Sep 21, 2008 at 06:17:39PM +0400, Anton Belyaev wrote:
> Geometry types and functions use R-tree indexes anyways.
>
> I can rephrase the query using geometry language of Postgres:
> SELECT * FROM towns WHERE towns.coordinates <@ box(alt1, long1, alt2,
> long2) ORDER BY population LIMIT 10;
>
> And the questions about population remain the same:
> How to avoid examination of all the towns in the rectangle knowing
> that we need only 10 biggest?

I don't know if it solves your problem, but you should be able to do a
multi-column GiST index with both the position data and the population
data in it. However, I'm unsure if postgresql will correctly use the
index to solve the order by...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

Re: R-tree, order by, limit

From
Mark Cave-Ayland
Date:
Anton Belyaev wrote:

>> I am implementing a map application. There are towns with altitude,
>> longitude and population.
>> One of the tasks is to be able to query N biggest (by population)
>> towns within a rectangle.

Hi Anton,

Have you considered using PostGIS? (http://postgis.refractions.net). It
implements the OGC SFS for geometries and is compatible with a large
number of open source viewers/tools such as Mapserver, Geoserver, QGIS,
OGR etc...


ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

Re: R-tree, order by, limit

From
"Anton Belyaev"
Date:
2008/9/21 Martijn van Oosterhout <kleptog@svana.org>:
> On Sun, Sep 21, 2008 at 06:17:39PM +0400, Anton Belyaev wrote:
>> Geometry types and functions use R-tree indexes anyways.
>>
>> I can rephrase the query using geometry language of Postgres:
>> SELECT * FROM towns WHERE towns.coordinates <@ box(alt1, long1, alt2,
>> long2) ORDER BY population LIMIT 10;
>>
>> And the questions about population remain the same:
>> How to avoid examination of all the towns in the rectangle knowing
>> that we need only 10 biggest?
>
> I don't know if it solves your problem, but you should be able to do a
> multi-column GiST index with both the position data and the population
> data in it. However, I'm unsure if postgresql will correctly use the
> index to solve the order by...

Martijn, thanks for you reply.
Implementing a 3D R-tree index in Postgres is only possible via
implementation of GiST interface. At least, this is the only approach
I consider, because implementing a brand new index access method
requires much more than just classic R-tree implementation.
So, yes, question remains the same, but a bit updated:
How efficiently Postgres handles ORDER BY + LIMIT when using GiST?
(Particularly, when an R-tree is implemented via GiST).

Anton.

Re: R-tree, order by, limit

From
"Anton Belyaev"
Date:
2008/9/22 Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk>:
>>> I am implementing a map application. There are towns with altitude,
>>> longitude and population.
>>> One of the tasks is to be able to query N biggest (by population)
>>> towns within a rectangle.
> Have you considered using PostGIS? (http://postgis.refractions.net). It
> implements the OGC SFS for geometries and is compatible with a large number
> of open source viewers/tools such as Mapserver, Geoserver, QGIS, OGR etc...

Mark, thanks for the suggestion.
I examined PostGIS some time ago. It is too complex for my simple task
and it gives no advantages for me:
For spatial indexing it uses the same GiST-based R-tree.
And PostGIS does not offer that "population" or "priority" queries I need.

Anton.

Re: R-tree, order by, limit

From
Volkan YAZICI
Date:
On Sun, 21 Sep 2008, "Anton Belyaev" <anton.belyaev@gmail.com> writes:
> And the questions about population remain the same:
> How to avoid examination of all the towns in the rectangle knowing
> that we need only 10 biggest?
> Does population worth including into a (3D) point (In order to create
> a 3D R-tree)? Does Postgres perform ODRER/LIMIT efficiently in this
> case?

Can we see the EXPLAIN ANALYZE for both situations -- with and without
geographic data types/functions?


Regards.

Re: R-tree, order by, limit

From
Mark Cave-Ayland
Date:
Anton Belyaev wrote:

> Mark, thanks for the suggestion.
> I examined PostGIS some time ago. It is too complex for my simple task
> and it gives no advantages for me:

Well okay but bear in mind the PostGIS is the de-facto standard for most
open source GIS tools. Programs like QGIS et al can visualise the
content of PostGIS tables just by pointing it towards the relevant
database - the in-built PostgreSQL geometry types aren't supported by
anything as far as I know. And don't forget coordinate re-projection -
PostGIS also allows you to re-project between latitude/longitude and
local map spatial reference systems on the fly.

> For spatial indexing it uses the same GiST-based R-tree.

Not quite. The PostGIS indexes have been improved to include selectivity
functions to allow the planner to determine when it should use the
spatial index. AFAIK the in-built PostgreSQL types use fixed values, so
the choice of index usage will be incredibly naive and often wrong on
larger datasets mixing spatial and non-spatial columns as part of the
search query.

> And PostGIS does not offer that "population" or "priority" queries I need.

Maybe. But you may find the wiki at
http://postgis.refractions.net/support/wiki/ is a good starting point
for code examples.


ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063