Thread: Indexing of geometric data

Indexing of geometric data

From
Barry Brown
Date:
Greetings,

I'm investigating the use of PostgreSQL to maintain a database of
mapping data.  My dataset currently stored latitudes and longitudes as
fixed-precision integers (ie, 37.234 is stored as 37234) in separate
columns: one for latitude, one for longitude.

Postgres has geometric data types, such as point, line, etc, but I'm a
little leary of using them because of performance reasons.  For one, the
internal representation of a point is a pair of double-precision floating
point numbers.  For a dataset containing millions of points, working
with floats could be expensive, both in storage and computation costs.

Second, I want to be able to do queries like "find all line segments
(or points) that overlap or intersect with a given bounding box."  I'm
afraid that without an appropriate index on the set of data points, a
query like that will be extremely slow.

So my questions are: what data structure is used to index a set of points?
Is it a spatially-oriented structure, such as a quadtree, that would make
queries like the example above very fast?  Can Postgres be adapted to
use integers instead of floats to store point coordinates?

Thanks!

Barry

Re: Indexing of geometric data

From
Thomas Lockhart
Date:
> I'm investigating the use of PostgreSQL to maintain a database of
> mapping data.  My dataset currently stored latitudes and longitudes as
> fixed-precision integers (ie, 37.234 is stored as 37234) in separate
> columns: one for latitude, one for longitude.
> Postgres has geometric data types, such as point, line, etc, but I'm a
> little leary of using them because of performance reasons.  For one, the
> internal representation of a point is a pair of double-precision floating
> point numbers.  For a dataset containing millions of points, working
> with floats could be expensive, both in storage and computation costs.

afaik the hit would be on storage only. Most modern processors can do
floating point multiplies and additions in one or a few clock cycles.

> Second, I want to be able to do queries like "find all line segments
> (or points) that overlap or intersect with a given bounding box."  I'm
> afraid that without an appropriate index on the set of data points, a
> query like that will be extremely slow.
> So my questions are: what data structure is used to index a set of points?
> Is it a spatially-oriented structure, such as a quadtree, that would make
> queries like the example above very fast?  Can Postgres be adapted to
> use integers instead of floats to store point coordinates?

You want to look into the rtree index support. Not sure how exactly you
use it, but I've seen reports that it works.

I'm pretty sure that using integers rather than floats is work not worth
doing. However, you *could* lift the code in backend/utils/adt/geo_ops.c
and then convert the routines to use integers. Then install as
user-defined types.

But I'd suggest putting the work into getting the existing types to do
what you want.

                      - Thomas