Re: Indexing large table of coordinates with GiST - Mailing list pgsql-general

From Nathan Clayton
Subject Re: Indexing large table of coordinates with GiST
Date
Msg-id 54B858E0.1020904@gmail.com
Whole thread Raw
In response to Re: Indexing large table of coordinates with GiST  (Daniel Begin <jfd553@hotmail.com>)
Responses Re: Indexing large table of coordinates with GiST  (Rémi Cura <remi.cura@gmail.com>)
List pgsql-general
On 1/15/2015 12:36 PM, Daniel Begin wrote:
>
> Thank, there is a lot of potential ways to resolve this problem!
>
> For Rob, here is a bit of context concerning my IT environment…
>
> Windows 7 64b Desktop, running with an Intel i7 core and 16GB ram. The
> PostgreSQL 9.3 database is stored on a 3TB external drive (USB-3
> connection with write cache enabled and backup battery) and a
> temp_tablespaces is pointing to a 1TB internal drive.
>
> Now, let me answered/questioned given proposals in the order I
> received them…
>
> 1-Andy, I will set maintenance_work_mem as large as I can unless
> someone points to an important caveat.
>
> 2-Vick, partitioning the table could have been very interesting.
> However, I will have to query the table using both the node ID (which
> could have provided a nice partition criterion) and/or the node
> location (find nodes within a polygon). I am not familiar with table
> partition but I suspect I can’t create a spatial index on a table that
> have been partitioned (split into multiple tables that inherit from
> the “master" table). Am I right?
>
> 3-Rémi, so many rows does not necessarily mean either raster or points
> cloud (but it’s worth asking!-).  As I mentioned previously, I must be
> able to query the table not only using nodes location (coordinates)
> but also using the few other fields the table contains (but mainly
> node IDs). So, I don’t think it could work, unless you tell me otherwise?
>
> 4-Paul, the nodes distribution is all over the world but mainly over
> inhabited areas. However, if I had to define a limit of some sort, I
> would use the dateline.  Concerning spatial queries, I will want to
> find nodes that are within the boundary of irregular polygons (stored
> in another table). Is querying on irregular polygons is compatible
> with geohashing?
>
> Regards,
>
> Daniel
>
>
Provided you have an integer primary key on both your node tables and
polygon tables, would it make sense to preprocess the overlaps and have
a many-to-many table with the node-id and polygon-id? Depending on the
speed in which data is ingested, you could easily build triggers to run
after inserts/updates to keep the table updated, or you could create a
globally unique autoincrement field that tracks revisions and update
everything after a given high-water mark.

Lookups and joins would be using integers and should give you much
better performance than searching through the polygons.

For the many-to-many table, something like (you can obviously parse it
out into smaller batches on the insert if you need to so you don't blow
up your memory usage. If needed you can have two tables partitioned on
either the node-id or the polygon-id to speed up lookups, as this table
has the potential to carry many times the records in either table -
worst case would be a cartesian join if all nodes fall within all polygons):

create table node_polygon (
   node_id bigint not null,
   polygon_id bigint not null,
   primary key (node_id, polygon_id)
);

insert into node_polygon (node_id, polygon_id)
select
   node_id,
   polygon_id
from
   node
   inner join polygon
     on node.point <@ polygon.shape;

create index ix_node_polygon_polygon on node_polygon (polygon_id);


pgsql-general by date:

Previous
From: Enrico Bianchi
Date:
Subject: Out of Memory
Next
From: John R Pierce
Date:
Subject: Re: Out of Memory