Thread: Best PostGIS function for finding the nearest line segment to a given point

Best PostGIS function for finding the nearest line segment to a given point

From
René Fournier
Date:
Wow, have to say, I love Postgresql and PostGIS. Just awesome.

So I have a table with ~400,000 rows, each representing a road or street (multi line segment). I want to select the row whose line segment is closest the a given point. The following query...

gc3=# SELECT r_stname_c, r_placenam, ST_Distance(ST_GeomFromText('POINT(-114.053205 51.069644)',4269),the_geom) AS distance FROM nrn_ab_8_0_roadseg ORDER BY distance ASC LIMIT 1;

...works and produces...

      r_stname_c      | r_placenam |       distance       
----------------------+------------+----------------------
 19 Avenue North-east | Calgary    | 5.74515867479735e-05

…but seems a little slow (yes, there is a GIST index on the_geom). Explain shows:

gc3=# explain SELECT r_stname_c, r_placenam, ST_Distance(ST_GeomFromText('POINT(-114.053205 51.069644)',4269),the_geom) AS distance FROM nrn_ab_8_0_roadseg ORDER BY distance asc limit 1;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Limit  (cost=128520.06..128520.06 rows=1 width=464)
   ->  Sort  (cost=128520.06..129493.57 rows=389404 width=464)
         Sort Key: (st_distance('0101000020AD100000F5BEF1B567835CC06A2E3718EA884940'::geometry, the_geom))
         ->  Seq Scan on nrn_ab_8_0_roadseg  (cost=0.00..126573.04 rows=389404 width=464)
(4 rows)

Any suggests how to speed it up? Coming from MySQL, I'm brand-new to PostGIS (and Postgresql FWIW) and all the awesome spatial functions it has. I would think that maybe selecting a bounding box of rows, and then finding the one with the closest distance? 

…Rene

Re: Best PostGIS function for finding the nearest line segment to a given point

From
Filip Rembiałkowski
Date:

2011/10/8 René Fournier <renefournier@gmail.com>
Wow, have to say, I love Postgresql and PostGIS. Just awesome.

So I have a table with ~400,000 rows, each representing a road or street (multi line segment). I want to select the row whose line segment is closest the a given point. The following query...

gc3=# SELECT r_stname_c, r_placenam, ST_Distance(ST_GeomFromText('POINT(-114.053205 51.069644)',4269),the_geom) AS distance FROM nrn_ab_8_0_roadseg ORDER BY distance ASC LIMIT 1;

...works and produces...

      r_stname_c      | r_placenam |       distance       
----------------------+------------+----------------------
 19 Avenue North-east | Calgary    | 5.74515867479735e-05

…but seems a little slow (yes, there is a GIST index on the_geom). Explain shows:

gc3=# explain SELECT r_stname_c, r_placenam, ST_Distance(ST_GeomFromText('POINT(-114.053205 51.069644)',4269),the_geom) AS distance FROM nrn_ab_8_0_roadseg ORDER BY distance asc limit 1;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Limit  (cost=128520.06..128520.06 rows=1 width=464)
   ->  Sort  (cost=128520.06..129493.57 rows=389404 width=464)
         Sort Key: (st_distance('0101000020AD100000F5BEF1B567835CC06A2E3718EA884940'::geometry, the_geom))
         ->  Seq Scan on nrn_ab_8_0_roadseg  (cost=0.00..126573.04 rows=389404 width=464)
(4 rows)

Any suggests how to speed it up? Coming from MySQL, I'm brand-new to PostGIS (and Postgresql FWIW) and all the awesome spatial functions it has. I would think that maybe selecting a bounding box of rows, and then finding the one with the closest distance? 


Yes exactly. That's how people do it now, in pre-PostGIS-2.0 era :-)

Make a search by bounding boxes, starting with some arbitraly selected radius. Increase the radius until you have at least N=1 result found, than sort these results by ST_Distance and select nearest neighbour.

PostGIS 2.0 solution: see http://blog.opengeo.org/2011/09/28/indexed-nearest-neighbour-search-in-postgis/



Re: Best PostGIS function for finding the nearest line segment to a given point

From
René Fournier
Date:
Hi Filip,

On 2011-10-08, at 2:26 AM, Filip Rembiałkowski wrote:

Any suggests how to speed it up? Coming from MySQL, I'm brand-new to PostGIS (and Postgresql FWIW) and all the awesome spatial functions it has. I would think that maybe selecting a bounding box of rows, and then finding the one with the closest distance? 


Yes exactly. That's how people do it now, in pre-PostGIS-2.0 era :-)

Make a search by bounding boxes, starting with some arbitraly selected radius. Increase the radius until you have at least N=1 result found, than sort these results by ST_Distance and select nearest neighbour.

PostGIS 2.0 solution: see http://blog.opengeo.org/2011/09/28/indexed-nearest-neighbour-search-in-postgis/

Thanks. Based on some further reading, this is what I came up with, in order to hopefully use the GiST index to greatest benefit:

gc3=# SELECT datasetnam, r_hnumf, r_hnuml, r_stname_c, r_placenam, ST_Distance(ST_GeomFromText('POINT(-114.053205 51.069644)',4269),the_geom) AS distance 
gc3-# FROM nrn_ab_8_0_roadseg 
gc3-# WHERE the_geom && SetSRID('BOX3D(-114.1 49.9,-113.9 51.1)'::box3d,4269)
gc3-# ORDER BY distance ASC LIMIT 5;
 datasetnam | r_hnumf | r_hnuml |      r_stname_c      | r_placenam |       distance       
------------+---------+---------+----------------------+------------+----------------------
 Alberta    |     407 |     459 | 19 Avenue North-east | Calgary    | 5.74515867479735e-05
 Alberta    |    2004 |    2004 | 4 Street North-east  | Calgary    | 0.000663366090673065
 Alberta    |       0 |       0 | 4 Street North-east  | Calgary    | 0.000667603774783403
 Alberta    |     425 |     425 | 18 Avenue North-east | Calgary    | 0.000835708003512673
 Alberta    |     407 |     449 | 20 Avenue North-east | Calgary    | 0.000981910679856406
(5 rows)

gc3=# EXPLAIN SELECT datasetnam, r_hnumf, r_hnuml, r_stname_c, r_placenam, ST_Distance(ST_GeomFromText('POINT(-114.053205 51.069644)',4269),the_geom) AS distance 
gc3-# FROM nrn_ab_8_0_roadseg 
gc3-# WHERE the_geom && SetSRID('BOX3D(-114.1 49.9,-113.9 51.1)'::box3d,4269)
gc3-# ORDER BY distance ASC LIMIT 5;
                                                                                                                          QUERY PLAN                                                                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=33632.15..33632.16 rows=5 width=480)
   ->  Sort  (cost=33632.15..33693.00 rows=24341 width=480)
         Sort Key: (st_distance('0101000020AD100000F5BEF1B567835CC06A2E3718EA884940'::geometry, the_geom))
         ->  Bitmap Heap Scan on nrn_ab_8_0_roadseg  (cost=812.99..33227.85 rows=24341 width=480)
               Recheck Cond: (the_geom && '0103000020AD10000001000000050000006666666666865CC03333333333F348406666666666865CC0CDCCCCCCCC8C49409A99999999795CC0CDCCCCCCCC8C49409A99999999795CC03333333333F348406666666666865CC03333333333F34840'::geometry)
               ->  Bitmap Index Scan on nrn_ab_8_0_roadseg_the_geom_gist  (cost=0.00..806.91 rows=24341 width=0)
                     Index Cond: (the_geom && '0103000020AD10000001000000050000006666666666865CC03333333333F348406666666666865CC0CDCCCCCCCC8C49409A99999999795CC0CDCCCCCCCC8C49409A99999999795CC03333333333F348406666666666865CC03333333333F34840'::geometry)
(7 rows)


Does this appear optimal to you?

Best regards,
René Fournier

Re: Best PostGIS function for finding the nearest line segment to a given point

From
Filip Rembiałkowski
Date:

2011/10/8 René Fournier <renefournier@gmail.com>

Thanks. Based on some further reading, this is what I came up with, in order to hopefully use the GiST index to greatest benefit:

gc3=# SELECT datasetnam, r_hnumf, r_hnuml, r_stname_c, r_placenam, ST_Distance(ST_GeomFromText('POINT(-114.053205 51.069644)',4269),the_geom) AS distance 
gc3-# FROM nrn_ab_8_0_roadseg 
gc3-# WHERE the_geom && SetSRID('BOX3D(-114.1 49.9,-113.9 51.1)'::box3d,4269)
gc3-# ORDER BY distance ASC LIMIT 5;
 datasetnam | r_hnumf | r_hnuml |      r_stname_c      | r_placenam |       distance       
------------+---------+---------+----------------------+------------+----------------------
 Alberta    |     407 |     459 | 19 Avenue North-east | Calgary    | 5.74515867479735e-05
 Alberta    |    2004 |    2004 | 4 Street North-east  | Calgary    | 0.000663366090673065
 Alberta    |       0 |       0 | 4 Street North-east  | Calgary    | 0.000667603774783403
 Alberta    |     425 |     425 | 18 Avenue North-east | Calgary    | 0.000835708003512673
 Alberta    |     407 |     449 | 20 Avenue North-east | Calgary    | 0.000981910679856406
(5 rows)

gc3=# EXPLAIN SELECT datasetnam, r_hnumf, r_hnuml, r_stname_c, r_placenam, ST_Distance(ST_GeomFromText('POINT(-114.053205 51.069644)',4269),the_geom) AS distance 
gc3-# FROM nrn_ab_8_0_roadseg 
gc3-# WHERE the_geom && SetSRID('BOX3D(-114.1 49.9,-113.9 51.1)'::box3d,4269)
gc3-# ORDER BY distance ASC LIMIT 5;
                                                                                                                          QUERY PLAN                                                                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=33632.15..33632.16 rows=5 width=480)
   ->  Sort  (cost=33632.15..33693.00 rows=24341 width=480)
         Sort Key: (st_distance('0101000020AD100000F5BEF1B567835CC06A2E3718EA884940'::geometry, the_geom))
         ->  Bitmap Heap Scan on nrn_ab_8_0_roadseg  (cost=812.99..33227.85 rows=24341 width=480)
               Recheck Cond: (the_geom && '0103000020AD10000001000000050000006666666666865CC03333333333F348406666666666865CC0CDCCCCCCCC8C49409A99999999795CC0CDCCCCCCCC8C49409A99999999795CC03333333333F348406666666666865CC03333333333F34840'::geometry)
               ->  Bitmap Index Scan on nrn_ab_8_0_roadseg_the_geom_gist  (cost=0.00..806.91 rows=24341 width=0)
                     Index Cond: (the_geom && '0103000020AD10000001000000050000006666666666865CC03333333333F348406666666666865CC0CDCCCCCCCC8C49409A99999999795CC0CDCCCCCCCC8C49409A99999999795CC03333333333F348406666666666865CC03333333333F34840'::geometry)
(7 rows)


Does this appear optimal to you?



I think it's closer to optimal. The real question is: is this fast enough for your application?

Can you show EXPLAIN (ANALYZE on,BUFFERS on) result?