Thread: performance issue with distance function
I am trying to execute the following command: SELECT R.*, distance(L1.lat, L1.lon, L2.lat, L2.lon) AS Distance FROM Restaurants R, Locations L1, Locations L2, FoodTypeRestaurantIDX FTR WHERE R.Zipcode=L1.Zipcode AND L2.Zipcode = '93705' AND R.Delivery=true AND R.RestaurantID=FTR.RestaurantID AND FTR.FoodTypeID=1 AND distance(L1.lat, L1.lon, L2.lat, L2.lon) <= 60 LIMIT 100 OFFSET 0 I would rather execute this command: (but i get the error "Attribute 'distance' not found") SELECT R.*, distance(L1.lat, L1.lon, L2.lat, L2.lon) AS Distance FROM Restaurants R, Locations L1, Locations L2, FoodTypeRestaurantIDX FTR WHERE R.Zipcode=L1.Zipcode AND L2.Zipcode = '93705' AND R.Delivery=true AND R.RestaurantID=FTR.RestaurantID AND FTR.FoodTypeID=1 AND distance <= 60 LIMIT 100 OFFSET 0 Having that second distance function in the "WHERE" section of my sql statement is costing me at least 10-20 seconds of execution time. I am looking for a solution that will speed this up. Does anyone have any advice. Thanks in advance. Ryan PS. Here are the respective execution plans: Merge Join (cost=0.00..3463985.82 rows=4342404 width=202) -> Nested Loop (cost=0.00..3461172.63 rows=14735 width=166) -> Nested Loop (cost=0.00..127378.88 rows=147350 width=162) -> Index Scan using restaurantszipcodeidxon restaurants r (cost=0.00..62.50 rows=500 width=138) -> Index Scan using locationszipcodeidx on locations l2 (cost=0.00..251.69 rows=295 width=24) -> Seq Scan on foodtyperestaurantidx ftr (cost=0.00..22.50 rows=10 width=4) -> Index Scan using locationszipcodeidx on locations l1 (cost=0.00..2260.63 rows=29470 width=36) Hash Join (cost=74.08..804150.11 rows=1447468 width=266) -> Nested Loop (cost=0.00..326410.91 rows=2894936 width=120) -> Index Scan using locationszipcodeidx on locations l2 (cost=0.00..251.69 rows=295 width=60) -> Seq Scan on locations l1 (cost=0.00..664.70 rows=29470 width=60) -> Hash (cost=73.96..73.96 rows=50 width=146) -> Merge Join (cost=67.58..73.96 rows=50 width=146) -> Sort (cost=22.67..22.67 rows=10 width=8) -> Seq Scan on foodtyperestaurantidx ftr (cost=0.00..22.50 rows=10 width=8) -> Sort (cost=44.91..44.91 rows=500 width=138) -> SeqScan on restaurants r (cost=0.00..22.50 rows=500 width=138)
Ryan Littrell wrote: > > I am trying to execute the following command: > > SELECT R.*, distance(L1.lat, L1.lon, L2.lat, L2.lon) AS Distance > FROM Restaurants R, Locations L1, Locations L2, FoodTypeRestaurantIDX FTR > WHERE R.Zipcode=L1.Zipcode AND L2.Zipcode = '93705' AND R.Delivery=true AND > R.RestaurantID=FTR.RestaurantID AND FTR.FoodTypeID=1 AND distance(L1.lat, > L1.lon, L2.lat, L2.lon) <= 60 > LIMIT 100 OFFSET 0 > > I would rather execute this command: (but i get the error "Attribute > 'distance' not found") > > SELECT R.*, distance(L1.lat, L1.lon, L2.lat, L2.lon) AS Distance > FROM Restaurants R, Locations L1, Locations L2, FoodTypeRestaurantIDX FTR > WHERE R.Zipcode=L1.Zipcode AND L2.Zipcode = '93705' AND R.Delivery=true AND > R.RestaurantID=FTR.RestaurantID AND FTR.FoodTypeID=1 AND distance <= 60 > LIMIT 100 OFFSET 0 > > Having that second distance function in the "WHERE" section of my sql > statement is costing me at least 10-20 seconds of execution time. I am > looking for a solution that will speed this up. Does anyone have any advice. > Thanks in advance. > this probably isn't what you want, but would it speed things up if you did an "order by distance" instead of doing the "distance <= 60", then having your application cut the results at 60? that should work from a language perspective, at least, and if the distance function is pretty computationally intensive, it should help. -- Jeff Hoffmann PropertyKey.com
Hi Ryan, There is a bit of a strange way around the distance overhead issue : Create another table with structure like (lat1,long1,zip1,lat2,long2,zip2,distance) and precalculate the distance for each possibility. This means n*(n-1) rows if you have n location rows. You would then include this table in your query and use distance like you wanted to initially ( should work fast provided you index it on lat1,long1,distance) The calculation overhead of distance is then removed from your query ( at the expense of some disk space ). The insert of each new location requires n calculations of distance - you could perform this in the background I guess ! regards Mark