Thread: performance issue with distance function

performance issue with distance function

From
"Ryan Littrell"
Date:
I am trying to execute the following command:

SELECT R.*, distance(L1.lat, L1.lon, L2.lat, L2.lon) AS Distance
FROM Restaurants R, Locations L1, Locations L2, FoodTypeRestaurantIDX FTR
WHERE R.Zipcode=L1.Zipcode AND L2.Zipcode = '93705' AND R.Delivery=true AND
R.RestaurantID=FTR.RestaurantID AND FTR.FoodTypeID=1 AND distance(L1.lat,
L1.lon, L2.lat, L2.lon) <= 60
LIMIT 100  OFFSET 0


I would rather execute this command: (but i get the error "Attribute
'distance' not found")

SELECT R.*, distance(L1.lat, L1.lon, L2.lat, L2.lon) AS Distance
FROM Restaurants R, Locations L1, Locations L2, FoodTypeRestaurantIDX FTR
WHERE R.Zipcode=L1.Zipcode AND L2.Zipcode = '93705' AND R.Delivery=true AND
R.RestaurantID=FTR.RestaurantID AND FTR.FoodTypeID=1 AND distance <= 60
LIMIT 100  OFFSET 0


Having that second distance function in the "WHERE" section of my sql
statement is costing me at least 10-20 seconds of execution time.  I am
looking for a solution that will speed this up. Does anyone have any advice.
Thanks in advance.

Ryan



PS. Here are the respective execution plans:



Merge Join  (cost=0.00..3463985.82 rows=4342404 width=202) ->  Nested Loop  (cost=0.00..3461172.63 rows=14735
width=166)      ->  Nested Loop  (cost=0.00..127378.88 rows=147350 width=162)             ->  Index Scan using
restaurantszipcodeidxon restaurants r
 
(cost=0.00..62.50 rows=500 width=138)             ->  Index Scan using locationszipcodeidx on locations l2
(cost=0.00..251.69 rows=295 width=24)       ->  Seq Scan on foodtyperestaurantidx ftr  (cost=0.00..22.50 rows=10
width=4) ->  Index Scan using locationszipcodeidx on locations l1
(cost=0.00..2260.63 rows=29470 width=36)






Hash Join  (cost=74.08..804150.11 rows=1447468 width=266) ->  Nested Loop  (cost=0.00..326410.91 rows=2894936
width=120)      ->  Index Scan using locationszipcodeidx on locations l2
 
(cost=0.00..251.69 rows=295 width=60)       ->  Seq Scan on locations l1  (cost=0.00..664.70 rows=29470
width=60) ->  Hash  (cost=73.96..73.96 rows=50 width=146)       ->  Merge Join  (cost=67.58..73.96 rows=50 width=146)
         ->  Sort  (cost=22.67..22.67 rows=10 width=8)                   ->  Seq Scan on foodtyperestaurantidx ftr
 
(cost=0.00..22.50 rows=10 width=8)             ->  Sort  (cost=44.91..44.91 rows=500 width=138)                   ->
SeqScan on restaurants r  (cost=0.00..22.50
 
rows=500 width=138)




Re: performance issue with distance function

From
Jeff Hoffmann
Date:
Ryan Littrell wrote:
> 
> I am trying to execute the following command:
> 
> SELECT R.*, distance(L1.lat, L1.lon, L2.lat, L2.lon) AS Distance
> FROM Restaurants R, Locations L1, Locations L2, FoodTypeRestaurantIDX FTR
> WHERE R.Zipcode=L1.Zipcode AND L2.Zipcode = '93705' AND R.Delivery=true AND
> R.RestaurantID=FTR.RestaurantID AND FTR.FoodTypeID=1 AND distance(L1.lat,
> L1.lon, L2.lat, L2.lon) <= 60
> LIMIT 100  OFFSET 0
> 
> I would rather execute this command: (but i get the error "Attribute
> 'distance' not found")
> 
> SELECT R.*, distance(L1.lat, L1.lon, L2.lat, L2.lon) AS Distance
> FROM Restaurants R, Locations L1, Locations L2, FoodTypeRestaurantIDX FTR
> WHERE R.Zipcode=L1.Zipcode AND L2.Zipcode = '93705' AND R.Delivery=true AND
> R.RestaurantID=FTR.RestaurantID AND FTR.FoodTypeID=1 AND distance <= 60
> LIMIT 100  OFFSET 0
> 
> Having that second distance function in the "WHERE" section of my sql
> statement is costing me at least 10-20 seconds of execution time.  I am
> looking for a solution that will speed this up. Does anyone have any advice.
> Thanks in advance.
> 

this probably isn't what you want, but would it speed things up if you
did an "order by distance" instead of doing the "distance <= 60", then
having your application cut the results at 60?  that should work from a
language perspective, at least, and if the distance function is pretty
computationally intensive, it should help.
-- 

Jeff Hoffmann
PropertyKey.com


Re: performance issue with distance function

From
Mark kirkwood
Date:
Hi Ryan,

There is a bit of a strange way around the distance overhead issue :

Create another table with structure like 
(lat1,long1,zip1,lat2,long2,zip2,distance)

and precalculate the distance for each possibility. This means n*(n-1) rows 
if you have n location rows. You would then include this table in your query 
and use distance like you wanted to initially ( should work fast provided you 
index it on lat1,long1,distance)

The calculation overhead of distance is then removed from your query ( at the 
expense of some disk space ). The insert of each new location requires n 
calculations of distance - you could perform this in the background I guess !

regards

Mark