Query Join Performance - Mailing list pgsql-sql

From Aaron Bono
Subject Query Join Performance
Date
Msg-id bf05e51c0704241826u3e6652d1m26973b837e0b1397@mail.gmail.com
Whole thread Raw
Responses Re: Query Join Performance  (Richard Huxton <dev@archonet.com>)
List pgsql-sql
I have a database with the following tables:<br /><br />zip {<br />    zip_id bigserial,<br />    zip_cd varchar(6) --
thisis indexed<br />}<br /><br />city {<br />    city_id bigserial,<br />    city_name varchar(50)<br />}<br /><br
/>zip_city{ <br />    city_id bigint (FK to city table),<br />    zip_id bigint (FK to zip table),<br />    longitude
numeric,<br/>    latitude numeric<br />}<br /><br />We want to find the zip_cd values for all zip codes within 50 miles
ofa specified zip code and have this query so far (assume our given zip code is 64131): <br /><br />    select<br />   
   zip_city.zip_id,<br />        gps_distance(zip_city.longitude::numeric, zip_city.latitude::numeric,
my_location.longitude::numeric,my_location.latitude::numeric) AS distance<br />    from zip_city <br />    inner join
(<br/>        select<br />            longitude,<br />            latitude<br />        from zip<br />        inner
joinzip_city on (<br />            zip.zip_id = zip_city.zip_id<br />        )<br />        where zip_cd = '64131' <br
/>   ) my_location on (<br />        -- We use this as an approximation to filter out as many records as possible
beforedoing the gps_distance() function call<br />        -- It works well as a outer bounds to get the most obvious
outsidezip_city values. <br />        abs(zip_city.longitude - my_location.longitude) <= (50.0 / 60.0)<br />       
ANDabs(zip_city.latitude - my_location.latitude) <= (50.0 / 60.0)<br />    )<br /><br />This runs in about 2 seconds
onWindows and gives me 650 records - not bad. <br /><br />When I add just a little bit:<br /><br />    select<br />   
   zip_city.zip_id,<br />        gps_distance(zip_city.longitude::numeric, zip_city.latitude::numeric,
my_location.longitude::numeric,my_location.latitude::numeric) AS distance <br />    from zip_city<br />   
-->>>>THE ONLY DIFFERENCE IS THE NEXT 3 LINES <<<<--<br />    inner join zip on (<br />       
zip_city.zip_id= zip.zip_id<br />    )<br />    -->>>> End of added code <<<<-- <br />    inner
join(<br />        select<br />            longitude,<br />            latitude<br />        from zip<br />       
innerjoin zip_city on (<br />            zip.zip_id = zip_city.zip_id<br />        )<br />        where zip_cd =
'64131'<br />    ) my_location on (<br />        abs(zip_city.longitude - my_location.longitude) <= (50.0 / 60.0)<br
/>       AND abs(zip_city.latitude - my_location.latitude) <= (50.0 / 60.0)<br />    )<br /><br />Performance tanks
withthis query - it takes over 120 seconds (that is where I set the timeout).  I read that a foreign key doesn't help
performanceand my guess is that PostgreSQL isn't just joining on the 650 records but rather on the over 800,000 records
inthe zip table (yes we have that many - we include Canada zip codes which is 90% of the data - Blame Canada! ;). <br
/><br/>Is there something I can do to boost performance?  I tried putting the first query above in an inner join before
joiningwith the zip table with the same result.<br /><br />BTW, on our Linux box the full query we run (which adds 3
moretables on the whole operation along with more filtering on the zip table) finishes in under 10 seconds.  Problem is
ourdevelopment is on Windows and this is a real pain for developers to test. <br /><br />Any assistance is
appreciated. I did look at the explain plans but I don't see anything other than what I mentioned above.  I can include
thoseif necessary.<br /><br />Thanks!<br />Aaron<br clear="all" /><br />-- <br />
==================================================================<br/>   Aaron Bono<br />   Aranya Software
Technologies,Inc.<br />   <a href="http://www.aranya.com">http://www.aranya.com</a><br />   <a
href="http://codeelixir.com">http://codeelixir.com</a><br
/>================================================================== 

pgsql-sql by date:

Previous
From: Richard Albright
Date:
Subject: plpgsql array looping
Next
From: saji varghese
Date:
Subject: Re: hi