After three years of working with MySQL, I've landed a project that
seems perfect for PostgreSQL. However, I'm having some performance
problems.
I have a table with the following stats:
           Column            |         Type          | Modifiers
-----------------------------+-----------------------+-----------
 state_code                  | smallint              | not null
 zip                         | integer               | not null
 plus_4                      | smallint              | not null
 delivery_point_code         | character(3)          | not null
 title_code                  | character(1)          | not null
 first_name                  | character varying(14) | not null
 middle_initial              | character(1)          | not null
 last_name                   | character varying(30) | not null
 name_suffix                 | character(1)          | not null
 house_number                | character varying(11) | not null
 fract                       | character(3)          | not null
 street_prefix_dir           | character(2)          | not null
 street_name                 | character varying(28) | not null
 street_suffix               | character(4)          | not null
 street_post_dir             | character(2)          | not null
 route_designator_and_number | character(6)          | not null
 box_designator_and_number   | character varying(15) | not null
 apt_designator_and_number   | character varying(15) | not null
 city                        | character varying(13) | not null
 state                       | character(2)          | not null
 carrier_route               | character(4)          | not null
 complete_telephone          | character varying(10) | not null
 latitude                    | numeric(15,10)        | not null
 longitude                   | numeric(15,10)        | not null
 address_quality_code        | character(1)          | not null
Indexes: addr_lat btree (latitude),
         addr_long btree (longitude),
         addr_zip btree (zip)
This table and its companion have over 102 million rows.
Essentially, I want to use longitude and latitude to find properties
that are nearest to a provided property. So, using PHP, this is what I
do:
The customer enters an address and, using the zip code, house number
and street name, among other things, we locate the longitude and
latitude of the property. This lookup is very fast and usually returns
a result instantly; the zip code field is indexed.
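Roughly, that lookup is along these lines (simplified; the real query
matches on a few more columns, and $zip, $houseNumber and $streetName
stand in for the values PHP supplies):
SELECT latitude, longitude
FROM address
WHERE zip = $zip AND
      house_number = '$houseNumber' AND
      street_name = '$streetName'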
Once we have the longitude and latitude for the property, we use the
following select to determine what properties are near the target
property.
SELECT (first_name || ' ' || last_name) AS name,
       (house_number || ' ' || fract::varchar || ' ' ||
        street_prefix_dir::varchar || ' ' || street_name || ' ' ||
        street_suffix::varchar || ' ' || street_post_dir::varchar) AS address,
       city,
       state,
       zip
FROM address
WHERE latitude >= $minLat AND
      latitude <= $maxLat AND
      longitude >= $minLong AND
      longitude <= $maxLong
ORDER BY ((abs($property_long - longitude)) * 1000) +
         ((abs($property_lat - latitude)) * 1000)
LIMIT 100
This is where things get very slow. If I search for a property in Fort
Worth, it can take between two and three minutes to return a result.
The second time I run the same search, however, it takes a more
reasonable 10 seconds. If I then decide to search in California, the
three-minute wait occurs again.
From reading the documentation, I've discovered that the speedy
returns the second time around are because the results are cached.
Since we'll have users all over the nation hitting this solution when
we go live, the initial query needs to be as fast as possible. Here's
what I've tried (rough equivalents are sketched after the list):
1) indexed the latitude and longitude fields separately, which gave me
the initial performance boost (queries were taking 45 minutes before
the indexes were applied! :))
2) raised shared_buffers to 128 and saw a further small improvement
3) raised sort_mem to 16MB and got another small boost in speed
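For reference, those three steps boil down to roughly the following
(the index names match the \d output above; the config values assume
shared_buffers is counted in 8kB buffers and sort_mem in kilobytes, so
treat them as approximate):
CREATE INDEX addr_lat ON address (latitude);
CREATE INDEX addr_long ON address (longitude);
and, in postgresql.conf:
shared_buffers = 128
sort_mem = 16384        # 16MB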
But the initial search still takes three minutes.
Here is an explain analyze on a sample address:
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=219538.31..219538.48 rows=69 width=114) (actual time=97750.62..97751.23 rows=100 loops=1)
   ->  Sort  (cost=219538.31..219538.48 rows=70 width=114) (actual time=97750.60..97750.80 rows=101 loops=1)
         Sort Key: ((abs((-77.5527420000 - longitude)) * 1000::numeric) + (abs((43.1790237400 - latitude)) * 1000::numeric))
         ->  Index Scan using addr_long on address  (cost=0.00..219536.18 rows=70 width=114) (actual time=184.66..97686.35 rows=2464 loops=1)
               Index Cond: ((longitude >= -77.562653289755) AND (longitude <= -77.542830710245))
               Filter: ((latitude >= 43.171795381705) AND (latitude <= 43.186252098295))
 Total runtime: 97752.32 msec
I've exhausted just about everything I know to do to get this to speed
up, and I'm not sure where to go next. If anyone has any suggestions,
I would love to hear 'em!
Thanks,
Charley