Search speed issues - Mailing list pgsql-novice
From | Charley L.Tiggs |
---|---|
Subject | Search speed issues |
Date | |
Msg-id | C5A562F0-7CFF-11D7-8E4B-000393D88A3A@xpressdocs.com Whole thread Raw |
Responses |
Re: Search speed issues
|
List | pgsql-novice |
After three years of working with MySQL, I've landed a project that seems perfect for PosgreSQL. However, I'm having some performance problems. I have a table with the following stats: Column | Type | Modifiers -----------------------------+-----------------------+----------- state_code | smallint | not null zip | integer | not null plus_4 | smallint | not null delivery_point_code | character(3) | not null title_code | character(1) | not null first_name | character varying(14) | not null middle_initial | character(1) | not null last_name | character varying(30) | not null name_suffix | character(1) | not null house_number | character varying(11) | not null fract | character(3) | not null street_prefix_dir | character(2) | not null street_name | character varying(28) | not null street_suffix | character(4) | not null street_post_dir | character(2) | not null route_designator_and_number | character(6) | not null box_designator_and_number | character varying(15) | not null apt_designator_and_number | character varying(15) | not null city | character varying(13) | not null state | character(2) | not null carrier_route | character(4) | not null complete_telephone | character varying(10) | not null latitude | numeric(15,10) | not null longitude | numeric(15,10) | not null address_quality_code | character(1) | not null Indexes: addr_lat btree (latitude), addr_long btree (longitude), addr_zip btree (zip) This table and it's companion have over 102 million rows. Essentially, I want to use longitude and latitude to find properties that are nearest to a provided property. So, using PHP, this is what I do: Customer enters an address and, using the zip code, house number and street name, among other things, we locate the longitude and latitude of the property. This is very fast and usually returns a result instantly. The zip code field is indexed. Once we have the longitude and latitude for the property, we use the following select to determine what properties are near the target property. SELECT (first_name || ' ' || last_name) AS name, (house_number || ' ' || fract::varchar || ' ' || street_prefix_dir::varchar || ' ' || street_name || ' ' || street_suffix::varchar || ' ' || street_post_dir::varchar) AS address, city, state, zip FROM address WHERE latitude >= $minLat AND latitude <= $maxLat AND longitude >= $minLong AND longitude <= $maxLong ORDER BY ((abs($property_long - longitude)) * 1000) + ((abs($property_lat - latitude)) * 1000) LIMIT 100 This is where things get very slow. If I search for a property in Fort Worth it can take between two and three minutes to return a result. The second time I search, however, it takes a more reasonable 10 secs to return a result. If I decide, though, that I want to search in California next, the three minute wait occurs again. From reading the documentation, I've discovered that the reason for the speedy returns the second time around is because the results are cached. Since we'll have users all over the nation hitting this solution when we go live, the initial query needs to be as fast as it may be. Here's what I've tried: 1) indexed latitude and longitude fields separately, which gave me my initial performance boost (was taking 45 minutes before the indexes were applied! :)) 2) raised shared_buffers to 128 and saw a little more increase in performance 3) raised sort_mem to 16MB, and got another small boost in speed But the initial search is still takes 3 minutes. Here is an explain analyze on a sample address: QUERY PLAN ------------------------------------------------------------------------ ---------------------------------------------------------------------- Limit (cost=219538.31..219538.48 rows=69 width=114) (actual time=97750.62..97751.23 rows=100 loops=1) -> Sort (cost=219538.31..219538.48 rows=70 width=114) (actual time=97750.60..97750.80 rows=101 loops=1) Sort Key: ((abs((-77.5527420000 - longitude)) * 1000::numeric) + (abs((43.1790237400 - latitude)) * 1000::numeric)) -> Index Scan using addr_long on address (cost=0.00..219536.18 rows=70 width=114) (actual time=184.66..97686.35 rows=2464 loops=1) Index Cond: ((longitude >= -77.562653289755) AND (longitude <= -77.542830710245)) Filter: ((latitude >= 43.171795381705) AND (latitude <= 43.186252098295)) Total runtime: 97752.32 msec I've exhausted just about everything I know to do in order to get this to speed up. Not sure where to go next. If anyone has any suggestions, I would love to hear 'em! Thanks, Charley
pgsql-novice by date: