Thread: Search speed issues

Search speed issues

From
Charley L. Tiggs
Date:
After three years of working with MySQL, I've landed a project that
seems perfect for PostgreSQL.  However, I'm having some performance
problems.

I have a table with the following stats:

            Column            |         Type          | Modifiers
-----------------------------+-----------------------+-----------
  state_code                  | smallint              | not null
  zip                         | integer               | not null
  plus_4                      | smallint              | not null
  delivery_point_code         | character(3)          | not null
  title_code                  | character(1)          | not null
  first_name                  | character varying(14) | not null
  middle_initial              | character(1)          | not null
  last_name                   | character varying(30) | not null
  name_suffix                 | character(1)          | not null
  house_number                | character varying(11) | not null
  fract                       | character(3)          | not null
  street_prefix_dir           | character(2)          | not null
  street_name                 | character varying(28) | not null
  street_suffix               | character(4)          | not null
  street_post_dir             | character(2)          | not null
  route_designator_and_number | character(6)          | not null
  box_designator_and_number   | character varying(15) | not null
  apt_designator_and_number   | character varying(15) | not null
  city                        | character varying(13) | not null
  state                       | character(2)          | not null
  carrier_route               | character(4)          | not null
  complete_telephone          | character varying(10) | not null
  latitude                    | numeric(15,10)        | not null
  longitude                   | numeric(15,10)        | not null
  address_quality_code        | character(1)          | not null
Indexes: addr_lat btree (latitude),
          addr_long btree (longitude),
          addr_zip btree (zip)

This table and its companion have over 102 million rows.

Essentially, I want to use longitude and latitude to find properties
that are nearest to a provided property.  So, using PHP, this is what I
do:

Customer enters an address and, using the zip code, house number and
street name, among other things, we locate the longitude and latitude
of the property.  This is very fast and usually returns a result
instantly.  The zip code field is indexed.
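
Simplified, that lookup looks something like the following sketch (the
real query matches on more fields than shown here, and the $placeholders
are illustrative):

SELECT latitude, longitude
FROM address
WHERE zip = $zip
  AND house_number = $house_number
  AND street_name = $street_name
LIMIT 1;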

Once we have the longitude and latitude for the property, we use the
following select to determine what properties are near the target
property.

SELECT (first_name || ' ' || last_name) AS name,
    (house_number || ' ' || fract::varchar || ' ' ||
     street_prefix_dir::varchar || ' ' || street_name || ' ' ||
     street_suffix::varchar || ' ' || street_post_dir::varchar) AS address,
    city,
    state,
    zip
FROM address
WHERE latitude >= $minLat AND
    latitude <= $maxLat AND
    longitude >= $minLong AND
    longitude <= $maxLong
ORDER BY ((abs($property_long - longitude)) * 1000) +
    ((abs($property_lat - latitude)) * 1000)
LIMIT 100

This is where things get very slow.  If I search for a property in Fort
Worth it can take between two and three minutes to return a result.
The second time I search, however, it takes a more reasonable 10 seconds
to return a result.  If I decide, though, that I want to search in
California next, the three-minute wait occurs again.

From reading the documentation, I've discovered that the reason for the
speedy returns the second time around is that the results are cached.
Since we'll have users all over the nation hitting this solution when we
go live, the initial query needs to be as fast as possible.  Here's what
I've tried:

1) indexed the latitude and longitude fields separately, which gave me my
initial performance boost (it was taking 45 minutes before the indexes
were applied!  :))
2) raised shared_buffers to 128 and saw a little more improvement in
performance
3) raised sort_mem to 16MB and got another small boost in speed (both
settings are sketched below)
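
For reference, a sketch of the postgresql.conf entries those two changes
correspond to (on 7.x-era servers, shared_buffers is counted in 8 kB
buffers and sort_mem in kB):

shared_buffers = 128      # 8 kB buffers, i.e. roughly 1 MB of shared cache
sort_mem = 16384          # kB, i.e. 16 MB available to each sort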

But the initial search still takes three minutes.

Here is an explain analyze on a sample address:

                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=219538.31..219538.48 rows=69 width=114) (actual time=97750.62..97751.23 rows=100 loops=1)
    ->  Sort  (cost=219538.31..219538.48 rows=70 width=114) (actual time=97750.60..97750.80 rows=101 loops=1)
          Sort Key: ((abs((-77.5527420000 - longitude)) * 1000::numeric) + (abs((43.1790237400 - latitude)) * 1000::numeric))
          ->  Index Scan using addr_long on address  (cost=0.00..219536.18 rows=70 width=114) (actual time=184.66..97686.35 rows=2464 loops=1)
                Index Cond: ((longitude >= -77.562653289755) AND (longitude <= -77.542830710245))
                Filter: ((latitude >= 43.171795381705) AND (latitude <= 43.186252098295))
  Total runtime: 97752.32 msec

I've exhausted just about everything I know to do in order to get this
to speed up.  Not sure where to go next.  If anyone has any
suggestions, I would love to hear 'em!

Thanks,
Charley


Re: Search speed issues

From
Tom Lane
Date:
"Charley L.Tiggs" <ctiggs@xpressdocs.com> writes:
> SELECT (first_name || ' ' || last_name) AS name,
>     (house_number || ' ' || fract::varchar || ' ' ||
> street_prefix_dir::varchar || ' ' || street_name || ' ' ||
> street_suffix::varchar || ' ' || street_post_dir::varchar) AS address,
>     city,
>     state,
>     zip
> FROM address
> WHERE latitude >= $minLat AND
>     latitude <= $maxLat AND
>     longitude >= $minLong AND
>     longitude <= $maxLong
> ORDER BY ((abs($property_long - longitude)) * 1000) +
> ((abs($property_lat - latitude)) * 1000)
> LIMIT 100

Seems like a two-column index on (latitude, longitude) is what you need.

You could get fancy and try mucking about with box overlap operators and
rtree indexes, but I suspect the 2-column btree will work well enough.

            regards, tom lane