Re: application of KNN code to US zipcode searches? - Mailing list pgsql-performance

From Mark Stosberg
Subject Re: application of KNN code to US zipcode searches?
Date
Msg-id ijjj4g$c2k$1@dough.gmane.org
Whole thread Raw
In response to Re: application of KNN code to US zipcode searches?  (Mark Stosberg <mark@summersault.com>)
Responses Re: application of KNN code to US zipcode searches?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
I tried again to use KNN for a real-world query, and I was able to get
it to add an approximately 6x speed-up vs the cube search or
earthdistance methods ( from 300 ms to 50ms ).

I had to make some notable changes for the KNN index to be considered.

- Of course, I had to switch to using basic point/distance calculation.
  As previously noted, this still needs more work to confirm the
  accuracy and get the "distance" reported in miles.

- The query planner didn't like it when the "ORDER BY" referred to a
  column value instead of a static value, even when I believe it should
  know that the column value never changes. See this pseudo-query where
  we look-up the coordinates for 90210 once:

  EXPLAIN ANALYZE
  SELECT pets.pet_id,
      zipcodes.lon_lat <-> center.lon_lat AS radius
      FROM (SELECT lon_lat FROM zipcodes WHERE zipcode = '90210') AS
center, pets
      JOIN shelters USING (shelter_id)
      JOIN zipcodes USING (zipcode)
       ORDER BY postal_codes.lon_lat <-> center.lon_lat limit 1000;

  This didn't use the KNN index until I changed the "center.lon_lat" in
  the ORDER BY to an explicit point value. I'm not sure if that's
  expected, or something I should take up with -hackers.

  This could be worked around by doing a initial query to look-up this
  value, and then feed a static value into this query. That's not ideal,
  but the combination would still be faster.

- I had to drop the part of the WHERE clause which restricted the
  results to shelters within 50 miles from the target zipcode. However,
  I could set the "LIMIT" so high that I could get back "enough" pets,
  and then the application could trim out the results. Or, perhaps
  I could push this query down into a sub-select, and let PostgreSQL
  do a second pass to throw out some of the results.

In any case, with a real-world speed-up of 6x, this looks like it will
be worth it to us to continue to investigate.


pgsql-performance by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: application of KNN code to US zipcode searches?
Next
From: Tom Lane
Date:
Subject: Re: application of KNN code to US zipcode searches?