Re: optimizing a geo_distance() proximity query (example and benchmark) - Mailing list pgsql-performance

From Mark Stosberg
Subject Re: optimizing a geo_distance() proximity query (example and benchmark)
Date
Msg-id eq8d0b$26cd$1@news.hub.org
Whole thread Raw
In response to Re: optimizing a geo_distance() proximity query  ("Merlin Moncure" <mmoncure@gmail.com>)
Responses Re: optimizing a geo_distance() proximity query (example and benchmark)
Re: optimizing a geo_distance() proximity query (example and benchmark)
tip: faster sorting for proximity queries by using cube_distance()
List pgsql-performance
Merlin Moncure wrote:
> On 2/5/07, Mark Stosberg <mark@summersault.com> wrote:
>> Bruno Wolff III wrote:
>> > On Sat, Feb 03, 2007 at 14:00:26 -0500,
>> >   Mark Stosberg <mark@summersault.com> wrote:
>> >> I'm using geo_distance() from contrib/earthdistance would like to
>> find a
>> >> way to spend up the geo distance calculation if possible. This is
>> for a
>> >> proximity search: "Show me adoptable pets within 250 miles of this
>> >> zipcode".
>> >
>> > If you are using the "cube" based part of the earth distance package,
>> > then you can use gist indexes to speed those searches up.
>>
>> Thanks for the tip. Any idea what kind of improvement I can expect to
>> see, compared to using geo_distance()?
>
> a lot. be aware that gist takes longer to build than btree, but very
> fast to search.  Index search and filter to box is basically an index
> lookup (fast!). for mostly static datasets that involve a lot of
> searching, gist is ideal.

The documentation in contrib/ didn't provide examples of how to create
or the index or actually a the proximity search. Here's what I figured
out to do:

I added a new column as type 'cube':

 ALTER table zipcodes add column earth_coords cube;

Next I converted the old lat/lon data I had stored in a 'point'
type to the new format:

-- Make to get lat/lon in the right order for your data model!
 UPDATE zipcodes set earth_coords = ll_to_earth( lon_lat[1], lon_lat[0] );

Now I added a GIST index on the field:

 CREATE index earth_coords_idx on zipcodes using gist (earth_coords);

Finally, I was able to run a query, which I could see used the index (by
checking "EXPLAIN ANALYZE ..."

   select * from zipcodes where earth_box('(436198.322855334,
4878562.8732218, 4085386.43843934)'::cube,16093.44) @ earth_coords;

It's also notable that the units used are meters, not miles like
geo_distance(). That's what the magic number of "16093.44" is-- 10 miles
converted to meters.

When I benchmarked this query against the old geo_distance() variation,
it was about 200 times faster (~100ms vs .5ms).

However, my next step was to try a more "real world" query that involved
 a more complex where clause and a couple of table joins. So far, that
result is coming out /slower/ with the new approach, even though the
index is being used. I believe this may be cause of the additional
results found that are outside of the sphere, but inside the cube. This
causes additional rows that need processing in the joined tables.

Could someone post an example of how to further refine this so the
results more closely match what geo_distance returns() ?

Any other indexing or optimization tips would be appreciated.

   Mark

pgsql-performance by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: optimizing a geo_distance() proximity query
Next
From: "Mischa Sandberg"
Date:
Subject: Re: Tuning