Thread: Postal code radius searches

Postal code radius searches

From
Milo Hyson
Date:
I've been struggling with this problem for a while now and I can't seem to
find a solution. I have a postal-code database, currently populated with over
76,000 United States ZIP codes. Each record contains, among other things, the
latitude and longitude for the postal code. I have a stored procedure that
calculates the distance between any two points on the globe. I'm trying to
figure out a fast way to locate all of the postal codes within an arbitrary
radius of another postal code.

The brute force method requires a sequential scan of all 76,000 records
looking for those that fall within the specified area. A more
high-performance method would be to pre-calculate the distances between all
postal codes (possibly limiting the distance to save space). However, this
requires more than 76,000 ^ 2 database operations. On a 1 GHz box, I
calculated this would take nearly one year complete. It would take twice as
long if I wanted to create a second cache for city/state searches.

Does anybody have and tips on solving this issue? Is there any sort of
complex index I could create based on the results of an arbitrary stored
procedure call? Maybe some custom C code?

--
Milo Hyson
CyberLife Labs, LLC

Re: Postal code radius searches

From
Vince Vielhaber
Date:
On Wed, 6 Feb 2002, Milo Hyson wrote:

> I've been struggling with this problem for a while now and I can't seem to
> find a solution. I have a postal-code database, currently populated with over
> 76,000 United States ZIP codes. Each record contains, among other things, the
> latitude and longitude for the postal code. I have a stored procedure that
> calculates the distance between any two points on the globe. I'm trying to
> figure out a fast way to locate all of the postal codes within an arbitrary
> radius of another postal code.

I do something similar.  Take the lon/lat of the center point and add an
arbitrary amount to the lon/lat that would put me over the desired
distance and have my select call return all of those records.  Then
I calculate the distances only with those returned.  I don't have the
database do the final calculation, that's done in a C routine because
I do a few other site specific things with the info first.  The database
contains locations of campgrounds.  If you use this method, remember
the farther south you go in the us the larger your arbitrary number
needs to be.

Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH    email: vev@michvhf.com    http://www.pop4.net
         56K Nationwide Dialup from $16.00/mo at Pop4 Networking
        Online Campground Directory    http://www.camping-usa.com
       Online Giftshop Superstore    http://www.cloudninegifts.com
==========================================================================




Re: Postal code radius searches

From
Elaine Lindelef
Date:
>I've been struggling with this problem for a while now and I can't seem to
>find a solution. I have a postal-code database, currently populated with over
>76,000 United States ZIP codes. Each record contains, among other things, the
>latitude and longitude for the postal code. I have a stored procedure that
>calculates the distance between any two points on the globe. I'm trying to
>figure out a fast way to locate all of the postal codes within an arbitrary
>radius of another postal code.
>
>The brute force method requires a sequential scan of all 76,000 records
>looking for those that fall within the specified area. A more
>high-performance method would be to pre-calculate the distances between all
>postal codes (possibly limiting the distance to save space). However, this
>requires more than 76,000 ^ 2 database operations. On a 1 GHz box, I
>calculated this would take nearly one year complete. It would take twice as
>long if I wanted to create a second cache for city/state searches.
>
>Does anybody have and tips on solving this issue? Is there any sort of
>complex index I could create based on the results of an arbitrary stored
>procedure call? Maybe some custom C code?
>
>--
>Milo Hyson
>CyberLife Labs, LLC
>

Depending upon your application, an alternate way to do the
precalculation is to set discrete radii - say 1 mile, 5 miles, 10
miles - and then calculate for each zip code what other zip codes are
within that radius. It may not be faster to calculate, but it is
better for retrieval speed. If you are only interested in smaller
radii, it will be faster because you will not have to compute
distances between zip codes in CA and zip codes in NY.

However, you must assume with this method that your customer will not
change his mind regularly about what radii should be available in the
interface. ;-)

Elaine Lindelef
Cognitivity


Re: Postal code radius searches

From
"Roderick A. Anderson"
Date:
On Wed, 6 Feb 2002, Milo Hyson wrote:

> I've been struggling with this problem for a while now and I can't seem to
> find a solution. I have a postal-code database, currently populated with over
> 76,000 United States ZIP codes. Each record contains, among other things, the
> latitude and longitude for the postal code. I have a stored procedure that
> calculates the distance between any two points on the globe. I'm trying to
> figure out a fast way to locate all of the postal codes within an arbitrary
> radius of another postal code.

Have you looked at PostGIS?  I'm a couple of revisions behind but I seem
to remember something about distances from points etc.  The down side is
it is only good on PG 7.1.3 and an upgrade to 7.2 won't be happening for
awhile as the developers are off making money (working) right now.


Best,
Rod
--
                      Let Accuracy Triumph Over Victory

                                                       Zetetic Institute
                                                        "David's Sling"
                                                         Marc Stiegler


Re: Postal code radius searches

From
wsheldah@lexmark.com
Date:

I'm sure that some more erudite answers will be forthcoming, but there's one
obvious improvement to the brute-force approach.

Don't calculate exact differences when an estimate will do. If zip codes are as
consistent as I think they are, from low numbers on the east coast to high
numbers on the west coast, you should be able to look at the first two or three
digits of both zip codes and tell whether they're anywhere near each other. I
would expect zip codes '11111' and '99999' to be at least more than 1000 miles
apart, and probably more than that. This is just for a rough estimate though.
How useful it is will partly depend on how big a radius you're typically
searching for.

I also wonder if you couldn't convert degrees longitude difference between two
points to an approximate number of miles, add on some miles for safety, and
restrict your search a bit that way. Do the same for degrees lattitude
difference. That way you restrict the records you have to search before you ever
start using that stored procedure to calculate exact differences.  If you're
looking for 500 miles from Houston, don' t worry about states on the east or
west coast, or next to canada, and probably a couple of states further in on
most sides. Once you get within that rough estimate, go ahead and calculate
exact differences. Though you should be able to assume neighboring zip codes
will be within range as well, without calculating them. If I'm looking for a 500
mile radius, zip codes that are only different in the last digit, or are within
a certain number of degrees lattitude and longitude, will match for sure, and
their exact distance doesn't need to be calculated. So you really only need to
calculate distances for a donut shaped region around the base postal code.

You might look at postgresql's geometric functions, if you haven't already. I
don't know whether an R-tree index would be useful, or even whether it's still
supported? Haven't actually worked with either of these, haven't needed to. Good
luck!

Wes Sheldahl



Milo Hyson <milo%cyberlifelabs.com@interlock.lexmark.com> on 02/06/2002 01:36:18
PM

To:   pgsql-general%postgresql.org@interlock.lexmark.com
cc:    (bcc: Wesley Sheldahl/Lex/Lexmark)
Subject:  [GENERAL] Postal code radius searches


I've been struggling with this problem for a while now and I can't seem to
find a solution. I have a postal-code database, currently populated with over
76,000 United States ZIP codes. Each record contains, among other things, the
latitude and longitude for the postal code. I have a stored procedure that
calculates the distance between any two points on the globe. I'm trying to
figure out a fast way to locate all of the postal codes within an arbitrary
radius of another postal code.

The brute force method requires a sequential scan of all 76,000 records
looking for those that fall within the specified area. A more
high-performance method would be to pre-calculate the distances between all
postal codes (possibly limiting the distance to save space). However, this
requires more than 76,000 ^ 2 database operations. On a 1 GHz box, I
calculated this would take nearly one year complete. It would take twice as
long if I wanted to create a second cache for city/state searches.

Does anybody have and tips on solving this issue? Is there any sort of
complex index I could create based on the results of an arbitrary stored
procedure call? Maybe some custom C code?

--
Milo Hyson
CyberLife Labs, LLC

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)





Re: Postal code radius searches

From
postgresql@fruru.com
Date:
On Wed, 6 Feb 2002, Milo Hyson wrote:

> figure out a fast way to locate all of the postal codes within an arbitrary
> radius of another postal code.

perhaps a "refined" brute force is ok :

1. take the point you want to calculate distances around

2. calculate the maximum and minimum latitude/longitude so that a city can
be nearer than your distance limit (equivalent to going x km/mi north,
south, east and west)

3. do a brute force search but limit yourself to the "square" (it's not
really square ;-) around your starting point.

If the distances are normally small, this should be able to use indexes on
the coordinates and probably be a lot faster already.

Hope this helps,
Tycho

--
Tycho Fruru            tycho.fruru@conostix.com



Re: Postal code radius searches

From
"P.J. \"Josh\" Rovero"
Date:
I've been doing something similar with a database with about
2 million aircraft positions.  Create an index with latitude,
longitude, and zipcode for starters.

Simpler math helps -- for example all zip codes within a
certain (whole or fractional) degrees lat and long of
another zip is an easy and fast calculation, as it's just
addition/subtraction.  You get a cell or box around the
center zip.

Distance on the surface of the earth (the radius about the zip)
takes trig functions and mult/division, and usually takes longer.

For some purposes the simpler solution may work, for others you
may have to do the math.

Milo Hyson wrote:
> I've been struggling with this problem for a while now and I can't seem to
> find a solution. I have a postal-code database, currently populated with over
> 76,000 United States ZIP codes. Each record contains, among other things, the
> latitude and longitude for the postal code. I have a stored procedure that
> calculates the distance between any two points on the globe. I'm trying to
> figure out a fast way to locate all of the postal codes within an arbitrary
> radius of another postal code.
>
> The brute force method requires a sequential scan of all 76,000 records
> looking for those that fall within the specified area. A more
> high-performance method would be to pre-calculate the distances between all
> postal codes (possibly limiting the distance to save space). However, this
> requires more than 76,000 ^ 2 database operations. On a 1 GHz box, I
> calculated this would take nearly one year complete. It would take twice as
> long if I wanted to create a second cache for city/state searches.
>
> Does anybody have and tips on solving this issue? Is there any sort of
> complex index I could create based on the results of an arbitrary stored
> procedure call? Maybe some custom C code?
>
>


--
P. J. "Josh" Rovero                                 Sonalysts, Inc.
Email: rovero@sonalysts.com    www.sonalysts.com    215 Parkway North
Work: (860)326-3671 or 442-4355                     Waterford CT 06385
***********************************************************************


Re: Postal code radius searches

From
"Sykora, Dale"
Date:
You could expand on this solution by including points where x&y are +&-
.707*r and calulating points between .707r and r.  Basically draw a
square inside the radius circle and outside it.  Points between these
squares need calculating, points withing both squares do not.  This will
change slightly as x & y are not linear so you are dealing with
rectangles/ellipses instead of circle/squares.

> -----Original Message-----
> From: postgresql@fruru.com [mailto:postgresql@fruru.com]
> Sent: Wednesday, February 06, 2002 1:30 PM
> To: Milo Hyson
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Postal code radius searches
>
>
> On Wed, 6 Feb 2002, Milo Hyson wrote:
>
> > figure out a fast way to locate all of the postal codes
> within an arbitrary
> > radius of another postal code.
>
> perhaps a "refined" brute force is ok :
>
> 1. take the point you want to calculate distances around
>
> 2. calculate the maximum and minimum latitude/longitude so
> that a city can
> be nearer than your distance limit (equivalent to going x km/mi north,
> south, east and west)
>
> 3. do a brute force search but limit yourself to the "square"
> (it's not
> really square ;-) around your starting point.
>
> If the distances are normally small, this should be able to
> use indexes on
> the coordinates and probably be a lot faster already.
>
> Hope this helps,
> Tycho
>
> --
> Tycho Fruru            tycho.fruru@conostix.com
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
> majordomo@postgresql.org
>

Re: Postal code radius searches

From
Tom Lane
Date:
Milo Hyson <milo@cyberlifelabs.com> writes:
> I've been struggling with this problem for a while now and I can't seem to
> find a solution. I have a postal-code database, currently populated
> with over 76,000 United States ZIP codes. Each record contains, among
> other things, the latitude and longitude for the postal code.

[ some overlap here with previous answers, but some new stuff too ]

As some other people already pointed out, PostGIS probably has a direct
solution for this.  However, you could solve it without PostGIS using
rtree indexes.

Here's an example that shows how to find all the points contained within
a given bounding box using an rtree index.  For some reason there is not
an rtree opclass for "point"; but there is one for "box", so we promote
the points into boxes of width and height zero.

regression=# create table pts (f1 int, f2 point);
CREATE
regression=# create index ptsi on pts using rtree(box(f2,f2));
CREATE
regression=# insert into pts values (1, '0,0');
INSERT 147648 1
regression=# insert into pts values (2, '1,1');
INSERT 147649 1
regression=# insert into pts values (3, '2,1');
INSERT 147650 1
regression=# insert into pts values (4, '12,1');
INSERT 147651 1
-- now find f2 points contained in the bounding box (1,0),(2,2)
regression=# select * from pts where box(f2,f2) @ '1,0,2,2'::box;
 f1 |  f2
----+-------
  2 | (1,1)
  3 | (2,1)
(2 rows)

regression=# explain select * from pts where box(f2,f2) @ '1,0,2,2'::box;
NOTICE:  QUERY PLAN:

Index Scan using ptsi on pts  (cost=0.00..4.83 rows=1 width=20)

EXPLAIN

So, given an index constructed this way, you could compute the minimum
and maximum latitude and longitude that a point could have and still
fall within the desired distance of your start point.  Then use the
index to pull out the points within that "box", and finally do the
expensive exact-distance calculation for just these points.

            regards, tom lane

Re: Postal code radius searches

From
Milo Hyson
Date:
Thanks to all who offered support. After looking at the Berkeley paper on
R-trees, I've determined that it's exactly what I'm looking for. In fact, it
has been suggested that I contribute to PostgreSQL by modifying rtree_gist to
support circles. I've decided to not only do this, but also try to include
support for spherical sections. This will enable exact radius searches
anywhere on a sphere (e.g. a distance around London that includes Tokyo). The
project I'm working on has desires to go international, so this will
ultimately be useful to us.

--
Milo Hyson
CyberLife Labs, LLC

Re: Postal code radius searches

From
will trillich
Date:
On Wed, Feb 06, 2002 at 02:47:14PM -0500, P.J. Josh Rovero wrote:
> Simpler math helps -- for example all zip codes within a
> certain (whole or fractional) degrees lat and long of
> another zip is an easy and fast calculation, as it's just
> addition/subtraction.  You get a cell or box around the
> center zip.

isn't it a bit more complicated than that?

for example, chicago is where you'll find 60606 but gary
indiana, barely 30 miles away, is 46407. zipcodes are
one-dimensional (00000 to 99999) but the geography is
two-dimensional (north/south, east/west on a mercator-projection
map) or three-dimensional (latitude, longitude[, radius] on the
surface of an oblate spheroid).

--
DEBIAN NEWBIE TIP #61 from Hamma Scott <scott_hamma@yahoo.com>
:
Ever have troubles with EITHER X OR CONSOLE LOCKUP?  If your
session is hung you can type <CTRL><ALT>F2-F6 to get to another
login session.  This way, you can shut your machine down
properly, or kill whichever process is causing trouble (use "ps
axf" to see them all).

Also see http://newbieDoc.sourceForge.net/ ...

Re: Postal code radius searches

From
Josh Rovero
Date:
Please note my caveats.  If delta-latitude and delta-longitude
(i.e, a "box" around the center lat-long rather than a circle
of specific radius) is *acceptable for your applications*,
the math can be much simpler.  Think of the lat/long grids
plotted on many maps -- they are "boxes" of delta-lat delta-lon.

There are many applications for which "coloring" or counting
what's in the squares is sufficient.  And once the difference
in lat/long is determined by simple math, you can use the
more costly trig to get more accurate distances.

I did some air traffic density analyses (with postgresql) using
lat/long cells and calculated radius about center points.  The
former is much faster than the latter.

will trillich wrote:

>isn't it a bit more complicated than that?
>
>for example, chicago is where you'll find 60606 but gary
>indiana, barely 30 miles away, is 46407. zipcodes are
>one-dimensional (00000 to 99999) but the geography is
>two-dimensional (north/south, east/west on a mercator-projection
>map) or three-dimensional (latitude, longitude[, radius] on the
>surface of an oblate spheroid).
>


Re: Postal code radius searches

From
pgsql-gen Newsgroup (@Basebeans.com)
Date:
Subject: Subject: Re: [GENERAL] Postal code radius searches
2167:From: Vic Cekvenich <Vic@baseBeans.com>
 ===