Re: Postal code radius searches - Mailing list pgsql-general
From | wsheldah@lexmark.com |
---|---|
Subject | Re: Postal code radius searches |
Date | |
Msg-id | 200202061919.OAA12374@interlock2.lexmark.com Whole thread Raw |
In response to | Postal code radius searches (Milo Hyson <milo@cyberlifelabs.com>) |
List | pgsql-general |
I'm sure that some more erudite answers will be forthcoming, but there's one obvious improvement to the brute-force approach. Don't calculate exact differences when an estimate will do. If zip codes are as consistent as I think they are, from low numbers on the east coast to high numbers on the west coast, you should be able to look at the first two or three digits of both zip codes and tell whether they're anywhere near each other. I would expect zip codes '11111' and '99999' to be at least more than 1000 miles apart, and probably more than that. This is just for a rough estimate though. How useful it is will partly depend on how big a radius you're typically searching for. I also wonder if you couldn't convert degrees longitude difference between two points to an approximate number of miles, add on some miles for safety, and restrict your search a bit that way. Do the same for degrees lattitude difference. That way you restrict the records you have to search before you ever start using that stored procedure to calculate exact differences. If you're looking for 500 miles from Houston, don' t worry about states on the east or west coast, or next to canada, and probably a couple of states further in on most sides. Once you get within that rough estimate, go ahead and calculate exact differences. Though you should be able to assume neighboring zip codes will be within range as well, without calculating them. If I'm looking for a 500 mile radius, zip codes that are only different in the last digit, or are within a certain number of degrees lattitude and longitude, will match for sure, and their exact distance doesn't need to be calculated. So you really only need to calculate distances for a donut shaped region around the base postal code. You might look at postgresql's geometric functions, if you haven't already. I don't know whether an R-tree index would be useful, or even whether it's still supported? Haven't actually worked with either of these, haven't needed to. Good luck! Wes Sheldahl Milo Hyson <milo%cyberlifelabs.com@interlock.lexmark.com> on 02/06/2002 01:36:18 PM To: pgsql-general%postgresql.org@interlock.lexmark.com cc: (bcc: Wesley Sheldahl/Lex/Lexmark) Subject: [GENERAL] Postal code radius searches I've been struggling with this problem for a while now and I can't seem to find a solution. I have a postal-code database, currently populated with over 76,000 United States ZIP codes. Each record contains, among other things, the latitude and longitude for the postal code. I have a stored procedure that calculates the distance between any two points on the globe. I'm trying to figure out a fast way to locate all of the postal codes within an arbitrary radius of another postal code. The brute force method requires a sequential scan of all 76,000 records looking for those that fall within the specified area. A more high-performance method would be to pre-calculate the distances between all postal codes (possibly limiting the distance to save space). However, this requires more than 76,000 ^ 2 database operations. On a 1 GHz box, I calculated this would take nearly one year complete. It would take twice as long if I wanted to create a second cache for city/state searches. Does anybody have and tips on solving this issue? Is there any sort of complex index I could create based on the results of an arbitrary stored procedure call? Maybe some custom C code? -- Milo Hyson CyberLife Labs, LLC ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
pgsql-general by date: