Re: How to find double entries - Mailing list pgsql-sql

From Craig Ringer
Subject Re: How to find double entries
Date
Msg-id 48062003.3050409@postnewspapers.com.au
In response to Re: How to find double entries  (Vivek Khera <vivek@khera.org>)
List pgsql-sql
Vivek Khera wrote:
> 
> On Apr 15, 2008, at 11:23 PM, Tom Lane wrote:
>> What's really a duplicate sounds like a judgment call here, so you
>> probably shouldn't even think of automating it completely.
> 
> I did a consulting gig about 10 years ago for a company that made
> software to normalize street addresses and names.  Literally dozens of
> people worked there, and that was their primary software product.  It is
> definitely not a trivial task, as the rules can be extremely complex.

From what little I've personally seen of others' address handling,
some (many/most?) people who blindly advocate full normalisation of
addresses either:

(a) only care about a rather restricted set of address types ("ordinary
residential addresses in <my country>", though that can be bad enough);
or
(b) don't know how horrible addressing is ... yet ... and are going to
find out soon when their highly normalized addressing schema proves
incapable of representing some address they've just been presented with.

with most probably falling into the second category.

Overly strict addressing, without the associated fairly extreme
development effort to get it even vaguely right, seems to lead to users
working around the broken addressing schema by entering bogus data.


Personally I'm content to provide lots of space for user-formatted
addresses, only breaking out separate fields for the post code
(Australian only), the city/suburb, the state, and the country - all
stored as strings. The only DB-level validation is a rule preventing the
entry of invalid or undefined postcodes for Australian addresses, and
preventing the entry of invalid Australian states. The app is used
almost entirely with Australian addresses, and there's a definitive,
up-to-date list of Australian post codes available from the postal
services, so it's worth a little more checking to protect against basic
typos and misunderstandings.
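
As a rough illustration of that layout (a sketch only, not the actual
application or its database rule): a free-form address block plus a few
string fields, with hard checks applied only when the country is
Australia. The names Address, validation_errors, AU_STATES and
KNOWN_AU_POSTCODES are made up for this example, and the postcode set is
a tiny sample standing in for the postal service's full list.

```python
from dataclasses import dataclass

# Hypothetical stand-ins: a real system would load the postal service's list.
AU_STATES = {"ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA"}
KNOWN_AU_POSTCODES = {"2000", "3000", "6000"}  # tiny sample only


@dataclass
class Address:
    lines: str          # free-form, user-formatted address text
    locality: str = ""  # city or suburb
    state: str = ""
    postcode: str = ""
    country: str = ""


def validation_errors(addr: Address) -> list[str]:
    """Return hard errors; only Australian addresses are checked."""
    if addr.country.strip().lower() != "australia":
        return []  # everything else is left to the user
    errors = []
    if addr.state.strip().upper() not in AU_STATES:
        errors.append(f"unknown Australian state: {addr.state!r}")
    if addr.postcode.strip() not in KNOWN_AU_POSTCODES:
        errors.append(f"unknown Australian postcode: {addr.postcode!r}")
    return errors
```

The point of the design is that nothing about the free-form lines is
ever rejected, and a non-Australian address passes through unchecked.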

The app provides some more help at the UI level for users, such as
automatically filling in the state and suburb if an Australian post code
is entered. It'll warn you if you enter an unknown Australian
suburb/city for an entry in Australia. For everything else I leave it to
the user and to possible later validation and reporting.
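
The UI-level help could be sketched along these lines. This is an
assumption-laden illustration, not the app's real code: POSTCODE_TABLE,
autofill and locality_warning are hypothetical names, the table holds
sample data only, and the choices to never overwrite what the user
typed and to fill the suburb only when the postcode maps to exactly one
are my own guesses at sensible behaviour.

```python
# Illustrative sample only; a real table would come from the postal service.
POSTCODE_TABLE = {
    "6000": ("WA", ["Perth"]),
    "2000": ("NSW", ["Sydney", "The Rocks"]),
}


def autofill(postcode, state="", locality=""):
    """Fill blank state/locality from the postcode; keep anything already typed."""
    entry = POSTCODE_TABLE.get(postcode.strip())
    if entry is None:
        return state, locality  # unknown postcode: nothing to suggest
    table_state, localities = entry
    if not state:
        state = table_state
    # Assumption: only fill the suburb when the postcode has exactly one.
    if not locality and len(localities) == 1:
        locality = localities[0]
    return state, locality


def locality_warning(locality):
    """Return a soft warning (not an error) for an unknown Australian locality."""
    known = {name.lower() for _, names in POSTCODE_TABLE.values() for name in names}
    if locality.strip().lower() in known:
        return None
    return f"'{locality}' is not a known Australian suburb/city"
```

Note that the suburb check only warns; the user can still save the
entry, which matches the "leave it to the user" approach above.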

I've had good results with this policy when working with other apps that
need to handle addressing information, and I've had some truly horrible
experiences with apps that try to be too strict in their address checking.

--
Craig Ringer

