Removing duplicates - Mailing list pgsql-sql

From Matthew Hagerty
Subject Removing duplicates
Msg-id 5.1.0.14.2.20020226095955.00b17f10@imap.brwholesale.com
Responses Re: Removing duplicates  (Andrew Perrin <andrew_perrin@unc.edu>)
Re: Removing duplicates  (Jeff Self <jself@nngov.com>)
Re: Removing duplicates  ("Josh Berkus" <josh@agliodbs.com>)
Re: Removing duplicates  (Christof Glaser <gcg@gl.aser.de>)
List pgsql-sql
Greetings,

I have a customer database (name, address1, address2, city, state, zip) and
I need a query (or two) that will give me a mailing list with as few
duplicates as possible.  I know that exact matching will miss some cases,
e.g. "P.O. Box 123" will never match "PO Box 123" without some data
massaging, but if I can isolate even 50% of the duplicates, that would
help greatly.

Also, any suggestions on which fields to compare when checking for
duplicates?  My first thought was to make sure that no two addresses within
the same zip code are identical.  Any insight (or examples) would be greatly
appreciated.
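
For illustration only, here is a minimal sketch of the "same address in the same zip code" idea, written in Python against an in-memory SQLite database rather than PostgreSQL. The table name customer, the id column, the helper names normalize_address and find_duplicate_groups, and the sample rows are all assumptions made up for this sketch. The normalization is deliberately crude (lowercase, strip punctuation, collapse whitespace), so it catches "P.O. Box 123" vs "PO Box 123" but would still miss variants such as "Post Office Box 123".

```python
import re
import sqlite3


def normalize_address(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    if text is None:
        return ""
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(cleaned.split())


# Hypothetical sample data: (name, address1, address2, city, state, zip)
SAMPLE_ROWS = [
    ("Ann Lee", "P.O. Box 123", None, "Springfield", "IL", "62701"),
    ("A. Lee", "PO Box 123", "", "Springfield", "IL", "62701"),
    ("Bob Ray", "PO Box 123", None, "Dover", "DE", "19901"),
    ("Cy Poe", "12 Elm St.", "Apt 4", "Dover", "DE", "19901"),
]


def find_duplicate_groups(rows):
    """Return (zip, normalized address, count, lowest id) for each group
    of rows that share a zip code and a normalized address."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python normalizer to SQL under the name norm_addr.
    conn.create_function("norm_addr", 1, normalize_address)
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, "
        "address1 TEXT, address2 TEXT, city TEXT, state TEXT, zip TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer (name, address1, address2, city, state, zip) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    query = """
        SELECT zip,
               norm_addr(COALESCE(address1, '') || ' ' || COALESCE(address2, '')) AS addr,
               COUNT(*) AS n,
               MIN(id)  AS keep_id
        FROM customer
        GROUP BY zip, addr
        HAVING COUNT(*) > 1
        ORDER BY zip, addr
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for group in find_duplicate_groups(SAMPLE_ROWS):
        print(group)
```

Each returned row identifies one group of likely duplicates; keep_id is simply the lowest id in the group, which is one arbitrary way to pick the record to keep. The same GROUP BY ... HAVING COUNT(*) > 1 pattern should carry over to PostgreSQL, with the normalization done by whatever string functions are available there.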

Thank you,
Matthew


