Re: Need magic for identifieing double adresses - Mailing list pgsql-general

From Gary Chambers
Subject Re: Need magic for identifieing double adresses
Date
Msg-id AANLkTi=jQ+sgxu=VHJft__PZrwykDL4LvEKieQAn2wak@mail.gmail.com
In response to Need magic for identifieing double adresses  (Andreas <maps.on@gmx.net>)
List pgsql-general
Andreas,

> Relevant fields could be  name, street, zip, city, phone
> Is there a way to do something like this with postgresql ?
> I fear this will need still a lot of manual sorting and searching even when
> potential peers get automatically identified.

One of the techniques I use to increase the odds of detecting
duplicates is to trim each column, remove all internal whitespace,
concatenate the columns into a single string, and calculate a hash of
that string (I use MD5, though some other hash function may be
better).  Rows that share a hash are then candidate duplicates.  It's
not perfect (we are dealing with humans, after all), but it helps.
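
As a rough illustration of the normalize-and-hash idea above, here is
a minimal sketch in Python rather than SQL.  The field names come from
the list in the question; the helper names (address_key,
group_duplicates), the dict-per-row layout, and the "|" separator
between fields are assumptions made for this example, not part of any
existing schema or tool.

```python
import hashlib
import re
from collections import defaultdict

# Fields taken from the question; adjust to the real table layout.
FIELDS = ("name", "street", "zip", "city", "phone")


def address_key(record):
    """Return an MD5 hex digest of the whitespace-free, joined fields."""
    parts = []
    for field in FIELDS:
        value = record.get(field) or ""  # treat missing/None as empty
        # Removing every whitespace run also covers leading/trailing trim.
        parts.append(re.sub(r"\s+", "", value))
    # Assumed separator, so that field boundaries stay distinguishable.
    joined = "|".join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def group_duplicates(records):
    """Group records by key and return only groups with 2+ members."""
    groups = defaultdict(list)
    for record in records:
        groups[address_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]
```

Each group returned by group_duplicates() is only a set of candidates
for manual review.  Differences in case, spelling, or abbreviations
still produce different hashes, so this catches whitespace-level
variation and nothing more.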

-- Gary Chambers

/* Nothing fancy and nothing Microsoft! */
