Re: How to find double entries - Mailing list pgsql-sql

From Tom Lane
Subject Re: How to find double entries
Msg-id 21481.1208316212@sss.pgh.pa.us
In response to How to find double entries  (Andreas <maps.on@gmx.net>)
Responses Re: How to find double entries  (Vivek Khera <vivek@khera.org>)
List pgsql-sql
Andreas <maps.on@gmx.net> writes:
> I'd like to identify and then merge records of e.g.   'google', 'gogle', 
> 'guugle' 

> Then I want to match abbreviations like  'A-Company Ltd.', 'a company 
> ltd.', 'A-Company Limited'

> Is there a way to do this?
> It would be OK just to list candidates to be manually checked afterwards.

There are some functions in contrib/fuzzystrmatch that seem like they'd
help you find candidate duplicates.  contrib/pg_trgm and text search
might also offer promising tools.
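
As a rough illustration of the kind of scoring those modules do, here is a
minimal Python sketch. It is not the contrib code: the function names, the
thresholds, and the space-padding used for trigrams are all assumptions made
for this example. Inside the database, the corresponding calls would be
levenshtein() from fuzzystrmatch and similarity() from pg_trgm.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: each insertion, deletion, or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute if different
        prev = cur
    return prev[-1]


def trigrams(s: str) -> set:
    """Set of 3-character substrings of the lowercased string.

    Padding with two leading spaces and one trailing space is an
    assumption of this sketch, so that short words still yield trigrams.
    """
    padded = "  " + s.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def candidate_pairs(names, max_distance=2, min_similarity=0.3):
    """Yield pairs that look alike by either measure, for manual review.

    The default thresholds are hypothetical and would need tuning.
    """
    for a, b in combinations(names, 2):
        dist = levenshtein(a.lower(), b.lower())
        sim = trigram_similarity(a, b)
        if dist <= max_distance or sim >= min_similarity:
            yield a, b, dist, sim


if __name__ == "__main__":
    names = ["google", "gogle", "guugle", "A-Company Ltd.", "a company ltd."]
    for a, b, dist, sim in candidate_pairs(names):
        print(f"{a!r} ~ {b!r}: distance={dist}, similarity={sim:.2f}")
```

The output is only a list of pairs with their scores; nothing is merged.
Spellings such as 'A-Company Limited' differ by more than a few characters,
so they would probably need a looser threshold or some normalization of
common abbreviations first.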

What's really a duplicate sounds like a judgment call here, so you
probably shouldn't even think of automating it completely.
        regards, tom lane

