Thread: Name Alike Challenge

Name Alike Challenge

From
"Josh Berkus"
Date:
Folks,

The "Name Alike" PL/pgSQL function has been posted on Roberto Mello's
cookbook:

http://www.brasileiro.net/postgres/cookbook/view-one-recipe.adp?recipe_id=967

This function requires Joe Conway's port of the Metaphone and
Levenshtein functions to PostgreSQL, available from /contrib on CVS as
well as another site where /contrib stuff is available (anyone know
where this is?).  Thanks again, Joe!

The purpose of the function is to take two First/Last name sets, and
depending on the desired degree of "fuzzyness" return whether they are
similar or not.  I constructed this function with two purposes in mind:
preventing duplicates by checking for similar names before accepting
INSERTS, and de-duplicating large, messy lists from external souces
(such as box office lists).

Now, the challenge:

1. Aside from stripping the annoying copyright comments, can anyone name
a way in which this function could be made more efficient without
changing its results?

2. Can anyone come up with roman-alphabet names which will "defeat" the
function?  I.e. can you think of similar-typoed names that can't be
detected as similar, or patently different names that come up as
identical?

3. Can anyone turn this function on its head, and rather than having it
accept 2 first/last names and a looseness factor and return TRUE/FALSE,
have it accept the two name pairs and return a looseness factor?

Go for it!

-Josh Berkus



______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


Re: Name Alike Challenge

From
"Joe Conway"
Date:
> The "Name Alike" PL/pgSQL function has been posted on Roberto Mello's
> cookbook:
>
>
http://www.brasileiro.net/postgres/cookbook/view-one-recipe.adp?recipe_id=96
7
>
> This function requires Joe Conway's port of the Metaphone and
> Levenshtein functions to PostgreSQL, available from /contrib on CVS as
> well as another site where /contrib stuff is available (anyone know
> where this is?).  Thanks again, Joe!

I sent it as a tgz to Justin for techdocs.postgresql.org. Here's the link:
http://techdocs.postgresql.org/source.php#ffuzzy

Note that the lastest source in cvs has soundex included, which this tar
file does not, but soundex is not needed for Josh's function and it was in
previous contribs anyway.

-- Joe