Re: Fuzzy string matching of product names - Mailing list pgsql-general

From Bill Moran
Subject Re: Fuzzy string matching of product names
Date
Msg-id 20100405161818.a301e038.wmoran@potentialtech.com
Whole thread Raw
In response to Re: Fuzzy string matching of product names  (Peter Geoghegan <peter.geoghegan86@gmail.com>)
List pgsql-general
In response to Peter Geoghegan <peter.geoghegan86@gmail.com>:

> > http://www.postgresql.org/docs/8.4/static/fuzzystrmatch.html
> >
> > --
> > Bill Moran
> > http://www.potentialtech.com
> > http://people.collaborativefusion.com/~wmoran/
>
> Fuzzystrmatch is generally used to compare two single words for how
> similar they sound. How can that actually be applied to get the
> functionality that I've described?

Well, it really depends on your particular situation and what you
want to support.  You could break the name down into individual
words and generate metaphones, then use like to match on metaphone:

'The Candlestick Corporation, Limited' -> 'TE CDSK CPRN LMTD'

Searching for "candlestick" -> WHERE metaphone column like '%CDSK%'

Or you could create an array column that has all the metaphones in
it and use an ANY() or ALL() match to find ones that apply.

Exactly how you implement depends on how far you want to go.  Do you
want to support OR matches, AND matches, or both?  Can the words be
out of order?

You could also use Levenshtein as a percentage function to find matches,
even on long strings with multiple words.  Since Levenshtein gives you
the number of alterations between two strings, using that as a percentage
of the total string length gives you a pretty good gauge of how close
they are overall, and would allow you to set a threshold, or possibly even
list results by relevance.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

pgsql-general by date:

Previous
From: John R Pierce
Date:
Subject: Re: windows 7 compatiblity?
Next
From: Leif Biberg Kristensen
Date:
Subject: Re: Fuzzy string matching of product names