Re: Fuzzy matching - Mailing list pgsql-patches

From Joe Conway
Subject Re: Fuzzy matching
Msg-id 025b01c11c57$00280600$48d210ac@jecw2k1
List pgsql-patches
> > Our usual practice with stuff of uncertain usefulness has been to stick
> > it in contrib for awhile and see if anyone uses it.  If there's
> > sufficient interest, we'll promote it to mainstream in a future release.
>
> Makes sense to me.  Go, Joe!
>

Per this discussion, here's a patch to implement both levenshtein() and
metaphone() as a contrib module. There seem to be a fair number of different
approaches to both of these algorithms. For levenshtein I used the simplest
case, which has a cost of 1 for any character insertion, deletion, or
substitution. For metaphone, I adapted the same code from CPAN that the PHP
folks did.
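
To illustrate the unit-cost case described above, here is a minimal
standalone sketch in C. It is not the code from the attached patch: the
function name levenshtein_sketch and the two-row layout are assumptions made
for illustration only, and it compares bytes rather than characters, so it
has no multibyte awareness.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Unit-cost Levenshtein distance between two NUL-terminated byte strings.
 * Every insertion, deletion, or substitution costs 1.
 * Returns -1 if memory allocation fails.
 * (Hypothetical helper for illustration; not the patch's implementation.)
 */
static int
levenshtein_sketch(const char *s, const char *t)
{
    size_t m = strlen(s);
    size_t n = strlen(t);
    size_t i, j;
    int *prev, *curr, *tmp;
    int result;

    /* Only two rows of the (m+1) x (n+1) distance matrix are kept. */
    prev = malloc((n + 1) * sizeof(int));
    curr = malloc((n + 1) * sizeof(int));
    if (prev == NULL || curr == NULL)
    {
        free(prev);
        free(curr);
        return -1;
    }

    /* Turning the empty prefix of s into t[0..j) takes j insertions. */
    for (j = 0; j <= n; j++)
        prev[j] = (int) j;

    for (i = 1; i <= m; i++)
    {
        /* Turning s[0..i) into the empty string takes i deletions. */
        curr[0] = (int) i;

        for (j = 1; j <= n; j++)
        {
            int del = prev[j] + 1;
            int ins = curr[j - 1] + 1;
            int sub = prev[j - 1] + (s[i - 1] != t[j - 1] ? 1 : 0);
            int best = (del < ins) ? del : ins;

            curr[j] = (sub < best) ? sub : best;
        }

        /* The row just computed becomes the previous row. */
        tmp = prev;
        prev = curr;
        curr = tmp;
    }

    result = prev[n];
    free(prev);
    free(curr);
    return result;
}
```

With this sketch, levenshtein_sketch("kitten", "sitting") returns 3: two
substitutions and one insertion.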

A couple of questions:
1. Does it make sense to fold the soundex contrib together with this one?

2. I was debating trying to add multibyte support to levenshtein (it would
make no sense at all for metaphone), but a quick search through the contrib
directory found no hits on the word MULTIBYTE. Should I worry about adding
multibyte support to levenshtein()?

Thanks,

Joe


