Re: [GENERAL] Bug in metaphone (contrib/fuzzystrmatch) - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: [GENERAL] Bug in metaphone (contrib/fuzzystrmatch)
Msg-id 200306230356.h5N3ucV22076@candle.pha.pa.us
In response to Re: [GENERAL] Bug in metaphone (contrib/fuzzystrmatch)  (Joe Conway <mail@joeconway.com>)
List pgsql-patches
Joe Conway wrote:
> (I never saw this make it to the list yesterday, so I'm resending to
> patches)
>
> Jim C. Nasby wrote:
> > Second argument to metaphone is supposed to set the limit on the
> > number of characters to return, but it breaks on some phrases:
> >
> > usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> > (select 'Hello world'::varchar AS a) a;
> > HLW       | HLWR      | HLWRLT
> >
> > usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> > (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> > AKM       | AKMKS     | AKMKSMMRL
> >
> > In every case I've found that does this, the 4th and 5th letters are
> > always 'KS'.
>
> Nice catch.
>
> There was a bug in the original metaphone algorithm from CPAN. Patch
> attached (while I was at it I updated my email address, changed the
> copyright to PGDG, and removed an unnecessary palloc). Here's how it
> looks now:
>
> regression=# select metaphone(a,4) from (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
>    metaphone
> -----------
>    AKMK
> (1 row)
>
> regression=# select metaphone(a,5) from (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
>    metaphone
> -----------
>    AKMKS
> (1 row)
>
> Please apply.
>
> Thanks,
>
> Joe
>

> Index: contrib/fuzzystrmatch/README.fuzzystrmatch
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/README.fuzzystrmatch,v
> retrieving revision 1.2
> diff -c -r1.2 README.fuzzystrmatch
> *** contrib/fuzzystrmatch/README.fuzzystrmatch    7 Aug 2001 18:16:01 -0000    1.2
> --- contrib/fuzzystrmatch/README.fuzzystrmatch    6 Jun 2003 16:37:54 -0000
> ***************
> *** 3,9 ****
>    *
>    * Functions for "fuzzy" comparison of strings
>    *
> !  * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
>    *
>    * levenshtein()
>    * -------------
> --- 3,12 ----
>    *
>    * Functions for "fuzzy" comparison of strings
>    *
> !  * Joe Conway <mail@joeconway.com>
> !  *
> !  * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
> !  * ALL RIGHTS RESERVED;
>    *
>    * levenshtein()
>    * -------------
> Index: contrib/fuzzystrmatch/fuzzystrmatch.c
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/fuzzystrmatch.c,v
> retrieving revision 1.7
> diff -c -r1.7 fuzzystrmatch.c
> *** contrib/fuzzystrmatch/fuzzystrmatch.c    10 Mar 2003 22:28:17 -0000    1.7
> --- contrib/fuzzystrmatch/fuzzystrmatch.c    6 Jun 2003 16:38:06 -0000
> ***************
> *** 3,9 ****
>    *
>    * Functions for "fuzzy" comparison of strings
>    *
> !  * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
>    *
>    * levenshtein()
>    * -------------
> --- 3,12 ----
>    *
>    * Functions for "fuzzy" comparison of strings
>    *
> !  * Joe Conway <mail@joeconway.com>
> !  *
> !  * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
> !  * ALL RIGHTS RESERVED;
>    *
>    * levenshtein()
>    * -------------
> ***************
> *** 221,229 ****
>       if (!(reqlen > 0))
>           elog(ERROR, "metaphone: Requested Metaphone output length must be > 0");
>
> -     metaph = palloc(reqlen);
> -     memset(metaph, '\0', reqlen);
> -
>       retval = _metaphone(str_i, reqlen, &metaph);
>       if (retval == META_SUCCESS)
>       {
> --- 224,229 ----
> ***************
> *** 629,635 ****
>                   /* KS */
>               case 'X':
>                   Phonize('K');
> !                 Phonize('S');
>                   break;
>                   /* Y if followed by a vowel */
>               case 'Y':
> --- 629,636 ----
>                   /* KS */
>               case 'X':
>                   Phonize('K');
> !                 if (max_phonemes == 0 || Phone_Len < max_phonemes)
> !                     Phonize('S');
>                   break;
>                   /* Y if followed by a vowel */
>               case 'Y':
> Index: contrib/fuzzystrmatch/fuzzystrmatch.h
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/fuzzystrmatch.h,v
> retrieving revision 1.6
> diff -c -r1.6 fuzzystrmatch.h
> *** contrib/fuzzystrmatch/fuzzystrmatch.h    5 Sep 2002 00:43:06 -0000    1.6
> --- contrib/fuzzystrmatch/fuzzystrmatch.h    6 Jun 2003 16:38:13 -0000
> ***************
> *** 3,9 ****
>    *
>    * Functions for "fuzzy" comparison of strings
>    *
> !  * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
>    *
>    * levenshtein()
>    * -------------
> --- 3,12 ----
>    *
>    * Functions for "fuzzy" comparison of strings
>    *
> !  * Joe Conway <mail@joeconway.com>
> !  *
> !  * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
> !  * ALL RIGHTS RESERVED;
>    *
>    * levenshtein()
>    * -------------
>


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.
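
For readers following the thread, the effect of the 'X' fix in the quoted patch can be illustrated with a small standalone sketch. This is not the fuzzystrmatch code: toy_encode, its guard_second flag, and its rules (copy letters in upper case, skip non-letters, expand 'X' to "KS") are hypothetical and exist only to show the pattern. The assumption it demonstrates is that the length limit is tested once per input letter, so a letter that emits two phonemes can overshoot the limit unless the second emit is checked again.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical toy encoder, NOT the real metaphone rules: letters are
 * copied in upper case, non-letters are skipped, and 'X' expands to "KS".
 * max_phonemes == 0 means "no limit", mirroring the test in the patch.
 * guard_second selects whether the second phoneme of 'X' re-checks the limit.
 * The caller must supply out_size > 2 * strlen(word).
 */
static void
toy_encode(const char *word, int max_phonemes, int guard_second,
           char *out, size_t out_size)
{
    int         len = 0;

    assert(word != NULL && out != NULL);
    assert(out_size > 2 * strlen(word));

    /* The limit is tested only once per input letter. */
    for (; *word != '\0' && (max_phonemes == 0 || len < max_phonemes); word++)
    {
        int         c = toupper((unsigned char) *word);

        if (!isalpha(c))
            continue;

        if (c == 'X')
        {
            out[len++] = 'K';
            /* Without this re-check, the 'S' can exceed the limit. */
            if (!guard_second || max_phonemes == 0 || len < max_phonemes)
                out[len++] = 'S';
        }
        else
            out[len++] = (char) c;
    }
    out[len] = '\0';
}
```

With the toy input "AKMXMRL" and a limit of 4, the unguarded variant (guard_second = 0) produces AKMKS, one character too many, while the guarded variant (guard_second = 1) produces AKMK. With a limit of 5 both produce AKMKS.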

---------------------------------------------------------------------------