Re: [GENERAL] Bug in metaphone (contrib/fuzzystrmatch) - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: [GENERAL] Bug in metaphone (contrib/fuzzystrmatch) |
Date | |
Msg-id | 200306230356.h5N3ucV22076@candle.pha.pa.us |
In response to | Re: [GENERAL] Bug in metaphone (contrib/fuzzystrmatch) (Joe Conway <mail@joeconway.com>) |
List | pgsql-patches |
Joe Conway wrote:
> (I never saw this make it to the list yesterday, so I'm resending to
> patches)
>
> Jim C. Nasby wrote:
> > Second argument to metaphone is supposed to set the limit on the
> > number of characters to return, but it breaks on some phrases:
> >
> > usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> > (select 'Hello world'::varchar AS a) a;
> > HLW | HLWR | HLWRLT
> >
> > usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> > (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> > AKM | AKMKS | AKMKSMMRL
> >
> > In every case I've found that does this, the 4th and 5th letters are
> > always 'KS'.
>
> Nice catch.
>
> There was a bug in the original metaphone algorithm from CPAN. Patch
> attached (while I was at it I updated my email address, changed the
> copyright to PGDG, and removed an unnecessary palloc). Here's how it
> looks now:
>
> regression=# select metaphone(a,4) from (select 'A A COMEAUX
> MEMORIAL'::varchar AS a) a;
>  metaphone
> -----------
>  AKMK
> (1 row)
>
> regression=# select metaphone(a,5) from (select 'A A COMEAUX
> MEMORIAL'::varchar AS a) a;
>  metaphone
> -----------
>  AKMKS
> (1 row)
>
> Please apply.
>
> Thanks,
>
> Joe
>
> Index: contrib/fuzzystrmatch/README.fuzzystrmatch
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/README.fuzzystrmatch,v
> retrieving revision 1.2
> diff -c -r1.2 README.fuzzystrmatch
> *** contrib/fuzzystrmatch/README.fuzzystrmatch 7 Aug 2001 18:16:01 -0000 1.2
> --- contrib/fuzzystrmatch/README.fuzzystrmatch 6 Jun 2003 16:37:54 -0000
> ***************
> *** 3,9 ****
>    *
>    * Functions for "fuzzy" comparison of strings
>    *
> !  * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
>    *
>    * levenshtein()
>    * -------------
> --- 3,12 ----
>    *
>    * Functions for "fuzzy" comparison of strings
>    *
> !  * Joe Conway <mail@joeconway.com>
> !  *
> !  * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
> !  * ALL RIGHTS RESERVED;
>    *
>    * levenshtein()
>    * -------------
> Index: contrib/fuzzystrmatch/fuzzystrmatch.c
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/fuzzystrmatch.c,v
> retrieving revision 1.7
> diff -c -r1.7 fuzzystrmatch.c
> *** contrib/fuzzystrmatch/fuzzystrmatch.c 10 Mar 2003 22:28:17 -0000 1.7
> --- contrib/fuzzystrmatch/fuzzystrmatch.c 6 Jun 2003 16:38:06 -0000
> ***************
> *** 3,9 ****
>    *
>    * Functions for "fuzzy" comparison of strings
>    *
> !  * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
>    *
>    * levenshtein()
>    * -------------
> --- 3,12 ----
>    *
>    * Functions for "fuzzy" comparison of strings
>    *
> !  * Joe Conway <mail@joeconway.com>
> !  *
> !  * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
> !  * ALL RIGHTS RESERVED;
>    *
>    * levenshtein()
>    * -------------
> ***************
> *** 221,229 ****
>       if (!(reqlen > 0))
>           elog(ERROR, "metaphone: Requested Metaphone output length must be > 0");
>
> -     metaph = palloc(reqlen);
> -     memset(metaph, '\0', reqlen);
> -
>       retval = _metaphone(str_i, reqlen, &metaph);
>       if (retval == META_SUCCESS)
>       {
> --- 224,229 ----
> ***************
> *** 629,635 ****
>                   /* KS */
>               case 'X':
>                   Phonize('K');
> !                 Phonize('S');
>                   break;
>                   /* Y if followed by a vowel */
>               case 'Y':
> --- 629,636 ----
>                   /* KS */
>               case 'X':
>                   Phonize('K');
> !                 if (max_phonemes == 0 || Phone_Len < max_phonemes)
> !                     Phonize('S');
>                   break;
>                   /* Y if followed by a vowel */
>               case 'Y':
> Index: contrib/fuzzystrmatch/fuzzystrmatch.h
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/fuzzystrmatch.h,v
> retrieving revision 1.6
> diff -c -r1.6 fuzzystrmatch.h
> *** contrib/fuzzystrmatch/fuzzystrmatch.h 5 Sep 2002 00:43:06 -0000 1.6
> --- contrib/fuzzystrmatch/fuzzystrmatch.h 6 Jun 2003 16:38:13 -0000
> ***************
> *** 3,9 ****
>    *
>    * Functions for "fuzzy" comparison of strings
>    *
> !  * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
>    *
>    * levenshtein()
>    * -------------
> --- 3,12 ----
>    *
>    * Functions for "fuzzy" comparison of strings
>    *
> !  * Joe Conway <mail@joeconway.com>
> !  *
> !  * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
> !  * ALL RIGHTS RESERVED;
>    *
>    * levenshtein()
>    * -------------
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------