Re: [GENERAL] Bug in metaphone (contrib/fuzzystrmatch) - Mailing list pgsql-patches
From | Joe Conway |
---|---|
Subject | Re: [GENERAL] Bug in metaphone (contrib/fuzzystrmatch) |
Date | |
Msg-id | 3EE0BFA2.3000208@joeconway.com |
List | pgsql-patches |
Jim C. Nasby wrote:
> Second argument to metaphone is supposed to set the limit on the
> number of characters to return, but it breaks on some phrases:
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'Hello world'::varchar AS a) a;
> HLW | HLWR | HLWRLT
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> AKM | AKMKS | AKMKSMMRL
>
> In every case I've found that does this, the 4th and 5th letters are
> always 'KS'.

Nice catch. There was a bug in the original metaphone algorithm from
CPAN. Patch attached (while I was at it I updated my email address,
changed the copyright to PGDG, and removed an unnecessary palloc).

Here's how it looks now:

regression=# select metaphone(a,4) from (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
 metaphone
-----------
 AKMK
(1 row)

regression=# select metaphone(a,5) from (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
 metaphone
-----------
 AKMKS
(1 row)

Please apply.

Thanks,

Joe

Index: contrib/fuzzystrmatch/README.fuzzystrmatch
===================================================================
RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/README.fuzzystrmatch,v
retrieving revision 1.2
diff -c -r1.2 README.fuzzystrmatch
*** contrib/fuzzystrmatch/README.fuzzystrmatch 7 Aug 2001 18:16:01 -0000 1.2
--- contrib/fuzzystrmatch/README.fuzzystrmatch 6 Jun 2003 16:37:54 -0000
***************
*** 3,9 ****
   *
   * Functions for "fuzzy" comparison of strings
   *
!  * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
   *
   * levenshtein()
   * -------------
--- 3,12 ----
   *
   * Functions for "fuzzy" comparison of strings
   *
!  * Joe Conway <mail@joeconway.com>
!  *
!  * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
!  * ALL RIGHTS RESERVED;
   *
   * levenshtein()
   * -------------
Index: contrib/fuzzystrmatch/fuzzystrmatch.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/fuzzystrmatch.c,v
retrieving revision 1.7
diff -c -r1.7 fuzzystrmatch.c
*** contrib/fuzzystrmatch/fuzzystrmatch.c 10 Mar 2003 22:28:17 -0000 1.7
--- contrib/fuzzystrmatch/fuzzystrmatch.c 6 Jun 2003 16:38:06 -0000
***************
*** 3,9 ****
   *
   * Functions for "fuzzy" comparison of strings
   *
!  * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
   *
   * levenshtein()
   * -------------
--- 3,12 ----
   *
   * Functions for "fuzzy" comparison of strings
   *
!  * Joe Conway <mail@joeconway.com>
!  *
!  * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
!  * ALL RIGHTS RESERVED;
   *
   * levenshtein()
   * -------------
***************
*** 221,229 ****
      if (!(reqlen > 0))
          elog(ERROR, "metaphone: Requested Metaphone output length must be > 0");
  
-     metaph = palloc(reqlen);
-     memset(metaph, '\0', reqlen);
- 
      retval = _metaphone(str_i, reqlen, &metaph);
      if (retval == META_SUCCESS)
      {
--- 224,229 ----
***************
*** 629,635 ****
              /* KS */
          case 'X':
              Phonize('K');
!             Phonize('S');
              break;
              /* Y if followed by a vowel */
          case 'Y':
--- 629,636 ----
              /* KS */
          case 'X':
              Phonize('K');
!             if (max_phonemes == 0 || Phone_Len < max_phonemes)
!                 Phonize('S');
              break;
              /* Y if followed by a vowel */
          case 'Y':
Index: contrib/fuzzystrmatch/fuzzystrmatch.h
===================================================================
RCS file: /opt/src/cvs/pgsql-server/contrib/fuzzystrmatch/fuzzystrmatch.h,v
retrieving revision 1.6
diff -c -r1.6 fuzzystrmatch.h
*** contrib/fuzzystrmatch/fuzzystrmatch.h 5 Sep 2002 00:43:06 -0000 1.6
--- contrib/fuzzystrmatch/fuzzystrmatch.h 6 Jun 2003 16:38:13 -0000
***************
*** 3,9 ****
   *
   * Functions for "fuzzy" comparison of strings
   *
!  * Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
   *
   * levenshtein()
   * -------------
--- 3,12 ----
   *
   * Functions for "fuzzy" comparison of strings
   *
!  * Joe Conway <mail@joeconway.com>
!  *
!  * Copyright (c) 2001, 2002, 2003 by PostgreSQL Global Development Group
!  * ALL RIGHTS RESERVED;
   *
   * levenshtein()
   * -------------
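For anyone who wants to see why the 'X' case overshoots without reading the full source, here is a minimal standalone sketch of the same idea. It is not the fuzzystrmatch code: toy_encode, guard_x, out, and cap are hypothetical names, and the "encoding" is a toy in which 'X' expands to "KS" and every other character is copied unchanged. It assumes, as the patch suggests, that the length limit is otherwise only checked once per input letter, so a letter that emits two codes can run one past the limit unless the second emit is guarded.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy encoder, NOT the real metaphone rules: 'X' expands to "KS" and every
 * other character is copied through unchanged. max_len == 0 means no limit.
 * guard_x selects the patched (1) or unpatched (0) handling of 'X'.
 * Returns the number of codes written to out (excluding the trailing NUL).
 */
size_t
toy_encode(const char *word, size_t max_len, int guard_x, char *out, size_t cap)
{
    size_t len = 0;

    assert(word != NULL && out != NULL && cap >= 3);

    /*
     * Assumption: the limit is tested once per input letter. The cap test
     * keeps room for two codes plus the trailing NUL.
     */
    while (*word != '\0' && (max_len == 0 || len < max_len) && len + 2 < cap)
    {
        if (*word == 'X')
        {
            out[len++] = 'K';
            /* Second code of the pair: re-check the limit when guarded. */
            if (!guard_x || max_len == 0 || len < max_len)
                out[len++] = 'S';
        }
        else
            out[len++] = *word;
        word++;
    }
    out[len] = '\0';
    return len;
}
```

With a 16-byte buffer, toy_encode("AKMX", 4, 0, buf, sizeof buf) writes "AKMKS" (five codes for a limit of four), while the guarded call toy_encode("AKMX", 4, 1, buf, sizeof buf) writes "AKMK". That mirrors the before and after results shown earlier in this message, although the real function reaches those codes through the actual metaphone rules rather than this toy mapping.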