Thread: BUG #5098: Levenshtein with costs is broken
The following bug has been logged online: Bug reference: 5098 Logged by: Matthew Byrne Email address: matthew@hairybrain.co.uk PostgreSQL version: 8.4.1 Operating system: Linux x86_64 (Ubuntu 9.04) Description: Levenshtein with costs is broken Details: The Levenshtein with costs function in the fuzzystrmatch module is coded incorrectly. The initial values in the distance matrix are set up as 0, 1, 2 etc. when they should be 0, deletion_cost, (2 * deletion_cost)... and 0, insertion_cost, (2 * insertion_cost) etc. respectively. This causes the function to return incorrect values, e.g.: SELECT LEVENSHTEIN('ABC', 'XABC', 100, 100, 100) returns 1 (the correct answer is 100). To fix this, make the following changes to fuzzystrmatch.c: Change line 244 from prev[i] = i; to prev[i] = i * del_c; and change line 255 from curr[0] = j; to curr[0] = j * ins_c;
On Tue, Oct 6, 2009 at 9:38 PM, Matthew Byrne <matthew@hairybrain.co.uk> wr= ote: > > The following bug has been logged online: > > Bug reference: =A0 =A0 =A05098 > Logged by: =A0 =A0 =A0 =A0 =A0Matthew Byrne > Email address: =A0 =A0 =A0matthew@hairybrain.co.uk > PostgreSQL version: 8.4.1 > Operating system: =A0 Linux x86_64 (Ubuntu 9.04) > Description: =A0 =A0 =A0 =A0Levenshtein with costs is broken > Details: > > The Levenshtein with costs function in the fuzzystrmatch module is coded > incorrectly. =A0The initial values in the distance matrix are set up as 0= , 1, > 2 etc. when they should be 0, deletion_cost, (2 * deletion_cost)... and 0, > insertion_cost, (2 * insertion_cost) etc. respectively. =A0This causes the > function to return incorrect values, e.g.: > > SELECT LEVENSHTEIN('ABC', 'XABC', 100, 100, 100) > > returns 1 (the correct answer is 100). > > To fix this, make the following changes to fuzzystrmatch.c: > > Change line 244 from > prev[i] =3D i; > to > prev[i] =3D i * del_c; > > and change line 255 from > curr[0] =3D j; > to > curr[0] =3D j * ins_c; Can you submit this as a patch file? ...Robert
On Tue, Oct 6, 2009 at 8:38 PM, Matthew Byrne <matthew@hairybrain.co.uk> wr= ote: > > The following bug has been logged online: > > Bug reference: =A0 =A0 =A05098 > Logged by: =A0 =A0 =A0 =A0 =A0Matthew Byrne > Email address: =A0 =A0 =A0matthew@hairybrain.co.uk > PostgreSQL version: 8.4.1 > Operating system: =A0 Linux x86_64 (Ubuntu 9.04) > Description: =A0 =A0 =A0 =A0Levenshtein with costs is broken > Details: > > The Levenshtein with costs function in the fuzzystrmatch module is coded > incorrectly. =A0The initial values in the distance matrix are set up as 0= , 1, > 2 etc. when they should be 0, deletion_cost, (2 * deletion_cost)... and 0, > insertion_cost, (2 * insertion_cost) etc. respectively. =A0This causes the > function to return incorrect values, e.g.: > > SELECT LEVENSHTEIN('ABC', 'XABC', 100, 100, 100) > > returns 1 (the correct answer is 100). > > To fix this, make the following changes to fuzzystrmatch.c: > > Change line 244 from > prev[i] =3D i; > to > prev[i] =3D i * del_c; > > and change line 255 from > curr[0] =3D j; > to > curr[0] =3D j * ins_c; FYI, this was fixed in 8.4.2. ...Robert