Thread: BUG #5098: Levenshtein with costs is broken

BUG #5098: Levenshtein with costs is broken

From
"Matthew Byrne"
Date:
The following bug has been logged online:

Bug reference:      5098
Logged by:          Matthew Byrne
Email address:      matthew@hairybrain.co.uk
PostgreSQL version: 8.4.1
Operating system:   Linux x86_64 (Ubuntu 9.04)
Description:        Levenshtein with costs is broken
Details:

The Levenshtein with costs function in the fuzzystrmatch module is coded
incorrectly.  The initial values in the distance matrix are set up as 0, 1,
2 etc. when they should be 0, deletion_cost, (2 * deletion_cost)... and 0,
insertion_cost, (2 * insertion_cost) etc. respectively.  This causes the
function to return incorrect values, e.g.:

SELECT LEVENSHTEIN('ABC', 'XABC', 100, 100, 100)

returns 1 (the correct answer is 100).

To fix this, make the following changes to fuzzystrmatch.c:

Change line 244 from
prev[i] = i;
to
prev[i] = i * del_c;

and change line 255 from
curr[0] = j;
to
curr[0] = j * ins_c;

Re: BUG #5098: Levenshtein with costs is broken

From
Robert Haas
Date:
On Tue, Oct 6, 2009 at 9:38 PM, Matthew Byrne <matthew@hairybrain.co.uk> wr=
ote:
>
> The following bug has been logged online:
>
> Bug reference: =A0 =A0 =A05098
> Logged by: =A0 =A0 =A0 =A0 =A0Matthew Byrne
> Email address: =A0 =A0 =A0matthew@hairybrain.co.uk
> PostgreSQL version: 8.4.1
> Operating system: =A0 Linux x86_64 (Ubuntu 9.04)
> Description: =A0 =A0 =A0 =A0Levenshtein with costs is broken
> Details:
>
> The Levenshtein with costs function in the fuzzystrmatch module is coded
> incorrectly. =A0The initial values in the distance matrix are set up as 0=
, 1,
> 2 etc. when they should be 0, deletion_cost, (2 * deletion_cost)... and 0,
> insertion_cost, (2 * insertion_cost) etc. respectively. =A0This causes the
> function to return incorrect values, e.g.:
>
> SELECT LEVENSHTEIN('ABC', 'XABC', 100, 100, 100)
>
> returns 1 (the correct answer is 100).
>
> To fix this, make the following changes to fuzzystrmatch.c:
>
> Change line 244 from
> prev[i] =3D i;
> to
> prev[i] =3D i * del_c;
>
> and change line 255 from
> curr[0] =3D j;
> to
> curr[0] =3D j * ins_c;

Can you submit this as a patch file?

...Robert

Re: BUG #5098: Levenshtein with costs is broken

From
Robert Haas
Date:
On Tue, Oct 6, 2009 at 8:38 PM, Matthew Byrne <matthew@hairybrain.co.uk> wr=
ote:
>
> The following bug has been logged online:
>
> Bug reference: =A0 =A0 =A05098
> Logged by: =A0 =A0 =A0 =A0 =A0Matthew Byrne
> Email address: =A0 =A0 =A0matthew@hairybrain.co.uk
> PostgreSQL version: 8.4.1
> Operating system: =A0 Linux x86_64 (Ubuntu 9.04)
> Description: =A0 =A0 =A0 =A0Levenshtein with costs is broken
> Details:
>
> The Levenshtein with costs function in the fuzzystrmatch module is coded
> incorrectly. =A0The initial values in the distance matrix are set up as 0=
, 1,
> 2 etc. when they should be 0, deletion_cost, (2 * deletion_cost)... and 0,
> insertion_cost, (2 * insertion_cost) etc. respectively. =A0This causes the
> function to return incorrect values, e.g.:
>
> SELECT LEVENSHTEIN('ABC', 'XABC', 100, 100, 100)
>
> returns 1 (the correct answer is 100).
>
> To fix this, make the following changes to fuzzystrmatch.c:
>
> Change line 244 from
> prev[i] =3D i;
> to
> prev[i] =3D i * del_c;
>
> and change line 255 from
> curr[0] =3D j;
> to
> curr[0] =3D j * ins_c;

FYI, this was fixed in 8.4.2.

...Robert