BUG #5098: Levenshtein with costs is broken - Mailing list pgsql-bugs

From Matthew Byrne
Subject BUG #5098: Levenshtein with costs is broken
Date
Msg-id 200910070138.n971cL6M001399@wwwmaster.postgresql.org
Responses Re: BUG #5098: Levenshtein with costs is broken
Re: BUG #5098: Levenshtein with costs is broken
List pgsql-bugs
The following bug has been logged online:

Bug reference:      5098
Logged by:          Matthew Byrne
Email address:      matthew@hairybrain.co.uk
PostgreSQL version: 8.4.1
Operating system:   Linux x86_64 (Ubuntu 9.04)
Description:        Levenshtein with costs is broken
Details:

The Levenshtein-with-costs function in the fuzzystrmatch module initializes
its distance matrix incorrectly.  Both edges of the matrix are set up as
0, 1, 2, etc., which is only correct when every cost is 1.  The edge indexed
by the source string (prev[i]) should be 0, deletion_cost,
(2 * deletion_cost), etc., and the edge indexed by the target string
(curr[0]) should be 0, insertion_cost, (2 * insertion_cost), etc.  This
causes the function to return incorrect values, e.g.:

SELECT LEVENSHTEIN('ABC', 'XABC', 100, 100, 100);

returns 1.  The correct answer is 100: 'XABC' is 'ABC' with a single 'X'
inserted, and an insertion costs 100 here.

To fix this, make the following changes to fuzzystrmatch.c:

Change line 244 from
    prev[i] = i;
to
    prev[i] = i * del_c;

and change line 255 from
    curr[0] = j;
to
    curr[0] = j * ins_c;
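
The effect of the two changes above can be reproduced outside PostgreSQL
with a small standalone sketch.  This is not the code from fuzzystrmatch.c;
it is a minimal two-row implementation written for illustration.  The
function name weighted_levenshtein, the helper min3, and the scale_edges
switch are hypothetical; only prev, curr, ins_c and del_c come from the
report, and sub_c is assumed to be the substitution cost.

```c
#include <stdlib.h>
#include <string.h>

/* Smallest of three ints. */
static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/*
 * Weighted Levenshtein distance from s (source) to t (target).
 * scale_edges is a hypothetical switch that exists only in this sketch:
 *   0 = matrix edges initialized as 0, 1, 2, ... (the reported behavior)
 *   1 = matrix edges scaled by del_c / ins_c (the proposed fix)
 * Returns -1 if memory allocation fails.
 */
static int weighted_levenshtein(const char *s, const char *t,
                                int ins_c, int del_c, int sub_c,
                                int scale_edges)
{
    size_t m = strlen(s) + 1;   /* cells per row: one per source prefix */
    size_t n = strlen(t) + 1;   /* number of rows: one per target prefix */
    size_t i, j;
    int *prev = malloc(m * sizeof *prev);
    int *curr = malloc(m * sizeof *curr);
    int *tmp;
    int result;

    if (prev == NULL || curr == NULL)
    {
        free(prev);
        free(curr);
        return -1;
    }

    /* Row 0: reduce the first i source characters to "" (i deletions). */
    for (i = 0; i < m; i++)
        prev[i] = scale_edges ? (int) i * del_c : (int) i;

    for (j = 1; j < n; j++)
    {
        /* Column 0: build the first j target characters from "" (j insertions). */
        curr[0] = scale_edges ? (int) j * ins_c : (int) j;

        for (i = 1; i < m; i++)
        {
            int ins = prev[i] + ins_c;
            int del = curr[i - 1] + del_c;
            int sub = prev[i - 1] + (s[i - 1] == t[j - 1] ? 0 : sub_c);

            curr[i] = min3(ins, del, sub);
        }

        /* The row just computed becomes the previous row. */
        tmp = prev;
        prev = curr;
        curr = tmp;
    }

    result = prev[m - 1];
    free(prev);
    free(curr);
    return result;
}
```

With the inputs from the report ('ABC', 'XABC', all costs 100), this sketch
returns 1 when scale_edges is 0 and 100 when scale_edges is 1.  The wrong
result arises because the unscaled edge lets the leading 'X' be accounted
for at a cost of 1 instead of ins_c.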
