Thread: BUG #18654: From fuzzystrmatch, levenshtein function with costs parameters produce incorrect results

The following bug has been logged on the website:

Bug reference:      18654
Logged by:          bjdev
Email address:      bjdev.gthb@laposte.net
PostgreSQL version: 15.4
Operating system:   Ubuntu 22.04.5 LTS
Description:

Hi,

The extension fuzzystrmatch propose an implementation of levenshtein
function.
There is one version with costs parameters
levenshtein(text source, text target, int ins_cost, int del_cost, int
sub_cost) returns int

But if we use this function with parameters other than 1 (the default) the
result is incorrect

SELECT levenshtein('horses','shorse',1,1,1) => 2 (correct)

SELECT levenshtein('horses','shorse',100,10,1) => 101 (INCORRECT)
The correct result is 6 (all the letter have to be substitute and it's not
possible to have a lower score with others operations)
Here, it's easy to verify manually but you can check that using python
implementation 
from Levenshtein import distance
distance("horses","shorse",weights=(100,10,1))
# => 6

SELECT levenshtein('horses','shorse',1,10,100) => 12 (INCORRECT)
The correct result is 11 (insert "s" first (+1) and remove last "s"(+10)
Here, it's easy to verify manually but you can check that using python
implementation 
from Levenshtein import distance
distance("horses","shorse",weights=(1,10,100))
# => 11

SELECT levenshtein('horses','shorse',1,10,1) => 2 (INCORRECT)
The correct result is 6
you can check that using python implementation 
from Levenshtein import distance
distance("horses","shorse",weights=(1,10,1))
# => 6

The use of cost parameters of the levenshtein function is therefore not
possible, which is a shame.

Regards


PG Bug reporting form <noreply@postgresql.org> writes:
> But if we use this function with parameters other than 1 (the default) the
> result is incorrect

> SELECT levenshtein('horses','shorse',1,1,1) => 2 (correct)

> SELECT levenshtein('horses','shorse',100,10,1) => 101 (INCORRECT)
> The correct result is 6 (all the letter have to be substitute and it's not
> possible to have a lower score with others operations)

Hmm, works for me:

u8=# create extension fuzzystrmatch;
CREATE EXTENSION
u8=# SELECT levenshtein('horses','shorse',1,1,1);
 levenshtein 
-------------
           2
(1 row)

u8=# SELECT levenshtein('horses','shorse',100,10,1);
 levenshtein 
-------------
           6
(1 row)

I confess bafflement about why you're getting wrong answers.
You seem to be using a slightly out of date Postgres, but
none of this code has changed meaningfully since about 2016.
Maybe you hit a compiler bug?  Where did you get this copy
of Postgres from --- or if you built it yourself, what build
options did you use?

            regards, tom lane



Hi Tom,

Thank you for this feedback,
Indeed, it works correctly. The problem comes from the other side of the keyboard...
I went a little fast and did not check the length of my input strings which contained "invisible" characters.
After cleaning up the results obtained are as expected.
Really sorry for this false alarm

Regards
Hmm, works for me:

u8=# create extension fuzzystrmatch;
CREATE EXTENSION
u8=# SELECT levenshtein('horses','shorse',1,1,1);
levenshtein
-------------
2
(1 row)

u8=# SELECT levenshtein('horses','shorse',100,10,1);
levenshtein
-------------
6
(1 row)

I confess bafflement about why you're getting wrong answers.
You seem to be using a slightly out of date Postgres, but
none of this code has changed meaningfully since about 2016.
Maybe you hit a compiler bug? Where did you get this copy
of Postgres from --- or if you built it yourself, what build
options did you use?

regards, tom lane