Thread: BUG #18654: From fuzzystrmatch, levenshtein function with costs parameters produce incorrect results
BUG #18654: From fuzzystrmatch, levenshtein function with costs parameters produce incorrect results
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18654 Logged by: bjdev Email address: bjdev.gthb@laposte.net PostgreSQL version: 15.4 Operating system: Ubuntu 22.04.5 LTS Description: Hi, The extension fuzzystrmatch propose an implementation of levenshtein function. There is one version with costs parameters levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int But if we use this function with parameters other than 1 (the default) the result is incorrect SELECT levenshtein('horses','shorse',1,1,1) => 2 (correct) SELECT levenshtein('horses','shorse',100,10,1) => 101 (INCORRECT) The correct result is 6 (all the letter have to be substitute and it's not possible to have a lower score with others operations) Here, it's easy to verify manually but you can check that using python implementation from Levenshtein import distance distance("horses","shorse",weights=(100,10,1)) # => 6 SELECT levenshtein('horses','shorse',1,10,100) => 12 (INCORRECT) The correct result is 11 (insert "s" first (+1) and remove last "s"(+10) Here, it's easy to verify manually but you can check that using python implementation from Levenshtein import distance distance("horses","shorse",weights=(1,10,100)) # => 11 SELECT levenshtein('horses','shorse',1,10,1) => 2 (INCORRECT) The correct result is 6 you can check that using python implementation from Levenshtein import distance distance("horses","shorse",weights=(1,10,1)) # => 6 The use of cost parameters of the levenshtein function is therefore not possible, which is a shame. Regards
Re: BUG #18654: From fuzzystrmatch, levenshtein function with costs parameters produce incorrect results
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > But if we use this function with parameters other than 1 (the default) the > result is incorrect > SELECT levenshtein('horses','shorse',1,1,1) => 2 (correct) > SELECT levenshtein('horses','shorse',100,10,1) => 101 (INCORRECT) > The correct result is 6 (all the letter have to be substitute and it's not > possible to have a lower score with others operations) Hmm, works for me: u8=# create extension fuzzystrmatch; CREATE EXTENSION u8=# SELECT levenshtein('horses','shorse',1,1,1); levenshtein ------------- 2 (1 row) u8=# SELECT levenshtein('horses','shorse',100,10,1); levenshtein ------------- 6 (1 row) I confess bafflement about why you're getting wrong answers. You seem to be using a slightly out of date Postgres, but none of this code has changed meaningfully since about 2016. Maybe you hit a compiler bug? Where did you get this copy of Postgres from --- or if you built it yourself, what build options did you use? regards, tom lane
Re: BUG #18654: From fuzzystrmatch, levenshtein function with costs parameters produce incorrect results
From
bjdev.gthb@laposte.net
Date:
Hi Tom,
Thank you for this feedback,
Indeed, it works correctly. The problem comes from the other side of the keyboard...
I went a little fast and did not check the length of my input strings which contained "invisible" characters.
After cleaning up the results obtained are as expected.
Really sorry for this false alarm
Regards
Thank you for this feedback,
Indeed, it works correctly. The problem comes from the other side of the keyboard...
I went a little fast and did not check the length of my input strings which contained "invisible" characters.
After cleaning up the results obtained are as expected.
Really sorry for this false alarm
Regards
Hmm, works for me:
u8=# create extension fuzzystrmatch;
CREATE EXTENSION
u8=# SELECT levenshtein('horses','shorse',1,1,1);
levenshtein
-------------
2
(1 row)
u8=# SELECT levenshtein('horses','shorse',100,10,1);
levenshtein
-------------
6
(1 row)
I confess bafflement about why you're getting wrong answers.
You seem to be using a slightly out of date Postgres, but
none of this code has changed meaningfully since about 2016.
Maybe you hit a compiler bug? Where did you get this copy
of Postgres from --- or if you built it yourself, what build
options did you use?
regards, tom lane
u8=# create extension fuzzystrmatch;
CREATE EXTENSION
u8=# SELECT levenshtein('horses','shorse',1,1,1);
levenshtein
-------------
2
(1 row)
u8=# SELECT levenshtein('horses','shorse',100,10,1);
levenshtein
-------------
6
(1 row)
I confess bafflement about why you're getting wrong answers.
You seem to be using a slightly out of date Postgres, but
none of this code has changed meaningfully since about 2016.
Maybe you hit a compiler bug? Where did you get this copy
of Postgres from --- or if you built it yourself, what build
options did you use?
regards, tom lane