Thread: contrib/levenshtein() has a bug?

contrib/levenshtein() has a bug?

From
Ben
Date:
The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
length of 255. OK, that's cool. But check this out:

mbrainz_db=> select max(length(name)) from public.track;
  max
-----
  255
(1 row)

mbrainz_db=> select levenshtein(name,'foo') from public.track;
ERROR:  argument exceeds max length: 255


That seems odd. What's odder is:

mbrainz_db=> select levenshtein(substring(name for 100),'foo') from public.track;
ERROR:  argument exceeds max length: 255



Any suggestions? I'm using the Fedora 5 rpms, so it looks like that puts
me at 8.1.4.

Re: contrib/levenshtein() has a bug?

From
Martijn van Oosterhout
Date:
On Thu, Sep 28, 2006 at 12:02:34PM -0700, Ben wrote:
> The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
> length of 255. OK, that's cool. But check this out:
>
<snip>
> mbrainz_db=> select levenshtein(name,'foo') from public.track;
> ERROR:  argument exceeds max length: 255

The message is slightly wrong; the max length is actually one more than
it reports. You can adjust the maximum length by changing the params in
fuzzystrmatch.h and recompiling.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: contrib/levenshtein() has a bug?

From
Tom Lane
Date:
Ben <bench@silentmedia.com> writes:
> The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
> length of 255. OK, that's cool. But check this out:

> mbrainz_db=> select max(length(name)) from public.track;
>   max
> -----
>   255
> (1 row)

> mbrainz_db=> select levenshtein(name,'foo') from public.track;
> ERROR:  argument exceeds max length: 255

> That seems odd.

length() measures in characters whereas the limit in question is being
enforced in bytes.  You got any multibyte characters in there?

(It looks to me like levenshtein() is utterly non-multibyte-aware,
which is probably a bug in itself.)
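
To illustrate the character/byte mismatch outside the database, here is a
minimal Python sketch. The sample string and the MAX_LEN constant are
assumptions made up for the example (255 is just the number quoted in the
error message), and UTF-8 is assumed as the encoding. In SQL the
corresponding check would be to compare length(name) against
octet_length(name).

```python
# Hypothetical track name: 255 characters, five of which take
# two bytes each when encoded as UTF-8.
name = "é" * 5 + "a" * 250

# Character count, analogous to what length() reports for text.
char_len = len(name)

# Byte count, which is what a byte-based limit actually sees.
byte_len = len(name.encode("utf-8"))

# Limit taken from the error message; assumed here, not read from the source.
MAX_LEN = 255

print(char_len, byte_len, byte_len > MAX_LEN)  # prints: 255 260 True
```

So a column whose longest value is 255 characters can still hold values
that are longer than 255 bytes.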

            regards, tom lane

Re: contrib/levenshtein() has a bug?

From
Ben
Date:
Ah, yes, you are correct.

Hm, it's too bad levenshtein() is ASCII-only.
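
For what it's worth, here is a rough Python sketch of why that matters,
assuming "non-multibyte-aware" means the comparison runs byte by byte.
This is a textbook dynamic-programming edit distance written for
illustration, not the contrib code, and the sample strings are made up.

```python
def levenshtein(a, b):
    """Unit-cost edit distance between two sequences (str or bytes)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = cur
    return prev[-1]

s, t = "café", "cafe"

# Comparing characters: one substitution.
by_chars = levenshtein(s, t)

# Comparing UTF-8 bytes: "é" is two bytes, so the distance grows.
by_bytes = levenshtein(s.encode("utf-8"), t.encode("utf-8"))

print(by_chars, by_bytes)  # prints: 1 2
```

The same pair of strings comes out one edit apart by characters but two
apart by bytes, so byte-wise distances overstate differences whenever
multibyte characters are involved.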

On Thu, 28 Sep 2006, Tom Lane wrote:

> Ben <bench@silentmedia.com> writes:
>> The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
>> length of 255. OK, that's cool. But check this out:
>
>> mbrainz_db=> select max(length(name)) from public.track;
>>   max
>> -----
>>   255
>> (1 row)
>
>> mbrainz_db=> select levenshtein(name,'foo') from public.track;
>> ERROR:  argument exceeds max length: 255
>
>> That seems odd.
>
> length() measures in characters whereas the limit in question is being
> enforced in bytes.  You got any multibyte characters in there?
>
> (It looks to me like levenshtein() is utterly non-multibyte-aware,
> which is probably a bug in itself.)
>
>             regards, tom lane
>