Thread: contrib/levenshtein() has a bug?
The levenshtein function from contrib/fuzzystrmatch.sql has a max arg length of 255. OK, that's cool. But check this out:

    mbrainz_db=> select max(length(name)) from public.track;
     max
    -----
     255
    (1 row)

    mbrainz_db=> select levenshtein(name,'foo') from public.track;
    ERROR: argument exceeds max length: 255

That seems odd. What's odder is:

    mbrainz_db=> select levenshtein(substring(name for 100),'foo') from public.track;
    ERROR: argument exceeds max length: 255

Any suggestions? I'm using the Fedora 5 RPMs, so it looks like that puts me at 8.1.4.
On Thu, Sep 28, 2006 at 12:02:34PM -0700, Ben wrote:
> The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
> length of 255. OK, that's cool. But check this out:
> <snip>
> mbrainz_db=> select levenshtein(name,'foo') from public.track;
> ERROR: argument exceeds max length: 255

The message is slightly wrong; the max length is actually one more. You
can adjust the maximum length by changing the params in fuzzystrmatch.h
and recompiling.

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Ben <bench@silentmedia.com> writes:
> The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
> length of 255. OK, that's cool. But check this out:

> mbrainz_db=> select max(length(name)) from public.track;
>  max
> -----
>  255
> (1 row)

> mbrainz_db=> select levenshtein(name,'foo') from public.track;
> ERROR: argument exceeds max length: 255

> That seems odd.

length() measures in characters whereas the limit in question is being
enforced in bytes. You got any multibyte characters in there?

(It looks to me like levenshtein() is utterly non-multibyte-aware,
which is probably a bug in itself.)

			regards, tom lane
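To make the characters-versus-bytes point concrete, here is a small Python sketch. It assumes the database encoding is UTF-8 (the thread doesn't say which encoding mbrainz_db uses). It contrasts a character count, which is what length() reports, with a byte count, which is what the 255 limit is checked against; in SQL, octet_length(name) gives that byte count. The constant MAX_BYTES and the helper exceeds_byte_limit are hypothetical names for illustration, not the identifiers used in fuzzystrmatch.

```python
# Hypothetical illustration, assuming UTF-8: length() counts characters,
# while the fuzzystrmatch limit is checked against the byte length.

MAX_BYTES = 255  # the limit quoted in the error message


def char_length(s: str) -> int:
    """Character count, analogous to length() on a text value."""
    return len(s)


def byte_length(s: str, encoding: str = "utf-8") -> int:
    """Encoded byte count, analogous to octet_length()."""
    return len(s.encode(encoding))


def exceeds_byte_limit(s: str, limit: int = MAX_BYTES) -> bool:
    """Hypothetical check that mirrors a limit enforced in bytes, not characters."""
    return byte_length(s) > limit


# A 255-character title with ten 2-byte characters ("é" is 2 bytes in UTF-8).
title = "é" * 10 + "a" * 245
print(char_length(title))         # 255
print(byte_length(title))         # 265
print(exceeds_byte_limit(title))  # True

# Why substring(name for 100) can still fail: 100 CJK characters are 3 bytes each.
prefix = ("音" * 255)[:100]
print(char_length(prefix), byte_length(prefix))  # 100 300
```

So a column whose max(length(...)) is 255 can still hold values well over 255 bytes, and truncating by characters doesn't guarantee the result fits a byte limit.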
Ah, yes, you are correct. Hm, it's too bad levenshtein() is ASCII-only.

On Thu, 28 Sep 2006, Tom Lane wrote:
> length() measures in characters whereas the limit in question is being
> enforced in bytes. You got any multibyte characters in there?
>
> (It looks to me like levenshtein() is utterly non-multibyte-aware,
> which is probably a bug in itself.)
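The "ASCII-only" behaviour can be illustrated with a short Python sketch of the classic dynamic-programming Levenshtein distance. This is not the fuzzystrmatch C code; it's a stand-alone illustration that assumes UTF-8. Running the same algorithm over encoded bytes and over characters (code points) shows how a byte-oriented implementation overcounts edits that involve multibyte characters.

```python
def levenshtein(a, b):
    """Classic DP edit distance with unit-cost insert, delete and substitute.

    Works on any indexable sequences: pass str for per-character
    (code point) semantics, or bytes for the per-byte semantics of a
    non-multibyte-aware implementation.
    """
    # At the start of iteration i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a[i-1]
                curr[j - 1] + 1,     # insert b[j-1]
                prev[j - 1] + cost,  # substitute (or match)
            )
        prev = curr
    return prev[len(b)]


s, t = "café", "cafe"
print(levenshtein(s, t))                                  # 1: one substitution
print(levenshtein(s.encode("utf-8"), t.encode("utf-8")))  # 2: "é" is two bytes
```

Keeping only two rows of the matrix makes memory proportional to the shorter side you choose for b, though running time is still proportional to the product of the two lengths, which is the cost the length limit is there to bound.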