Re: [GENERAL] contrib/levenshtein() has a bug? - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: [GENERAL] contrib/levenshtein() has a bug?
Msg-id 200702131801.l1DI1E127975@momjian.us
List pgsql-patches
Tom Lane wrote:
> Ben <bench@silentmedia.com> writes:
> > The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
> > length of 255. OK, that's cool. But check this out:
>
> > mbrainz_db=> select max(length(name)) from public.track;
> >   max
> > -----
> >   255
> > (1 row)
>
> > mbrainz_db=> select levenshtein(name,'foo') from public.track;
> > ERROR:  argument exceeds max length: 255
>
> > That seems odd.
>
> length() measures in characters whereas the limit in question is being
> enforced in bytes.  You got any multibyte characters in there?

I have updated the error message to mention bytes, attached.
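For anyone following along, the character-versus-byte mismatch Tom describes can be reproduced outside the database. The following is only a rough sketch, not code from fuzzystrmatch.c: it assumes UTF-8 input, and the names MAX_LEN_BYTES, utf8_char_count() and exceeds_byte_limit() are made up for illustration. The value 255 is the limit quoted in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the contrib limit; 255 is the value quoted in the thread. */
#define MAX_LEN_BYTES 255

/*
 * Count characters the way a character-based length would, assuming the
 * input is valid UTF-8: every byte that is not a continuation byte
 * (bit pattern 10xxxxxx) starts a new character.
 */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

/*
 * Byte-based check in the spirit of the one in the patch below:
 * returns 1 when the string is longer than the limit in bytes.
 */
int exceeds_byte_limit(const char *s)
{
    assert(s != NULL);
    return strlen(s) > MAX_LEN_BYTES;
}
```

A string of 255 two-byte characters has a character count of 255 but occupies 510 bytes, so a byte-based check like this rejects it even though a character-based length reports a value within the limit.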

> (It looks to me like levenshtein() is utterly non-multibyte-aware,
> which is probably a bug in itself.)

Is this a TODO?
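To make the question concrete, here is one possible shape of a character-aware distance. This is a sketch under stated assumptions, not a proposed patch: it assumes UTF-8 only (the server supports other encodings, which real code would have to handle through the backend's own multibyte routines), it uses malloc() rather than the backend allocator, and utf8_decode() and levenshtein_chars() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decode a UTF-8 string into code points. "out" must have room for
 * strlen(s) entries. Returns the number of code points written.
 * Malformed input is not rejected; this is only a sketch.
 */
size_t utf8_decode(const char *s, uint32_t *out)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (*p != '\0')
    {
        uint32_t cp;
        int extra;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else                          { cp = *p & 0x07; extra = 3; }
        p++;
        /* Fold in continuation bytes (10xxxxxx), stopping early at NUL. */
        while (extra-- > 0 && (*p & 0xC0) == 0x80)
            cp = (cp << 6) | (uint32_t) (*p++ & 0x3F);
        out[n++] = cp;
    }
    return n;
}

/* Levenshtein distance over code points; returns -1 if allocation fails. */
long levenshtein_chars(const char *a, const char *b)
{
    uint32_t *ca = malloc((strlen(a) + 1) * sizeof *ca);
    uint32_t *cb = malloc((strlen(b) + 1) * sizeof *cb);
    size_t *row = NULL;
    long result = -1;

    if (ca != NULL && cb != NULL)
    {
        size_t la = utf8_decode(a, ca);
        size_t lb = utf8_decode(b, cb);

        row = malloc((lb + 1) * sizeof *row);
        if (row != NULL)
        {
            /* row[j] holds the distance between a[0..i) and b[0..j). */
            for (size_t j = 0; j <= lb; j++)
                row[j] = j;
            for (size_t i = 1; i <= la; i++)
            {
                size_t diag = row[0];   /* value from the previous row, column j-1 */

                row[0] = i;
                for (size_t j = 1; j <= lb; j++)
                {
                    size_t up = row[j]; /* value from the previous row, column j */
                    size_t best = diag + (ca[i - 1] == cb[j - 1] ? 0 : 1);

                    if (up + 1 < best)
                        best = up + 1;          /* deletion */
                    if (row[j - 1] + 1 < best)
                        best = row[j - 1] + 1;  /* insertion */
                    row[j] = best;
                    diag = up;
                }
            }
            result = (long) row[lb];
        }
    }
    free(row);
    free(ca);
    free(cb);
    return result;
}
```

With this approach, comparing "é" (two bytes in UTF-8) against "e" gives a distance of 1, whereas a byte-by-byte comparison of the same strings would report 2.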

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: contrib/fuzzystrmatch/fuzzystrmatch.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/fuzzystrmatch/fuzzystrmatch.c,v
retrieving revision 1.23
diff -c -c -r1.23 fuzzystrmatch.c
*** contrib/fuzzystrmatch/fuzzystrmatch.c    5 Jan 2007 22:19:18 -0000    1.23
--- contrib/fuzzystrmatch/fuzzystrmatch.c    13 Feb 2007 17:56:05 -0000
***************
*** 88,94 ****
      if ((cols > MAX_LEVENSHTEIN_STRLEN + 1) || (rows > MAX_LEVENSHTEIN_STRLEN + 1))
          ereport(ERROR,
                  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!                  errmsg("argument exceeds max length: %d",
                          MAX_LEVENSHTEIN_STRLEN)));

      /*
--- 88,94 ----
      if ((cols > MAX_LEVENSHTEIN_STRLEN + 1) || (rows > MAX_LEVENSHTEIN_STRLEN + 1))
          ereport(ERROR,
                  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!                  errmsg("argument exceeds the maximum length of %d bytes",
                          MAX_LEVENSHTEIN_STRLEN)));

      /*
***************
*** 224,230 ****
      if (str_i_len > MAX_METAPHONE_STRLEN)
          ereport(ERROR,
                  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!                  errmsg("argument exceeds max length: %d",
                          MAX_METAPHONE_STRLEN)));

      if (!(str_i_len > 0))
--- 224,230 ----
      if (str_i_len > MAX_METAPHONE_STRLEN)
          ereport(ERROR,
                  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!                  errmsg("argument exceeds the maximum length of %d bytes",
                          MAX_METAPHONE_STRLEN)));

      if (!(str_i_len > 0))
***************
*** 236,242 ****
      if (reqlen > MAX_METAPHONE_STRLEN)
          ereport(ERROR,
                  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!                  errmsg("output length exceeds max length: %d",
                          MAX_METAPHONE_STRLEN)));

      if (!(reqlen > 0))
--- 236,242 ----
      if (reqlen > MAX_METAPHONE_STRLEN)
          ereport(ERROR,
                  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!                  errmsg("output exceeds the maximum length of %d bytes",
                          MAX_METAPHONE_STRLEN)));

      if (!(reqlen > 0))
