Thread: bug: fuzzystrmatch levenshtein is wrong

bug: fuzzystrmatch levenshtein is wrong

From
marcin mank
Date:
The current behavior of levenshtein(text,text,int,int,int) is wrong. Consider:

leki_dev=# select levenshtein('','a',2,4,5);
 levenshtein
-------------
           1
(1 row)


leki_dev=# select levenshtein('a','',2,4,5);
 levenshtein
-------------
           1
(1 row)


leki_dev=# select levenshtein('aa','a',2,4,5);
 levenshtein
-------------
           1
(1 row)


leki_dev=# select levenshtein('a','aa',2,4,5);
 levenshtein
-------------
           1
(1 row)

versus (after patch)

postgres=# select levenshtein('','a',2,4,5);
 levenshtein
-------------
           2
(1 row)

postgres=# select levenshtein('a','',2,4,5);
 levenshtein
-------------
           4
(1 row)

postgres=# select levenshtein('aa','a',2,4,5);
 levenshtein
-------------
           4
(1 row)

postgres=# select levenshtein('a','aa',2,4,5);
 levenshtein
-------------
           2
(1 row)

patch attached.

Greetings
Marcin Mańk

Attachment

Re: bug: fuzzystrmatch levenshtein is wrong

From
marcin mank
Date:
also there is integer overflow:
postgres=# select levenshtein('aaaaaaaaaaaaaaaa','',1,1000000000,1);levenshtein
--------------1179869184
(1 row)


should we reject arguments greater than,say, 10000 ?
maximum input length is 255 currently, so the maximum numbers involved
would be about 10000*255*2

This would leave some breathing room if we wanted to increase the
maximum input string length.

Greetings
Marcin


Re: bug: fuzzystrmatch levenshtein is wrong

From
Robert Haas
Date:
On Mon, Dec 7, 2009 at 8:33 AM, marcin mank <marcin.mank@gmail.com> wrote:
> The current behavior of levenshtein(text,text,int,int,int) is wrong. Consider:

Is this the same problem as bug #5098?

...Robert


Re: bug: fuzzystrmatch levenshtein is wrong

From
marcin mank
Date:
On Tue, Dec 8, 2009 at 4:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> The current behavior of levenshtein(text,text,int,int,int) is wrong. Consider:
>
> Is this the same problem as bug #5098?

Yes. Exact same change, plus the shortcut evaluation (for when one of
the inputs is empty) was also wrong. I fixed that too.

Greetings
Maricn Mańk


Re: bug: fuzzystrmatch levenshtein is wrong

From
Robert Haas
Date:
On Mon, Dec 7, 2009 at 8:33 AM, marcin mank <marcin.mank@gmail.com> wrote:
> The current behavior of levenshtein(text,text,int,int,int) is wrong. Consider:
>
> leki_dev=# select levenshtein('','a',2,4,5);
>  levenshtein
> -------------
>           1
> (1 row)
>
>
> leki_dev=# select levenshtein('a','',2,4,5);
>  levenshtein
> -------------
>           1
> (1 row)
>
>
> leki_dev=# select levenshtein('aa','a',2,4,5);
>  levenshtein
> -------------
>           1
> (1 row)
>
>
> leki_dev=# select levenshtein('a','aa',2,4,5);
>  levenshtein
> -------------
>           1
> (1 row)
>
> versus (after patch)
>
> postgres=# select levenshtein('','a',2,4,5);
>  levenshtein
> -------------
>           2
> (1 row)
>
> postgres=# select levenshtein('a','',2,4,5);
>  levenshtein
> -------------
>           4
> (1 row)
>
> postgres=# select levenshtein('aa','a',2,4,5);
>  levenshtein
> -------------
>           4
> (1 row)
>
> postgres=# select levenshtein('a','aa',2,4,5);
>  levenshtein
> -------------
>           2
> (1 row)
>
> patch attached.

I cannot get this patch to apply for anything.  All 4 hunks fail, both
on HEAD and on the the pre-8.4-pgindent version of fuzzystrmatch.c.
Either I'm doing something wrong here, or there's something wrong with
this patch file.

...Robert


Re: bug: fuzzystrmatch levenshtein is wrong

From
Bruce Momjian
Date:
Robert Haas wrote:
> > patch attached.
> 
> I cannot get this patch to apply for anything.  All 4 hunks fail, both
> on HEAD and on the the pre-8.4-pgindent version of fuzzystrmatch.c.
> Either I'm doing something wrong here, or there's something wrong with
> this patch file.

The author converted tabs to spaces --- there is not a single tab in the
diff file.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: bug: fuzzystrmatch levenshtein is wrong

From
Robert Haas
Date:
On Tue, Dec 8, 2009 at 9:08 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> > patch attached.
>>
>> I cannot get this patch to apply for anything.  All 4 hunks fail, both
>> on HEAD and on the the pre-8.4-pgindent version of fuzzystrmatch.c.
>> Either I'm doing something wrong here, or there's something wrong with
>> this patch file.
>
> The author converted tabs to spaces --- there is not a single tab in the
> diff file.

Ah.  I knew there had to be a reason.

I'm attaching my version of this patch.  Barring objections, I am
going to apply this to HEAD and backpatch to 8.4, where this feature
(and the associated bug) were introduced.

...Robert

Attachment

Re: bug: fuzzystrmatch levenshtein is wrong

From
Robert Haas
Date:
On Tue, Dec 8, 2009 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Dec 8, 2009 at 9:08 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> Robert Haas wrote:
>>> > patch attached.
>>>
>>> I cannot get this patch to apply for anything.  All 4 hunks fail, both
>>> on HEAD and on the the pre-8.4-pgindent version of fuzzystrmatch.c.
>>> Either I'm doing something wrong here, or there's something wrong with
>>> this patch file.
>>
>> The author converted tabs to spaces --- there is not a single tab in the
>> diff file.
>
> Ah.  I knew there had to be a reason.
>
> I'm attaching my version of this patch.  Barring objections, I am
> going to apply this to HEAD and backpatch to 8.4, where this feature
> (and the associated bug) were introduced.

Done.  Yeah, my first commit!  However, it appears that someone needs
to tell the pgsql-committers list that rhaas@postgresql.org is allowed
to post there.

...Robert


Re: bug: fuzzystrmatch levenshtein is wrong

From
Robert Haas
Date:
On Wed, Dec 9, 2009 at 9:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Dec 8, 2009 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Dec 8, 2009 at 9:08 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>> Robert Haas wrote:
>>>> > patch attached.
>>>>
>>>> I cannot get this patch to apply for anything.  All 4 hunks fail, both
>>>> on HEAD and on the the pre-8.4-pgindent version of fuzzystrmatch.c.
>>>> Either I'm doing something wrong here, or there's something wrong with
>>>> this patch file.
>>>
>>> The author converted tabs to spaces --- there is not a single tab in the
>>> diff file.
>>
>> Ah.  I knew there had to be a reason.
>>
>> I'm attaching my version of this patch.  Barring objections, I am
>> going to apply this to HEAD and backpatch to 8.4, where this feature
>> (and the associated bug) were introduced.
>
> Done.  Yeah, my first commit!  However, it appears that someone needs
> to tell the pgsql-committers list that rhaas@postgresql.org is allowed
> to post there.

Uh-oh.  Is it bad that I did this between the time Tom updated the
release notes and the time Marc stamped 8.4.2?  *ducks and runs for
cover*

...Robert


Re: bug: fuzzystrmatch levenshtein is wrong

From
Devrim GÜNDÜZ
Date:
On Wed, 2009-12-09 at 21:00 -0500, Robert Haas wrote:
> Done.  Yeah, my first commit!

:)

I think you forgot to update release notes for 8.4.2 :(

--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz

Re: bug: fuzzystrmatch levenshtein is wrong

From
marcin mank
Date:
On Thu, Dec 10, 2009 at 3:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Done.  Yeah, my first commit!

Great! Also, thanks for getting this in 8.4.2. Otherwise I would have
to compile this on Windows myself, which is no fun.

About the tabs vs spaces problem - I`ve decided that copying the patch
from a remote machine is best done by selecting it in the terminal and
pasting into a text file. Don`t do that :)

Greetings
Marcin Mańk