Thread: Re: Levenshtein Distance with more than 255 characters

Re: Levenshtein Distance with more than 255 characters

From
janek12@web.de
Date:
Where can I change levensthein_max_length?
Janek Sendrowski

Von: "Szymon Guz" <mabewlun@gmail.com>
An: "Tom Lane" <tgl@sss.pgh.pa.us>
Betreff: Re: [GENERAL] Levenshtein Distance with more than 255 characters
On 6 September 2013 08:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Szymon Guz <mabewlun@gmail.com> writes:
> > On 6 September 2013 01:00, Janek Sendrowski <janek12@web.de> wrote:
> >> I'm searching for an optimized Levenshtein Distance like Postgresql's.
> My
> >> problem is that I want to compare strings with a length over 255
> characters.
> >> Does anyone know a solution?
>
> > I'm not sure there is anything different from what you've found in
> > core/contribs. But you can always use pg/plpython or pg/plperl procedure
> > with some external library calculating the distance.
>
> Well, you could just rebuild the fuzzystrmatch module with a different
> value for MAX_LEVENSHTEIN_STRLEN.  The comments in the code note that the
> comparison cost is roughly O(N^2) in the string length, and the reason for
> having a limit at all is to ensure the function runtime doesn't get out of
> hand --- but it seems likely to me that 255 is an unnecessarily
> conservative limit.  If you wanted to do a few tests and report back on
> just how slow it can get, we might be persuaded to raise the stock
> setting.
>
>                         regards, tom lane
>


I've checked that and I think we could raise the limit without any problem

I've set
#define MAX_LEVENSHTEIN_STRLEN 255 * 255

I was using the levenshtein function comparing two strings of the same
length

Strings length: 640
Difference: 531
Time: 2.5ms

Strings length: 1920
Difference: 1258
Time: 20ms


Strings length: 5760
Difference: 1811
Time: 146ms


regards,
Szymon


Re: Levenshtein Distance with more than 255 characters

From
John R Pierce
Date:
On 9/6/2013 2:00 PM, janek12@web.de wrote:
> Where can I change levensthein_max_length?

as the message you quoted said, its


    #define MAX_LEVENSHTEIN_STRLEN


I'd expect this (without bothering to look) to be in a .h file in the
fuzzystrmatch contributed module directory.



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: Levenshtein Distance with more than 255 characters

From
"Janek Sendrowski"
Date:
<div style="font-family: Verdana;font-size: 12.0px;"><div>Do you know the destination. I cant find it.</div></div>

Re: Levenshtein Distance with more than 255 characters

From
Michael Paquier
Date:
On Sat, Sep 7, 2013 at 6:28 AM, Janek Sendrowski <janek12@web.de> wrote:
> Do you know the destination. I cant find it.
Here it is:
$ find . -name "*.[c|h]" | xgrep MAX_LEVENSHTEIN_STRLEN
./contrib/fuzzystrmatch/levenshtein.c:#define MAX_LEVENSHTEIN_STRLEN        255
./contrib/fuzzystrmatch/levenshtein.c:    if (m > MAX_LEVENSHTEIN_STRLEN ||
./contrib/fuzzystrmatch/levenshtein.c:        n > MAX_LEVENSHTEIN_STRLEN)
./contrib/fuzzystrmatch/levenshtein.c:
MAX_LEVENSHTEIN_STRLEN)));
--
Michael