Re: multibyte charater set in levenshtein function - Mailing list pgsql-hackers

From Robert Haas
Subject Re: multibyte charater set in levenshtein function
Msg-id AANLkTikvz=wp7r72Jd8iWrdAegawijzZ6ejm08WXEzpW@mail.gmail.com
In response to Re: multibyte charater set in levenshtein function  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
On Wed, Jul 21, 2010 at 2:47 PM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> On Wed, Jul 21, 2010 at 10:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> *scratches head*  Aren't you just moving the same call to a different
>> place?
>
> So, where you can find this different place? :) In this patch
> null-terminated strings are not used at all.

I can't.  You win.  :-)

Actually, I wonder if there's enough performance improvement there
that we might think about extracting that part of the patch and
applying it separately.  Then we could continue trying to figure out
what to do with the rest.  Sometimes it's simpler to deal with one
change at a time.

> I tested it with american-english dictionary with 98569 words.
>
> test=# select sum(levenshtein(word, 'qwerqwerqwer')) from words;
>    sum
> ---------
>  1074376
> (1 row)
>
> Time: 131,435 ms
> test=# select sum(levenshtein_less_equal(word, 'qwerqwerqwer',100)) from
> words;
>    sum
> ---------
>  1074376
> (1 row)
>
> Time: 221,078 ms
> test=# select sum(levenshtein_less_equal(word, 'qwerqwerqwer',-1)) from
> words;
>    sum
> ---------
>  1074376
> (1 row)
>
> Time: 254,819 ms
>
> The function with negative value of max_d didn't become faster than with
> just big value of max_d.

Ah, I see.  That's pretty compelling, I guess.  Although it still
seems like a lot of code...
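
For readers following the benchmark above, here is a rough sketch of what
a bounded distance function does.  It is Python for illustration only,
not the C code from the patch.  It assumes that a negative max_d means
"no bound" (as in the -1 run quoted above) and that the function returns
max_d + 1 as soon as the distance is known to exceed max_d.  That return
value and the helper structure are assumptions of this sketch.  Python
strings iterate by code point, so multibyte characters count as one
character each.

```python
def levenshtein(s: str, t: str) -> int:
    """Plain Levenshtein distance, counted in characters (code points)."""
    prev = list(range(len(t) + 1))          # distances from "" to t[:j]
    for i, cs in enumerate(s, start=1):
        curr = [i]                          # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_less_equal(s: str, t: str, max_d: int) -> int:
    """Bounded variant (sketch): exact distance if it is <= max_d,
    otherwise max_d + 1.  A negative max_d disables the bound."""
    if max_d < 0:
        return levenshtein(s, t)
    # The length difference is a lower bound on the distance.
    if abs(len(s) - len(t)) > max_d:
        return max_d + 1
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + cost))
        # The row minimum never decreases from one row to the next, so
        # once it exceeds max_d the final distance must exceed it too.
        if min(curr) > max_d:
            return max_d + 1
        prev = curr
    return min(prev[-1], max_d + 1)
```

With a generous bound such as 100 the early exit never fires for
dictionary-length words, so the bounded function does the same work as
the plain one plus the extra checks.  That matches the identical sums
reported above, though the timings there are for the patch itself and
say nothing about this sketch.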

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

