
Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function) - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)
Date	October 13, 2010 12:22:36
Msg-id	AANLkTimPZMQ=yMuZazrH8v3VP5opo4Ywu5sFxFtMFMo2@mail.gmail.com
In response to	Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function) (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)
List	pgsql-hackers


On Wed, Oct 13, 2010 at 10:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Excerpts from Tom Lane's message of Wed Oct 13 10:32:36 -0300 2010:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> I spent some time hacking on this.  It doesn't appear to be too easy
>>>> to get levenshtein_less_equal() working without slowing down plain old
>>>> levenshtein() by about 6%.
>>>
>>> Is that really enough slowdown to be worth contorting the code to avoid?
>>> I've never heard of an application where the speed of this function was
>>> the bottleneck.
>
>> What if it's used on an expression index on a large table?
>
> So?  Expression indexes don't result in multiple evaluations of the
> function.  If anything, that context would probably be even less
> sensitive to the function runtime than non-index use.
>
> But the main point is that 6% performance penalty in a non-core function
> is well below my threshold of pain.  If it were important enough to care
> about that kind of performance difference, it'd be in core.  I'd rather
> see us keeping the code simple, short, and maintainable.

Well, then you have to wonder whether it's worth having the
less-than-or-equal-to version around at all.  That's only about 2x
faster on the same test case.  I do think it's likely that people who
call this function will call it many times, however (e.g. when trying to
find the closest matches in a dictionary for a given input string),
so the worry about performance doesn't seem totally out of place.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
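To illustrate why a bounded variant can be faster, and the dictionary-lookup case described above, here is a minimal Python sketch. It is a simplified illustration, not the fuzzystrmatch C code. It assumes the contract that `levenshtein_less_equal(s, t, max_d)` returns the exact distance when that distance is at most `max_d` and some value greater than `max_d` otherwise. The early exit relies on a property of the unit-cost dynamic program: the minimum of each row never decreases, so once every cell in a row exceeds `max_d`, the final distance must too. The helper `closest_matches` is hypothetical.

```python
def levenshtein(s: str, t: str) -> int:
    """Plain Levenshtein distance with unit costs, using two DP rows."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i] + [0] * len(t)
        for j, ct in enumerate(t, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (cs != ct))  # substitution / match
        prev = cur
    return prev[-1]


def levenshtein_less_equal(s: str, t: str, max_d: int) -> int:
    """Bounded sketch: exact distance if <= max_d, else max_d + 1.

    Stops early once a whole DP row exceeds max_d, because row minima
    never decrease and the final distance is at least every row minimum.
    """
    # The length difference alone is a lower bound on the distance.
    if abs(len(s) - len(t)) > max_d:
        return max_d + 1
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i] + [0] * len(t)
        for j, ct in enumerate(t, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct))
        if min(cur) > max_d:  # nothing in this row can lead back under max_d
            return max_d + 1
        prev = cur
    return min(prev[-1], max_d + 1)


def closest_matches(word: str, dictionary: list[str], max_d: int) -> list[tuple[int, str]]:
    """Hypothetical helper: (distance, entry) pairs within max_d, nearest first."""
    hits = []
    for entry in dictionary:
        d = levenshtein_less_equal(word, entry, max_d)
        if d <= max_d:
            hits.append((d, entry))
    return sorted(hits)


if __name__ == "__main__":
    print(closest_matches("levenstein", ["postgres", "levenshtein", "levenstein"], 2))
```

A production version would typically also restrict each row to a band of roughly `2 * max_d + 1` cells around the diagonal; this sketch only demonstrates the early-termination idea, and it makes no claim about the 2x figure quoted above.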
