On Wed, Nov 19, 2014 at 5:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think we would be well-advised not to start inventing our own
> approximate matching algorithm. Peter's suggestion boils down to a
> guess that the default cost parameters for Levenshtein suck, and your
> suggestion boils down to a guess that we can fix the problems with
> Peter's suggestion by bolting another heuristic on top of it - and
> possibly running Levenshtein twice with different sets of cost
> parameters. Ugh.
I agree.
While I am perfectly comfortable with the fact that we are guessing
here, my guesses are based on what I observed to work well with real
schemas, and simulated errors that I thought were representative of
human error. Obviously it's possible that another scheme will sometimes do
better, including, for example, a scheme that picks a match entirely at
random. But on average, I think that what I have here will
do better than anything else proposed so far.
--
Peter Geoghegan