On Thu, Nov 20, 2014 at 11:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> That does seem to give better results, but it still seems awfully
> complicated. If we just used Levenshtein with all-default cost
> factors and a distance cap equal to Max(strlen(what_user_typed),
> strlen(candidate_match), 3), what cases that you think are important
> would be harmed?
Well, just by plugging in default Levenshtein cost factors, I can see
the following regression:
*** /home/pg/postgresql/src/test/regress/expected/join.out	2014-11-20 10:17:55.042291912 -0800
--- /home/pg/postgresql/src/test/regress/results/join.out	2014-11-20 11:42:15.670108745 -0800
***************
*** 3452,3458 ****
  ERROR: column atts.relid does not exist
  LINE 1: select atts.relid::regclass, s.* from pg_stats s join
                 ^
- HINT: Perhaps you meant to reference the column "atts"."indexrelid".
Within the catalogs, the names of attributes are prefixed as a form of
what you might call internal namespacing. For example, pg_index has
attributes that all begin with "ind". You could easily omit a prefix
like that while still more or less knowing what you're looking for.
In more concrete terms, this gets no suggestion:
postgres=# select key from pg_index;
ERROR: 42703: column "key" does not exist
LINE 1: select key from pg_index;
               ^
Only this does:
postgres=# select ikey from pg_index;
ERROR: 42703: column "ikey" does not exist
LINE 1: select ikey from pg_index;
               ^
HINT: Perhaps you meant to reference the column "pg_index"."indkey".
postgres=#
The git people varied their Levenshtein costings for a reason.
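To make the cost question concrete, here is a minimal sketch of a weighted Levenshtein distance. The specific costs below (insertion 1, deletion and substitution 2) are assumptions for illustration only, not the costings git or the patch actually uses. The point is the mechanism: with uniform costs, "key" is exactly as far from "indkey" as it is from a completely unrelated three-letter name, so no cap can separate them; charging less for characters the user merely omitted lets the real candidate win.

```python
def weighted_levenshtein(s, t, ins=1, dele=2, sub=2):
    """Distance from what the user typed (s) to a candidate (t).

    ins:  cost to insert a character the user omitted
    dele: cost to delete a character the user added
    sub:  cost to substitute one character for another
    """
    # prev holds row i-1 of the standard dynamic-programming table.
    prev = [j * ins for j in range(len(t) + 1)]
    for i, sc in enumerate(s, 1):
        cur = [i * dele]
        for j, tc in enumerate(t, 1):
            cur.append(min(
                prev[j - 1] + (0 if sc == tc else sub),  # match / substitute
                prev[j] + dele,                          # delete from s
                cur[j - 1] + ins,                        # insert into s
            ))
        prev = cur
    return prev[-1]

# With uniform costs, omitting the "ind" prefix scores no better
# than a total mismatch:
print(weighted_levenshtein("key", "indkey", 1, 1, 1))  # 3
print(weighted_levenshtein("key", "xyz", 1, 1, 1))     # 3

# With insertions cheaper than substitutions, the real candidate
# is clearly closer:
print(weighted_levenshtein("key", "indkey"))  # 3
print(weighted_levenshtein("key", "xyz"))     # 5
```

With these assumed costs, a cap of 4 would keep "indkey" as a hint candidate while rejecting the unrelated name, something uniform costings cannot do for this pair.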
I also think that a one-size-fits-all cap will break things. It will
independently break the example above, as well as the more marginal
"c1"."f2" vs "c1"."f2" case (okay, maybe that case was exactly on the
threshold, but others won't be).
I don't see that dropping the different costings actually saves any
complexity. Similarly, the final cap is quite straightforward; anything
with any real complexity happens before that.
--
Peter Geoghegan