On 06/17/2014 02:36 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> (2) If there are multiple columns with the same Levenshtein distance,
>> which one do you suggest? The current code picks a random one, which
>> I'm OK with. The other option would be to list all of the columns.
>
> I objected to that upthread. I don't think that picking a random one is
> sane at all. Listing them all might be OK (I notice that that seems to be
> what both bash and git do).
>
> Another issue is whether to print only those having exactly the minimum
> observed Levenshtein distance, or to print everything less than some
> cutoff. The former approach seems to me to be placing a great deal of
> faith in something that's only a heuristic.

Well, that depends on what the cutoff is. If it's high, like 0.5, that
could be a LOT of columns. For example, I plan to test this feature with
a 3-table join that has a combined 300 columns. I can completely imagine
coming up with a string which is within 0.5 or even 0.3 of 40 column names.

So if we want to list everything below a cutoff, we'd need to make that
cutoff fairly narrow, like 0.2. But that means we'd miss a lot of
potential matches on short column names.
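
To make that trade-off concrete, here is a rough sketch. It assumes the
0.2 / 0.5 figures mean edit distance divided by the length of the longer
string; that normalization is my assumption, and the function names are
made up for illustration. This is not the patch's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized(a: str, b: str) -> float:
    """ASSUMPTION: distance scaled by the longer string's length (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def candidates_below_cutoff(typo, columns, cutoff):
    """Option A: list every column within the cutoff."""
    return [c for c in columns if normalized(typo, c) <= cutoff]


def candidates_at_minimum(typo, columns):
    """Option B: list only the columns tied for the smallest distance."""
    scored = [(normalized(typo, c), c) for c in columns]
    if not scored:
        return []
    best = min(d for d, _ in scored)
    return [c for d, c in scored if d == best]
```

With this normalization, "idx" against a column named "id" scores 1/3,
so a 0.2 cutoff drops it, while "custmer_id" against "customer_id"
scores 1/11 and passes. That is the short-column-name problem in
miniature.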

I really think we're overthinking this: it is just a HINT, and we can
improve it in future PostgreSQL versions, and most of our users will
ignore it anyway because they'll be using a client which doesn't display
HINTs.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com