Re: Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers

From Ian Barwick
Subject Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Msg-id 539FA371.4070902@2ndquadrant.com
In response to Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On 14/06/17 9:53, Tom Lane wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Tue, Jun 17, 2014 at 9:30 AM, Ian Barwick <ian@2ndquadrant.com> wrote:
>>> From what I've seen in the wild in Japan, Roman/ASCII characters are
>>> widely used for object/attribute names, as generally it's much less
>>> hassle than switching between input methods, dealing with different
>>> encodings etc. The only place where I've seen Japanese characters widely
>>> used is in tutorials, examples etc. However that's only my personal
>>> observation for one particular non-Roman language.
> 
>> And I agree to this remark, that's a PITA to manage database object
>> names with Japanese characters directly. I have ever seen some
>> applications using such ways to define objects though in the past, not
>> *that* many I concur..
> 
> What exactly is the rationale for thinking that Levenshtein distance is
> useless in non-Roman alphabets?  AFAIK it just counts insertions and
> deletions of characters, which seems like a concept rather independent
> of what those characters are.

With Japanese (which doesn't have an alphabet, but two syllabaries and
a bunch of logographic characters), Levenshtein distance is pretty useless
for examining similarities between words which can be written in either
syllabary (Michael's "ramen" example earlier in the thread), and for
catching "typos" caused by erroneous conversion from phonetic input to
characters - e.g. intending to input "成長" (seichou, growth) but
accidentally selecting "清聴" (seichou, courteous attention).
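
To make that concrete, here is a minimal sketch in Python of a plain
character-level Levenshtein distance. It is not taken from the patch, and
the katakana/hiragana spellings used for the "ramen" pair are my own
assumption, since the earlier message is not quoted here. It only shows
that word pairs a Japanese reader would consider "the same" or "a plausible
slip" come out as almost entirely different strings, while a Roman-letter
typo stays close.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance over Unicode code points.

    Insertions, deletions and substitutions each cost 1.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Illustrative pairs only (assumed spellings, not from the thread):
pairs = [
    ("ラーメン", "らーめん"),  # same word, katakana vs. hiragana
    ("成長", "清聴"),          # same reading "seichou", wrong conversion
    ("seichou", "seicho"),     # ordinary Roman-letter typo
]
for a, b in pairs:
    print(a, b, levenshtein(a, b))
```

The first pair differs in 3 of 4 characters and the second in 2 of 2, so
neither would plausibly be offered as a near match, whereas the
Roman-letter typo is a single edit away.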

However, in this particular use case, as long as it doesn't produce false
positives (I haven't looked at the patch), I don't think it would cause
any problems of the kind which would require actively excluding certain
languages/character sets; it just wouldn't be quite as useful.


Regards

Ian Barwick

-- 
 Ian Barwick                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services