Re: Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers

From	Ian Barwick
Subject	Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date	June 17, 2014 02:10:32
Msg-id	539FA371.4070902@2ndquadrant.com
In response to	Re: Doing better at HINTing an appropriate column within errorMissingColumn() (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Doing better at HINTing an appropriate column within errorMissingColumn()
List	pgsql-hackers

On 14/06/17 9:53, Tom Lane wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Tue, Jun 17, 2014 at 9:30 AM, Ian Barwick <ian@2ndquadrant.com> wrote:
>>> From what I've seen in the wild in Japan, Roman/ASCII characters are
>>> widely used for object/attribute names, as generally it's much less
>>> hassle than switching between input methods, dealing with different
>>> encodings etc. The only place where I've seen Japanese characters widely
>>> used is in tutorials, examples etc. However that's only my personal
>>> observation for one particular non-Roman language.
> 
>> And I agree to this remark, that's a PITA to manage database object
>> names with Japanese characters directly. I have ever seen some
>> applications using such ways to define objects though in the past, not
>> *that* many I concur..
> 
> What exactly is the rationale for thinking that Levenshtein distance is
> useless in non-Roman alphabets?  AFAIK it just counts insertions and
> deletions of characters, which seems like a concept rather independent
> of what those characters are.

With Japanese (which doesn't have an alphabet, but two syllabaries and
a bunch of logographic characters), Levenshtein distance is pretty useless
for examining similarities with words which can be written in either
syllabary (Michael's "ramen" example earlier in the thread); and when
catching "typos" caused by erroneous conversion from phonetic input to
characters - e.g. intending to input "成長" (seichou, growth) but
accidentally selecting "清聴" (seichou, courteous attention).
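
To make the two cases above concrete, here is a rough sketch of a plain Levenshtein distance computed over Unicode code points. It is not taken from the patch (which I haven't looked at); the function name and the sample strings are my own assumptions, and the katakana/hiragana spellings of "ramen" are only a guess at what Michael's example looked like.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions
    of single Unicode code points (illustrative helper, not PostgreSQL code)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


# Same word in katakana and hiragana: only the long-vowel mark is shared,
# so three of the four characters count as substitutions.
print(levenshtein("ラーメン", "らーめん"))  # 3

# Same reading (seichou), different kanji picked during conversion:
# both characters differ, so the strings are maximally distant.
print(levenshtein("成長", "清聴"))  # 2

# A typical ASCII typo, for comparison.
print(levenshtein("growth", "growh"))  # 1
```

In both Japanese cases the distance is close to the length of the string itself, so a threshold tuned for ASCII typos would most likely just not suggest anything.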

However, in this particular use case, as long as it doesn't produce false
positives (I haven't looked at the patch), I don't think it would cause
any problems (of the kind which would require actively excluding certain
languages/character sets); it just wouldn't be quite as useful.

Regards

Ian Barwick

-- 
 Ian Barwick                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
