Re: Patch for collation using ICU - Mailing list pgsql-hackers

From	Tatsuo Ishii
Subject	Re: Patch for collation using ICU
Date	May 8, 2005 00:09:21
Msg-id	20050508.090845.39153917.t-ishii@sra.co.jp
In response to	Re: Patch for collation using ICU ("John Hansen" <john@geeknet.com.au>)
List	pgsql-hackers

> Bruce Momjian wrote:
> > 
> > There are two reasons for that optimization --- first, some 
> > locale support is broken and Unicode encoding with a C locale 
> > crashes (not an issue for ICU), and second, it is an 
> > optimization for languages like Japanese that want to use 
> > unicode, but don't need a locale because upper/lower means 
> > nothing in those character sets.
> 
> No, upper/lower means nothing in those languages, so why would you need
> to optimize upper/lower if they're not used??
> And if they are, it's obviously because the text contains characters
> from other languages (probably english) and as such they should behave
> correctly.

Yes, Japanese (and probably Chinese and Korean) text includes ASCII
characters. More precisely, ASCII is a subset of the Japanese
encodings (LATIN1 is not, however). And we have no problem at all with
the glibc C locale. See below ("unitest" is a UNICODE database).

unitest=# create table t1(t text);
CREATE TABLE
unitest=# \encoding EUC_JP
unitest=# insert into t1 values('abcあいう');
INSERT 1842628 1
unitest=# select upper(t) from t1;
   upper   
-----------
 ABCあいう
(1 row)

So Japanese (including ASCII)/UNICODE behavior is perfectly correct at
the moment, and I strongly object to removing that optimization.
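
To illustrate why the result above comes out right, here is a minimal
Python sketch of a byte-wise, ASCII-only uppercase. It is not the
actual PostgreSQL code path; the function name ascii_upper is
hypothetical. It relies on the assumption that every byte of a
multibyte character in UTF-8 or EUC_JP has its high bit set, so those
bytes can never be mistaken for 'a'..'z'.

```python
def ascii_upper(data: bytes) -> bytes:
    """Uppercase only the ASCII letters a-z; leave every other byte unchanged.

    Hypothetical helper for illustration, not PostgreSQL's upper().
    """
    out = bytearray(data)
    for i, b in enumerate(out):
        if 0x61 <= b <= 0x7A:  # byte is in the range 'a'..'z'
            out[i] = b - 0x20  # shift to the range 'A'..'Z'
    return bytes(out)


text = "abcあいう"
# The same byte-wise rule is applied to both encodings mentioned above.
result_utf8 = ascii_upper(text.encode("utf-8")).decode("utf-8")
result_eucjp = ascii_upper(text.encode("euc_jp")).decode("euc_jp")
print(result_utf8)
print(result_eucjp)
```

Both lines print ABCあいう: the ASCII letters are uppercased and the
hiragana bytes pass through untouched.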
--
Tatsuo Ishii
