Re: Patch for collation using ICU - Mailing list pgsql-hackers

From	John Hansen
Subject	Re: Patch for collation using ICU
Date	May 8, 2005 01:07:46
Msg-id	5066E5A966339E42AA04BA10BA706AE50A930B@rodrick.geeknet.com.au
In response to	Patch for collation using ICU (Palle Girgensohn <girgen@pingpong.net>)
Responses	Re: Patch for collation using ICU
List	pgsql-hackers

Tatsuo Ishii wrote:
> Sent: Sunday, May 08, 2005 10:09 AM
> To: John Hansen
> Cc: pgman@candle.pha.pa.us; girgen@pingpong.net; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Patch for collation using ICU
> 
> > Bruce Momjian wrote:
> > > 
> > > There are two reasons for that optimization --- first, some locale
> > > support is broken and Unicode encoding with a C locale crashes (not
> > > an issue for ICU), and second, it is an optimization for languages
> > > like Japanese that want to use unicode, but don't need a locale
> > > because upper/lower means nothing in those character sets.
> > 
> > No, upper/lower means nothing in those languages, so why would you 
> > need to optimize upper/lower if they're not used??
> > And if they are, it's obviously because the text contains characters
> > from other languages (probably english) and as such they should behave
> > correctly.
> 
> Yes, Japanese (and probably Chinese and Korean) languages
> include ASCII characters. More precisely, ASCII is part of Japanese
> encodings (LATIN1 is not, however). And we have no problem at
> all with glibc/C locale. See below ("unitest" is a UNICODE database).
> 
> unitest=# create table t1(t text);
> CREATE TABLE
> unitest=# \encoding EUC_JP
> unitest=# insert into t1 values('abcあいう');
> INSERT 1842628 1
> unitest=# select upper(t) from t1;
>    upper   
> -----------
>  ABCあいう
> (1 row)
> 
> So Japanese (including ASCII)/UNICODE behavior is perfectly
> correct at this moment.

Right, so you _never_ use accented Latin characters in Japanese?
(like è for example, whose uppercase is È)
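
To make the è example concrete, here is a rough sketch in Python. It is not
PostgreSQL or ICU code: ascii_upper is a hypothetical stand-in for an
ASCII-only upper(), and unicode_upper leans on Python's built-in
Unicode-aware str.upper(). The sample string is made up for illustration.

```python
def ascii_upper(text: str) -> str:
    """Hypothetical ASCII-only upper(): maps a-z to A-Z and leaves
    every other code point untouched."""
    return "".join(
        chr(ord(c) - 32) if "a" <= c <= "z" else c
        for c in text
    )


def unicode_upper(text: str) -> str:
    """Unicode-aware upper(), using Python's built-in case mapping."""
    return text.upper()


sample = "abcあいうè"
print(ascii_upper(sample))    # è stays lowercase
print(unicode_upper(sample))  # è becomes È; the kana are unchanged
```

Both versions agree on the ASCII letters and on the kana; they only differ
on the accented letter, which is the case in question.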

> So I strongly object to removing that optimization.

I'm guessing this calls for a vote then, since if ICU is implemented,
I'd have to object to leaving it in.

Changing the behaviour of ICU doesn't seem right. Changing the behaviour of pg
so that it works as it should when using unicode seems the right solution to me.

> --
> Tatsuo Ishii
> 
>
