Re: Patch for collation using ICU - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Patch for collation using ICU
Date
Msg-id 200505071414.j47EEfZ02040@candle.pha.pa.us
In response to Re: Patch for collation using ICU  (Palle Girgensohn <girgen@pingpong.net>)
List pgsql-hackers
Palle Girgensohn wrote:
> >> This is because in the standard postgres implementation, upper/lower is
> >> done one character at a time. A proper upper/lower cannot do it that
> >> way.  Another known example is in Turkish, where an ? (?) should look
> >> different depending on whether it is an initial letter or not. This
> >> fails in standard postgresql on all platforms.
> >
> > Uh, where do you see that?  Our code has:
> >
> >         workspace = texttowcs(string);
> >
> >         for (i = 0; workspace[i] != 0; i++)
> >             workspace[i] = towupper(workspace[i]);
> 
> As you see, the loop runs towupper for one character at a time. It cannot
> consider whether the letter is the initial, as required in Turkish, and it
> cannot really convert one character into two ('ß' -> 'SS')

Oh, OK. I thought texttowcs() would expand the string to allow such
conversions.
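
To illustrate the limitation, here is a minimal sketch with the same loop
shape as the code quoted above. It is not the actual PostgreSQL source, and
the helper name upper_in_place is hypothetical. Since towupper() maps one
wide character to exactly one wide character, the output always has the same
length as the input, so a one-to-two conversion such as 'ß' -> 'SS' cannot
happen, and the loop never looks at neighboring characters.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not from the PostgreSQL tree): upper-case a wide
 * string in place, one character at a time, and return the number of
 * characters processed.
 */
static size_t
upper_in_place(wchar_t *ws)
{
    size_t      i;

    assert(ws != NULL);

    for (i = 0; ws[i] != L'\0'; i++)
    {
        /*
         * towupper() sees a single character with no context, and it can
         * only hand back a single character. What it returns for a
         * non-ASCII character such as U+00DF depends on the locale, but
         * it can never be two characters.
         */
        ws[i] = (wchar_t) towupper((wint_t) ws[i]);
    }

    return i;
}
```

Whatever the locale does with each character, the string length before and
after the call is identical. A context-sensitive or expanding case mapping
would need an interface that takes the whole string and writes to a separate,
possibly larger, output buffer.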

> >> > We have deprecated UNICODE in 8.1 in favor of UTF8 (no dash).  Does
> >> > that help?
> >>
> >> I'm aware of that. It might help for unicode, but there are a bunch of
> >> other encodings. IANA has decided that utf-8 has *no* aliases, hence
> >> only  utf-8 (with dash, but case insensitive) is accepted. Perhaps ICU is
> >> forgiving, I don't remember/know, but I think we need the mappings,
> >> unfortunately.
> >
> > OK.  I guess I am just confused why the native implementations are OK.
> 
> They're OK since they understand that UNICODE (or UTF8) is really utf-8. 
> Problem is the strings used to describe them are not understood by ICU.
> 
> BTW, the pg_enc2iananame_tbl is only used *from* internal representation 
> *to* IANA, not the other way around. Maybe that fact lowers the rate of 
> confusion? ;-)

OK, got it.  I am still a little confused why every native
implementation understands our existing names but ICU does not.
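
For reference, a one-way mapping of the kind described above might look
roughly like the following. This is only a sketch: the struct, the table
entries, and the function name enc_to_iana are assumptions for illustration
and are not taken from the patch's pg_enc2iananame_tbl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: internal encoding name -> IANA charset name. */
typedef struct
{
    const char *pg_name;
    const char *iana_name;
} enc_map_entry;

/* Illustrative contents only; the real table in the patch may differ. */
static const enc_map_entry enc_map[] = {
    {"UNICODE", "UTF-8"},
    {"UTF8", "UTF-8"},
    {"LATIN1", "ISO-8859-1"},
};

/*
 * Look up the IANA name for an internal encoding name.
 * Returns NULL when the name is not in the table. The lookup goes
 * in one direction only, from internal name to IANA name.
 */
static const char *
enc_to_iana(const char *pg_name)
{
    size_t      i;

    assert(pg_name != NULL);

    for (i = 0; i < sizeof(enc_map) / sizeof(enc_map[0]); i++)
    {
        if (strcmp(enc_map[i].pg_name, pg_name) == 0)
            return enc_map[i].iana_name;
    }

    return NULL;
}
```

The returned string is what would be handed to a library that only accepts
IANA names. Both internal spellings map to the same target, which is why the
reverse direction is not needed.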

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

