Re: questionable item in HISTORY - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: questionable item in HISTORY
Date
Msg-id 200509242256.j8OMuQS26605@candle.pha.pa.us
In response to questionable item in HISTORY  (Tatsuo Ishii <ishii@sraoss.co.jp>)
List pgsql-hackers
Tatsuo Ishii wrote:
> Following item in HISTORY:
> 
>      * Add support for 3 and 4-byte UTF8 characters (John Hansen)
>        Previously only one and two-byte UTF8 characters were supported.
>        This is particularly important for support for some Chinese
>        characters.
> 
> is wrong, since 3-byte UTF-8 characters have been supported ever since
> UTF-8 support was added to PostgreSQL. The correct description would be:
> 
>      * Add support for 4-byte UTF8 characters (John Hansen)
>        Previously only up to three-byte UTF8 characters were supported.
>        This is particularly important for support for some Chinese
>        characters.

Release notes updated.
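
For anyone checking the byte counts behind the corrected wording, here is a
minimal Python sketch. It is only an illustration of how UTF-8 length depends
on the code point, not PostgreSQL code; the helper name utf8_length and the
sample characters are chosen for this example. Characters in the Basic
Multilingual Plane, including most common Chinese characters, need at most
three bytes, while characters above U+FFFF need four.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Sample characters, written as escapes so the file encoding does not matter.
samples = {
    "\u0041": "U+0041, ASCII letter A",
    "\u00e9": "U+00E9, Latin small e with acute",
    "\u4e2d": "U+4E2D, common CJK ideograph inside the BMP",
    "\U00020000": "U+20000, CJK Extension B ideograph outside the BMP",
}

lengths = {ch: utf8_length(ch) for ch in samples}

for ch, note in samples.items():
    print(f"{note}: {lengths[ch]} byte(s) in UTF-8")
```

Only the last sample falls in the range that a decoder limited to three-byte
sequences would reject.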

> 
> In the meantime, I wonder if we need to update the UTF-8 <--> locale
> encoding maps. The author of the patches stated that "This is
> particularly important for support for some Chinese characters". I
> have no idea what encoding he is referring to, but I wonder if the
> latest Chinese encoding standard, GB18030, needs 4-byte UTF-8 mappings.
> If yes, we surely need to update utf8_to_gb18030.map.
> 
> Anybody familiar with GB18030/UTF-8?

Good question.  The report we got in the past was that some UTF-8
characters, mostly Chinese, were being rejected even though they were
valid.  I have no idea how they map to the GB* character sets.
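
One way to probe the GB18030 question outside the server is a quick check
with Python's built-in gb18030 codec. This is a sketch under the assumption
that the codec follows the standard; it says nothing about what
utf8_to_gb18030.map currently contains, and the sample code point U+20000 is
simply one character above U+FFFF picked for the example.

```python
# A character outside the Basic Multilingual Plane (CJK Extension B).
ch = "\U00020000"

utf8_bytes = ch.encode("utf-8")       # UTF-8 form of the character
gb_bytes = ch.encode("gb18030")       # GB18030 form, via Python's codec

print("UTF-8   :", utf8_bytes.hex(), f"({len(utf8_bytes)} bytes)")
print("GB18030 :", gb_bytes.hex(), f"({len(gb_bytes)} bytes)")

# Round trip: GB18030 bytes -> text -> UTF-8 bytes should be lossless.
round_trip = gb_bytes.decode("gb18030").encode("utf-8")
print("round trip preserved:", round_trip == utf8_bytes)
```

If the codec can represent such a character, then a conversion table that
stops at three-byte UTF-8 sequences cannot cover the whole encoding.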

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

