Re: Proposal: Adding JIS X 0213 support - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Proposal: Adding JIS X 0213 support
Msg-id 20070325.213250.46334984.t-ishii@sraoss.co.jp
In response to Re: Proposal: Adding JIS X 0213 support  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
> > Tatsuo Ishii <ishii@postgresql.org> writes:
> > >> I'm confused.  If this is exactly the same as EUC_JP, why do we need
> > >> any new code at all?
> > 
> > > I said the *encoding scheme* is the same, not that the contents
> > > (character set) are the same. In other words, the characters included
> > > in EUC_JP are not the same as those in EUC_JIS_2004.
> > 
> > I'm still confused.  If the set of characters is different, then surely
> > we need at least a different UTF8<->EUC_JIS_2004 conversion function?
> 
> Yes, exactly. I will come up with new conversions later.

I have committed changes to add JIS X 0213 support, along with the corresponding conversions.

New encodings:

EUC_JIS_2004:    JIS X 0213 encoded in EUC
SHIFT_JIS_2004:    JIS X 0213 encoded in Shift JIS (client-only encoding)

These encodings support the following character sets:

ASCII, JIS X 0201 (single-byte "katakana"), and JIS X 0213 planes 1 and 2
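
In EUC_JIS_2004 each of these character sets occupies a distinct byte pattern; that layout is the "encoding scheme" it shares with EUC_JP, while the characters behind the codes differ. The following is a minimal C sketch, assuming the conventional EUC-JP-style layout: 0x8E (SS2) introduces a JIS X 0201 katakana byte, 0x8F (SS3) introduces a two-byte JIS X 0213 plane 2 code, and a lead byte in 0xA1-0xFE starts a two-byte plane 1 code. The function names are hypothetical and are not PostgreSQL's actual validator.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: length in bytes of the EUC_JIS_2004 character that
 * starts with lead byte `b`. Returns -1 for a byte that cannot start one.
 */
static int euc_jis_2004_char_len(unsigned char b)
{
    if (b < 0x80)
        return 1;               /* ASCII */
    if (b == 0x8E)
        return 2;               /* SS2 + JIS X 0201 katakana */
    if (b == 0x8F)
        return 3;               /* SS3 + JIS X 0213 plane 2 (two bytes) */
    if (b >= 0xA1 && b <= 0xFE)
        return 2;               /* JIS X 0213 plane 1 (two bytes) */
    return -1;
}

/* Count characters in a buffer; returns -1 on an invalid or truncated sequence. */
static long euc_jis_2004_strlen(const unsigned char *s, size_t len)
{
    long count = 0;
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len)
    {
        int n = euc_jis_2004_char_len(s[i]);

        if (n < 0 || i + (size_t) n > len)
            return -1;
        /* loose check: every trail byte must lie in 0xA1-0xFE */
        for (int k = 1; k < n; k++)
            if (s[i + k] < 0xA1 || s[i + k] > 0xFE)
                return -1;
        i += (size_t) n;
        count++;
    }
    return count;
}
```

For example, the bytes `'A' A4 A2 8E B1 8F A1 A1` decode as four characters: one ASCII, one plane 1, one katakana and one plane 2 character.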

New conversions:

EUC_JIS_2004 --> UTF8: euc_jis_2004_to_utf8
UTF8 --> EUC_JIS_2004: utf8_to_euc_jis_2004
SHIFT_JIS_2004 --> UTF8: shift_jis_2004_to_utf8
UTF8 --> SHIFT_JIS_2004: utf8_to_shift_jis_2004
EUC_JIS_2004 --> SHIFT_JIS_2004: euc_jis_2004_to_shift_jis_2004
SHIFT_JIS_2004 --> EUC_JIS_2004: shift_jis_2004_to_euc_jis_2004
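
For ASCII, katakana and plane 1, the EUC_JIS_2004 <-> SHIFT_JIS_2004 pair can be converted arithmetically with the classic JIS-to-Shift_JIS transformation, without a lookup table. The sketch below shows only that direction and only those sets; plane 2 is rejected because Shift_JIS-2004 places it under lead bytes 0xF0-0xFC with a non-linear row mapping. The function name and signature are hypothetical, not the committed euc_jis_2004_to_shift_jis_2004.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch: convert one EUC_JIS_2004 character to SHIFT_JIS_2004.
 * Writes 1 or 2 bytes to `out` and stores the count in *outlen.
 * Returns the number of input bytes consumed, or -1 for plane 2
 * (SS3-prefixed, not handled here) and invalid input.
 */
static int euc2004_char_to_sjis2004(const unsigned char *in, size_t inlen,
                                    unsigned char *out, int *outlen)
{
    assert(in != NULL && out != NULL && outlen != NULL);
    if (inlen == 0)
        return -1;

    if (in[0] < 0x80)                       /* ASCII passes through */
    {
        out[0] = in[0];
        *outlen = 1;
        return 1;
    }
    if (in[0] == 0x8E && inlen >= 2 && in[1] >= 0xA1 && in[1] <= 0xDF)
    {
        out[0] = in[1];                     /* SS2 kana becomes one byte */
        *outlen = 1;
        return 2;
    }
    if (in[0] >= 0xA1 && in[0] <= 0xFE && inlen >= 2 &&
        in[1] >= 0xA1 && in[1] <= 0xFE)
    {
        unsigned int j1 = in[0] & 0x7F;     /* JIS row byte, 0x21-0x7E */
        unsigned int j2 = in[1] & 0x7F;     /* JIS cell byte, 0x21-0x7E */

        /* two JIS rows share one Shift JIS lead byte (0x81-0x9F, 0xE0-0xEF) */
        out[0] = (unsigned char) (((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0));
        if (j1 & 1)                         /* odd row: trail 0x40-0x9E, skipping 0x7F */
            out[1] = (unsigned char) (j2 + (j2 >= 0x60 ? 0x20 : 0x1F));
        else                                /* even row: trail 0x9F-0xFC */
            out[1] = (unsigned char) (j2 + 0x7E);
        *outlen = 2;
        return 2;
    }
    return -1;
}
```

For instance, EUC `A4 A2` (HIRAGANA LETTER A, JIS 0x2422) becomes Shift JIS `82 A0`.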

To generate the conversion maps, I created two Perl scripts,
UCS_to_SHIFT_JIS_2004.pl and UCS_to_EUC_JIS_2004.pl, which use
sjis-0213-2004-std.txt and euc-jis-2004-std.txt as the source
conversion specifications. These files can be freely obtained from
http://x0213.org.
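
The scripts themselves are written in Perl; the C sketch below only illustrates the parsing step in which a mapping to two code points (a combined character) is distinguished from an ordinary one-to-one mapping. It assumes a line layout of the form `0xA4F7<TAB>U+304B+309A<TAB># comment`, with `#`-prefixed header lines; that layout is an assumption about the x0213.org files, and `struct map_line` and `parse_map_line` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed mapping line (hypothetical layout). */
struct map_line {
    unsigned int code;   /* local code, e.g. 0xA4F7 */
    unsigned int ucs1;   /* first Unicode code point */
    unsigned int ucs2;   /* second code point of a combined character, else 0 */
};

/*
 * Parse one line in the assumed "0xXXXX<TAB>U+XXXX[+XXXX]<TAB># ..." layout.
 * Returns 1 for a mapping, 0 for a comment or a line without a Unicode value.
 */
static int parse_map_line(const char *line, struct map_line *m)
{
    unsigned int code, u1, u2;
    int n;

    assert(line != NULL && m != NULL);
    if (line[0] == '#')
        return 0;
    /* a space in the format matches any run of whitespace, including tabs */
    n = sscanf(line, "0x%x U+%x+%x", &code, &u1, &u2);
    if (n < 2)
        return 0;
    m->code = code;
    m->ucs1 = u1;
    m->ucs2 = (n == 3) ? u2 : 0;
    return 1;
}
```

Lines with a non-zero `ucs2` would feed the combined character map, and the rest the ordinary map.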

Conversions from EUC_JIS_2004 and SHIFT_JIS_2004 to UTF-8 require
support for UTF-8 "combined characters", i.e. a single logical
character that consists of two UTF-8 characters. To implement this, I
modified LocalToUtf() and UtfToLocal() by adding a new parameter: a
"combined character map".
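
A combined character map can be pictured as a table from one local code to a pair of code points. On the local-to-UTF-8 side, a hit emits two UTF-8 sequences; on the UTF-8-to-local side, a pair of consecutive code points must be checked against the table before the first one is mapped alone. The sketch below assumes a hypothetical `struct combined_entry` and a one-entry table (EUC_JIS_2004 0xA4F7, i.e. JIS X 0213 1-4-87, mapped to U+304B U+309A) used only for illustration; it is not the committed map format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical combined-map entry: one local code <-> two code points. */
struct combined_entry {
    unsigned int code;   /* EUC_JIS_2004 code */
    unsigned int ucs1;   /* base character */
    unsigned int ucs2;   /* combining mark */
};

/* Illustrative entry: HIRAGANA KA + COMBINING SEMI-VOICED SOUND MARK. */
static const struct combined_entry combined_map[] = {
    {0xA4F7, 0x304B, 0x309A},
};
#define COMBINED_MAP_SIZE (sizeof combined_map / sizeof combined_map[0])

/* Encode one code point as UTF-8; returns the number of bytes written (1-4). */
static int utf8_encode(unsigned int cp, unsigned char *out)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) { out[0] = (unsigned char) cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}

/*
 * Local -> UTF-8: if `code` is in the combined map, write both code points
 * and return the total byte count; otherwise return 0 so the caller can
 * fall back to the ordinary one-to-one map.
 */
static int combined_local_to_utf8(unsigned int code, unsigned char *out)
{
    for (size_t i = 0; i < COMBINED_MAP_SIZE; i++)
        if (combined_map[i].code == code) {
            int n = utf8_encode(combined_map[i].ucs1, out);
            return n + utf8_encode(combined_map[i].ucs2, out + n);
        }
    return 0;
}

/* UTF-8 -> local: map a pair of code points to one local code (0 = no match). */
static unsigned int combined_utf8_to_local(unsigned int ucs1, unsigned int ucs2)
{
    for (size_t i = 0; i < COMBINED_MAP_SIZE; i++)
        if (combined_map[i].ucs1 == ucs1 && combined_map[i].ucs2 == ucs2)
            return combined_map[i].code;
    return 0;
}
```

A linear scan keeps the sketch short; a full-size table would more likely be kept sorted and searched with bsearch().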

Documentation and regression test changes have been committed too.

Beware that I have bumped the catalog version, so please re-run initdb.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

