> > Tatsuo Ishii <ishii@postgresql.org> writes:
> > >> I'm confused. If this is exactly the same as EUC_JP, why do we need
> > >> any new code at all?
> >
> > > I said the *encoding scheme* is the same, not that the contents
> > > (character set) are the same. In other words, the characters included
> > > in EUC_JP are not the same as those in EUC_JIS_2004.
> >
> > I'm still confused. If the set of characters is different, then surely
> > we need at least a different UTF8<->EUC_JIS_2004 conversion function?
>
> Yes, exactly. I will come up with new conversions later.

I have committed changes to add JIS X 0213 along with conversions.

New encodings:

  EUC_JIS_2004: JIS X 0213 encoded in EUC
  SHIFT_JIS_2004: JIS X 0213 encoded in Shift JIS (client-only encoding)

These encodings support the following character sets:

  ASCII, JIS X 0201 (single-byte "katakana"), JIS X 0213 planes 1 and 2
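
To illustrate the earlier point that the encoding scheme is shared with
EUC_JP even though the character set differs, here is a minimal Python
sketch of how the lead byte alone determines the length of each character
in an EUC byte stream. It is not the committed C code; the function names
euc_char_length() and split_euc() are made up for this example.

```python
SS2 = 0x8E  # single shift 2: introduces one JIS X 0201 katakana byte
SS3 = 0x8F  # single shift 3: introduces a two-byte plane 2 code


def euc_char_length(lead: int) -> int:
    """Return the byte length of an EUC character, given its lead byte."""
    if lead < 0x80:
        return 1  # ASCII
    if lead == SS2:
        return 2  # SS2 + one katakana byte
    if lead == SS3:
        return 3  # SS3 + two bytes (JIS X 0213 plane 2)
    if 0xA1 <= lead <= 0xFE:
        return 2  # two bytes (JIS X 0213 plane 1)
    raise ValueError(f"invalid EUC lead byte: 0x{lead:02X}")


def split_euc(data: bytes) -> list:
    """Split an EUC byte string into per-character byte sequences."""
    chars, i = [], 0
    while i < len(data):
        n = euc_char_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated multibyte sequence")
        chars.append(data[i:i + n])
        i += n
    return chars
```

The sketch only checks lead bytes; it does not verify that trailing bytes
are in range or that a code point is actually assigned in JIS X 0213.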

New conversions:

  EUC_JIS_2004 --> UTF8: euc_jis_2004_to_utf8
  UTF8 --> EUC_JIS_2004: utf8_to_euc_jis_2004
  SHIFT_JIS_2004 --> UTF8: shift_jis_2004_to_utf8
  UTF8 --> SHIFT_JIS_2004: utf8_to_shift_jis_2004
  EUC_JIS_2004 --> SHIFT_JIS_2004: euc_jis_2004_to_shift_jis_2004
  SHIFT_JIS_2004 --> EUC_JIS_2004: shift_jis_2004_to_euc_jis_2004
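
Unlike the UTF8 conversions, the conversion between the EUC and Shift JIS
forms can be done arithmetically, because both encode the same row/cell
numbers. The Python sketch below shows the well-known arithmetic for
two-byte plane 1 characters only. It is an illustration under that
restriction, not the committed conversion function, and the name
euc_plane1_to_sjis() is hypothetical.

```python
def euc_plane1_to_sjis(euc: bytes) -> bytes:
    """Convert one two-byte EUC plane 1 character to its Shift JIS form."""
    if len(euc) != 2 or not all(0xA1 <= b <= 0xFE for b in euc):
        raise ValueError("expected a two-byte EUC plane 1 character")

    # Strip the high bit to get the 7-bit JIS row and cell bytes (0x21..0x7E).
    j1, j2 = euc[0] - 0x80, euc[1] - 0x80

    # Two JIS rows share one Shift JIS lead byte (0x81..0x9F, 0xE0..0xEF).
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)

    if j1 % 2:
        # Odd row: trail bytes 0x40..0x9E, skipping 0x7F.
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:
        # Even row: trail bytes 0x9F..0xFC.
        s2 = j2 + 0x7E
    return bytes((s1, s2))
```

Plane 2 characters (SS3-prefixed in EUC) and single-byte katakana need
separate handling that is not shown here.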

To generate the conversion maps, I have created two Perl scripts,
UCS_to_SHIFT_JIS_2004.pl and UCS_to_EUC_JIS_2004.pl, which use
sjis-0213-2004-std.txt and euc-jis-2004-std.txt respectively as the
source of the conversion specification. Both files are freely available
from http://x0213.org.
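
As a rough idea of what such a map generator has to do, the Python sketch
below parses mapping lines into an ordinary map and a separate map for
codes that expand to more than one Unicode code point. It is not a port of
the Perl scripts. The line format ("0x<code>", whitespace, "U+XXXX" or
"U+XXXX+YYYY", then a comment) is an assumption about the source files,
and the sample lines are hypothetical rather than copied from them.

```python
import re

# Assumed line shape: "0xA4A2<TAB>U+3042<TAB># comment". A code that maps to
# two code points is assumed to be written as "U+304B+309A".
LINE_RE = re.compile(r"^0x([0-9A-Fa-f]+)\s+U\+([0-9A-Fa-f]+(?:\+[0-9A-Fa-f]+)*)")


def parse_mapping(lines):
    """Return (single, combined) maps keyed by the local encoding's code."""
    single, combined = {}, {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # comment, blank line, or entry without a Unicode value
        code = int(m.group(1), 16)
        ucs = tuple(int(part, 16) for part in m.group(2).split("+"))
        if len(ucs) == 1:
            single[code] = ucs[0]
        else:
            combined[code] = ucs
    return single, combined


# Hypothetical sample input in the assumed format.
sample = [
    "## sample lines for illustration only",
    "0xA4A2\tU+3042\t# HIRAGANA LETTER A",
    "0xA4F7\tU+304B+309A\t# two code points",
    "0xA9A1\t\t# hypothetical entry with no Unicode value",
]
single, combined = parse_mapping(sample)
```

Keeping the multi-code-point entries in their own map mirrors the idea of
a separate "combined character map", since those entries cannot be stored
in a plain code-to-code table.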

Conversions between UTF-8 and EUC_JIS_2004/SHIFT_JIS_2004 require
support for "combined characters", i.e. a single logical character that
is represented in UTF-8 by a sequence of two Unicode characters. To
implement this, I have modified LocalToUtf() and UtfToLocal() by adding
a new parameter: the "combined character map".

Documentation and regression test changes have been committed too.

Note that I have bumped the catalog version, so please run initdb.
--
Tatsuo Ishii
SRA OSS, Inc. Japan