Hi,
I would like to propose adding a new character set, "JIS X
0213" (http://en.wikipedia.org/wiki/JIS_X_0213).
JIS X 0213 is a relatively new Japanese government standard (defined in
2000, revised in 2004) and is becoming important for Japanese
users. Moreover, some commercial OSs, including Windows Vista, support
JIS X 0213 (some open source OSs do too, of course). So I believe
supporting JIS X 0213 in the upcoming 8.3 would be useful for Japanese
users and would help PostgreSQL spread further.
Since JIS X 0213 is a character set, we need to add encodings
supporting it. Here is a list of the additional encodings (their
specifications have already been published by the government).
1) EUC-JIS-2004
proposed encoding name: EUC_JIS_2004
including the following character sets:
- ASCII
- JIS X 0213 plane 1
- JIS X 0201 "katakana"
- JIS X 0213 plane 2
Note that since the encoding scheme of EUC_JIS_2004 is exactly identical
to that of EUC_JP, we can reuse the existing encoding routines defined in
utils/mb/*.c.
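To make "the same encoding scheme as EUC_JP" concrete, here is a minimal sketch of how the byte length of each character can be determined, assuming well-formed input: an SS2 byte (0x8E) introduces one JIS X 0201 katakana byte, an SS3 byte (0x8F) introduces two bytes from JIS X 0213 plane 2, any other byte with the high bit set starts a two-byte plane 1 character, and everything else is ASCII. The helper names `euc_char_len` and `euc_count_chars` are hypothetical and are not PostgreSQL's actual routines.

```c
#include <assert.h>
#include <stddef.h>

#define SS2 0x8e   /* single shift 2: one JIS X 0201 katakana byte follows */
#define SS3 0x8f   /* single shift 3: two JIS X 0213 plane 2 bytes follow */

/* Hypothetical helper: byte length of the EUC_JIS_2004 character at *s.
 * The layout is the same as EUC_JP; only the character set behind
 * SS3 and the two-byte range differs. No validation is done. */
static int euc_char_len(const unsigned char *s)
{
    assert(s != NULL);
    if (*s == SS2)
        return 2;      /* SS2 + one katakana byte */
    if (*s == SS3)
        return 3;      /* SS3 + two bytes of plane 2 */
    if (*s & 0x80)
        return 2;      /* two bytes of plane 1 (0xA1-0xFE each) */
    return 1;          /* ASCII */
}

/* Count characters in a buffer of len bytes, assuming it is well formed. */
static size_t euc_count_chars(const unsigned char *s, size_t len)
{
    size_t i = 0, n = 0;

    while (i < len) {
        i += euc_char_len(&s[i]);
        n++;
    }
    return n;
}
```

For example, the 8-byte buffer `'A', 0xA4 0xA2, 0x8E 0xB1, 0x8F 0xA1 0xA1` contains four characters: one from each of the four character sets listed above.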
2) Shift-JIS-2004
proposed encoding name: SHIFT_JIS_2004
including the following character sets (same as EUC-JIS-2004):
- ASCII
- JIS X 0213 plane 1
- JIS X 0201 "katakana"
- JIS X 0213 plane 2
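Note that this is a client-only encoding, for the same reason as SJIS:
the second byte of a two-byte character can fall in the ASCII range
(e.g. 0x5C, the backslash), which the server would misinterpret. Since
the encoding scheme of SHIFT_JIS_2004 is exactly identical to that of
SJIS, we can reuse the existing encoding routines defined in utils/mb/*.c.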
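The client-only restriction can be illustrated with a small sketch. It uses a simplified Shift_JIS length rule, assumed here for illustration: 0xA1-0xDF is a single-byte katakana, any other byte with the high bit set starts a two-byte character, and the rest is ASCII. It then detects a trail byte equal to 0x5C. The katakana "ソ" is encoded as 0x83 0x5C, so a byte-oriented scan would see a backslash that is not really there. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte length of the Shift_JIS(-2004) character at *s,
 * using a simplified rule and assuming well-formed input. */
static int sjis_char_len(const unsigned char *s)
{
    assert(s != NULL);
    if (*s >= 0xa1 && *s <= 0xdf)
        return 1;      /* JIS X 0201 half-width katakana */
    if (*s & 0x80)
        return 2;      /* lead byte of a two-byte character */
    return 1;          /* ASCII */
}

/* Returns 1 if some two-byte character has 0x5C ('\\') as its trail byte,
 * i.e. a byte that a naive ASCII-based scanner would take for a backslash. */
static int has_hidden_backslash(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        int n = sjis_char_len(&s[i]);

        if (n == 2 && i + 1 < len && s[i + 1] == 0x5c)
            return 1;
        i += n;
    }
    return 0;
}
```

A real backslash after an ASCII letter (`'A', 0x5C`) is not flagged, whereas "ソ" (`0x83, 0x5C`) is; EUC_JIS_2004 never has this problem because all of its multibyte bytes are 0x80 or above.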
3) UTF-8
This is already supported by recent versions of PostgreSQL, and no
additional work is required.
o About encoding conversion
I will add encoding conversions among EUC_JIS_2004, SHIFT_JIS_2004 and
UTF-8.
Included are patches against CVS HEAD that illustrate what
I'm proposing in detail. If there are no objections, I will commit them
along with documentation changes and regression test updates, and bump
the catalog version.
After that I will develop the conversion part (it will take several days).
Comments and suggestions are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan