Re: Questionable description about character sets - Mailing list pgsql-hackers
| From | Tatsuo Ishii |
|---|---|
| Subject | Re: Questionable description about character sets |
| Date | |
| Msg-id | 20260214.192033.705419152780150580.ishii@postgresql.org |
| In response to | Re: Questionable description about character sets (Andreas Karlsson <andreas@proxel.se>) |
| Responses | Re: Questionable description about character sets |
| List | pgsql-hackers |
> Wouldn't that make the table very wide?

I don't think it would make the table very wide, only a little wider. So I think adding the character set information to the "Description" column is better. Some of the encodings already have this information. See the attached patch.

> And for e.g. European
> character encodings I am not sure it is that useful since most or
> maybe even all of them are subsets of unicode, it mostly gets
> interesting for encodings which support characters not in unicode,
> right?

Choosing UTF8 or not is just one of the use cases. I am thinking about the use case in which the user wants to continue to use another encoding (e.g. wants to avoid conversion to UTF8).

Example: suppose the user has a legacy system in which EUC_JP is used. The data in the system includes JIS X 0201, JIS X 0208 and JIS X 0212, and the user wants to make sure that PostgreSQL supports all of those character sets in EUC_JP, because some tools do not support JIS X 0212; they support only JIS X 0201 and JIS X 0208.

Currently the information (whether JIS X 0212 is supported or not) does not exist anywhere in our docs. It's only in the source code. I think it's better to have it in our docs so that users do not need to look into the source code.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

From 98c97f670ce647003ce467a84f81cec0cb463c18 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <ishii@postgresql.org>
Date: Sat, 14 Feb 2026 16:26:01 +0900
Subject: [PATCH v1] doc: Enhance "PostgreSQL Character Sets" table.

Previously, some encodings lacked a description of the coded character
sets used in the encoding. For most European encodings this is obvious
because there is only one or a few character sets per encoding, but
that is not true for some Asian encodings. For example, the EUC_JP
encoding corresponds to multiple character sets, namely JIS X 0201,
JIS X 0208 and JIS X 0212. This commit adds the information to the
"Description" column.

Discussion: https://postgr.es/m/20260211.185847.1679085676298121526.ishii%40postgresql.org
---
 doc/src/sgml/charset.sgml | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 3aabc798012..32c6280489b 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1831,7 +1831,7 @@ ORDER BY c COLLATE ebcdic;
 </row>
 <row>
  <entry><literal>EUC_CN</literal></entry>
- <entry>Extended UNIX Code-CN</entry>
+ <entry>Extended UNIX Code-CN, GB 2312</entry>
  <entry>Simplified Chinese</entry>
  <entry>Yes</entry>
  <entry>Yes</entry>
@@ -1840,7 +1840,7 @@ ORDER BY c COLLATE ebcdic;
 </row>
 <row>
  <entry><literal>EUC_JP</literal></entry>
- <entry>Extended UNIX Code-JP</entry>
+ <entry>Extended UNIX Code-JP, JIS X 0201, JIS X 0208, JIS X 0212</entry>
  <entry>Japanese</entry>
  <entry>Yes</entry>
  <entry>Yes</entry>
@@ -1849,7 +1849,7 @@ ORDER BY c COLLATE ebcdic;
 </row>
 <row>
  <entry><literal>EUC_JIS_2004</literal></entry>
- <entry>Extended UNIX Code-JP, JIS X 0213</entry>
+ <entry>Extended UNIX Code-JP, JIS X 0201, JIS X 0213</entry>
  <entry>Japanese</entry>
  <entry>Yes</entry>
  <entry>No</entry>
@@ -1858,7 +1858,7 @@ ORDER BY c COLLATE ebcdic;
 </row>
 <row>
  <entry><literal>EUC_KR</literal></entry>
- <entry>Extended UNIX Code-KR</entry>
+ <entry>Extended UNIX Code-KR, KS X 1001</entry>
  <entry>Korean</entry>
  <entry>Yes</entry>
  <entry>Yes</entry>
@@ -1867,7 +1867,7 @@ ORDER BY c COLLATE ebcdic;
 </row>
 <row>
  <entry><literal>EUC_TW</literal></entry>
- <entry>Extended UNIX Code-TW</entry>
+ <entry>Extended UNIX Code-TW, CNS 11643</entry>
  <entry>Traditional Chinese, Taiwanese</entry>
  <entry>Yes</entry>
  <entry>Yes</entry>
@@ -2056,7 +2056,7 @@ ORDER BY c COLLATE ebcdic;
 </row>
 <row>
  <entry><literal>SJIS</literal></entry>
- <entry>Shift JIS</entry>
+ <entry>Shift JIS, JIS X 0201, JIS X 0208</entry>
  <entry>Japanese</entry>
  <entry>No</entry>
  <entry>No</entry>
-- 
2.43.0
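
As an illustration of the EUC_JP use case described in the message, here is a minimal sketch of how a user might check which coded character sets actually occur in legacy EUC-JP data before deciding whether a tool or database needs JIS X 0212 support. It relies only on the usual EUC-JP byte layout: a lead byte below 0x80 is ASCII, 0x8E (SS2) introduces a JIS X 0201 half-width katakana, 0x8F (SS3) introduces a two-byte JIS X 0212 character, and a lead byte in 0xA1..0xFE starts a two-byte JIS X 0208 character. The function name `euc_jp_charsets` is hypothetical; this is not PostgreSQL code and does not reflect how PostgreSQL validates the encoding.

```python
from collections import Counter

SS2 = 0x8E  # single shift 2: one following byte, JIS X 0201 half-width katakana
SS3 = 0x8F  # single shift 3: two following bytes, JIS X 0212


def euc_jp_charsets(data: bytes) -> Counter:
    """Count characters per coded character set in EUC-JP encoded bytes.

    Hypothetical helper for illustration; validation is deliberately loose
    (trailing bytes are only checked to lie in 0xA1..0xFE).
    """
    counts = Counter()
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            name, length = "ASCII", 1
        elif lead == SS2:
            name, length = "JIS X 0201", 2
        elif lead == SS3:
            name, length = "JIS X 0212", 3
        elif 0xA1 <= lead <= 0xFE:
            name, length = "JIS X 0208", 2
        else:
            raise ValueError(f"invalid EUC-JP lead byte 0x{lead:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated character at offset {i}")
        for trail in data[i + 1:i + length]:
            if not 0xA1 <= trail <= 0xFE:
                raise ValueError(f"invalid trailing byte 0x{trail:02X} near offset {i}")
        counts[name] += 1
        i += length
    return counts


if __name__ == "__main__":
    # One character from each code set: 'A', a JIS X 0208 character,
    # a half-width katakana (SS2), and a JIS X 0212 character (SS3).
    sample = b"A\xa4\xa2\x8e\xb1\x8f\xb0\xa1"
    for charset, count in sorted(euc_jp_charsets(sample).items()):
        print(charset, count)
```

If the resulting counter contains a "JIS X 0212" entry, the data depends on the part of EUC_JP that, according to the message, some tools do not handle.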