Thread: How to extend server side encoding GBK
I just noticed that PG does not support the following encodings on the server side:
/* followings are for client encoding only */
PG_SJIS, /* Shift JIS (Windows-932) */
PG_BIG5, /* Big5 (Windows-950) */
PG_GBK, /* GBK (Windows-936) */
PG_UHC, /* UHC (Windows-949) */
PG_GB18030, /* GB18030 */
PG_JOHAB, /* EUC for Korean JOHAB */
PG_SHIFT_JIS_2004, /* Shift-JIS-2004 */
_PG_LAST_ENCODING_ /* mark only */
But PG_GBK and PG_GB18030 are very popular Chinese character sets.
Could anybody give some hints about how to extend the PG source code to support them?
UTF-8 and PG_EUC_CN are already supported, but I think it would be useful to implement server-side support for PG_GBK.
Any help is appreciated. Thanks in advance.
"Xiong He" <iihero@qq.com> writes:
> I just noticed that PG does not support the following encodings on the server side:
> /* followings are for client encoding only */
> PG_SJIS, /* Shift JIS (Windows-932) */
> PG_BIG5, /* Big5 (Windows-950) */
> PG_GBK, /* GBK (Windows-936) */
> PG_UHC, /* UHC (Windows-949) */
> PG_GB18030, /* GB18030 */
> PG_JOHAB, /* EUC for Korean JOHAB */
> PG_SHIFT_JIS_2004, /* Shift-JIS-2004 */
> _PG_LAST_ENCODING_ /* mark only */
> But PG_GBK and PG_GB18030 are very popular Chinese character sets.
> Could anybody give some hints about how to extend the PG source code to support them?
The reason those aren't supported is that they aren't strict ASCII
supersets, i.e., there are multibyte characters in which not all the bytes
have the high bit set. This breaks string-processing assumptions all
over the place. We are not going to accept any patch that tries to
change that, because it would be too complicated, fragile, and slow.
Is there a reason why it's not good enough to use these just on the
client side, with the server internally using utf8?
regards, tom lane
Thanks.
UTF8 is good enough, although it requires conversion between GBK on the client side and UTF8 on the server side. I hadn't realized that bringing GBK and other similar character sets into the server side carried such a high risk.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers