============================================================================
POSTGRESQL BUG REPORT: MIC to EUC_TW code conversion in mb support
============================================================================
System Configuration
---------------------
Architecture (example: Intel Pentium) :x86
Operating System (example: Linux 2.0.26 ELF) :Linux 2.2.x and FreeBSD 3.5R
PostgreSQL version (example: PostgreSQL-7.0) :PostgreSQL-7.0.2
Compiler used (example: gcc 2.8.0) :egcs-2.91.66, gcc 2.7.3
A FULL description of the problem:
------------------------------------------------
In PostgreSQL mb (multi-byte) support, there is a bug in the code
conversion from MIC to EUC_TW. The original mic2euc_tw() in conv.c
converts characters in CNS 11643-1992 Plane 2 into 2-byte EUC_TW
sequences. However, only Plane 1 characters use the 2-byte form;
Plane 2 characters must be converted into 4-byte EUC_TW sequences
(SS2, 0xA2, then the two code bytes) instead.
A way to repeat the problem:
----------------------------------------------------------------------
Run initdb with -E EUC_TW and set PGCLIENTENCODING to BIG5. Every
character in CNS 11643-1992 Plane 2 is then stored or output
incorrectly, because its 2-byte EUC_TW form cannot be told apart from
a Plane 1 character.
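The symptom follows from how EUC_TW is decoded: a 2-byte sequence is
always read as Plane 1, so the unpatched 2-byte output for a Plane 2
character comes back as an unrelated Plane 1 character. The sketch
below makes this concrete with a hypothetical decoder, euc_tw_plane(),
which is not part of PostgreSQL; the code bytes 0xC4 0xA1 used in the
usage note are an arbitrary example.

```c
#include <assert.h>
#include <stddef.h>

#define SS2 0x8e

/*
 * Hypothetical decoder: report which CNS 11643-1992 plane (1..7) an
 * EUC_TW multibyte sequence belongs to, or 0 if it is not one.
 */
static int euc_tw_plane(const unsigned char *s, size_t len)
{
    assert(s != NULL);

    /* 4-byte form: SS2, plane byte 0xA1..0xA7, two code bytes. */
    if (len >= 4 && s[0] == SS2 && s[1] >= 0xa1 && s[1] <= 0xa7)
        return s[1] - 0xa0;

    /* 2-byte form: always interpreted as Plane 1. */
    if (len >= 2 && s[0] >= 0xa1 && s[0] <= 0xfe)
        return 1;

    return 0;
}
```

For a Plane 2 character with code bytes 0xC4 0xA1, the unpatched output
{0xC4, 0xA1} decodes as Plane 1, while the patched output
{0x8E, 0xA2, 0xC4, 0xA1} decodes as Plane 2.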
The following patch against conv.c should fix this problem.
*** conv.c.orig	Sat May 20 21:12:26 2000
--- conv.c	Wed Nov 8 22:44:21 2000
***************
*** 906,913 ****
  	{
  		len -= pg_mic_mblen(mic++);
! 		if (c1 == LC_CNS11643_1 || c1 == LC_CNS11643_2)
  		{
  			*p++ = *mic++;
  			*p++ = *mic++;
  		}
--- 906,920 ----
  	{
  		len -= pg_mic_mblen(mic++);
! 		if (c1 == LC_CNS11643_1)
  		{
+ 			*p++ = *mic++;
+ 			*p++ = *mic++;
+ 		}
+ 		else if (c1 == LC_CNS11643_2)
+ 		{
+ 			*p++ = SS2;
+ 			*p++ = 0xa2;
  			*p++ = *mic++;
  			*p++ = *mic++;
  		}
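To exercise the patched branch outside the backend, here is a
simplified, hypothetical stand-in for mic2euc_tw(). It assumes the MIC
layout of one leading-charset byte followed by two code bytes, and
assumes LC_CNS11643_1 = 0x95 and LC_CNS11643_2 = 0x96 (believed to
match pg_wchar.h). It handles only ASCII and Planes 1 and 2, whereas the
real function also covers the other CNS planes.

```c
#include <assert.h>
#include <stddef.h>

/* Leading-charset bytes, assumed values (believed to match pg_wchar.h). */
#define LC_CNS11643_1 0x95
#define LC_CNS11643_2 0x96
#define SS2 0x8e

/*
 * Hypothetical, simplified version of the patched mic2euc_tw() loop.
 * Handles ASCII and CNS 11643-1992 Planes 1 and 2 only; stops at any
 * other leading byte or at truncated input. p needs room for up to
 * 4/3 * len bytes (3 MIC bytes can become 4 EUC_TW bytes).
 * Returns the number of bytes written to p.
 */
static size_t mic2euc_tw_sketch(const unsigned char *mic, unsigned char *p,
                                size_t len)
{
    unsigned char *start = p;

    assert(mic != NULL && p != NULL);
    while (len > 0)
    {
        unsigned char c1 = *mic;

        if (c1 < 0x80)
        {
            /* ASCII passes through unchanged. */
            *p++ = *mic++;
            len--;
        }
        else if ((c1 == LC_CNS11643_1 || c1 == LC_CNS11643_2) && len >= 3)
        {
            mic++;              /* skip the leading-charset byte */
            if (c1 == LC_CNS11643_2)
            {
                /* The fix: Plane 2 gets the 4-byte SS2 0xA2 form. */
                *p++ = SS2;
                *p++ = 0xa2;
            }
            *p++ = *mic++;
            *p++ = *mic++;
            len -= 3;
        }
        else
            break;              /* unsupported or truncated: stop here */
    }
    return (size_t) (p - start);
}
```

Given the MIC bytes 'A', 0x95 0xA4 0xA1, 0x96 0xC4 0xA1, the sketch
writes 7 bytes: 'A', 0xA4 0xA1, 0x8E 0xA2 0xC4 0xA1. Before the patch,
the Plane 2 character would have produced only 0xC4 0xA1.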