> ============================================================================
>
> POSTGRESQL BUG REPORT: MIC to EUC_TW code converting in mb support
> ============================================================================
>
> System Configuration
> ---------------------
> Architecture (example: Intel Pentium) :x86
> Operating System (example: Linux 2.0.26 ELF) :Linux 2.2.x and FreeBSD 3.5R
> PostgreSQL version (example: PostgreSQL-7.0) :PostgreSQL-7.0.2
> Compiler used (example: gcc 2.8.0) :egcs-2.91.66, gcc 2.7.3
>
> A FULL description of the problem:
> ------------------------------------------------
> In PostgreSQL mb (multi-byte) support, there is a bug in the code that
> converts MIC to EUC_TW. The original mic2euc_tw() in conv.c converts
> CNS 11643-1992 Plane 2 into the 2-byte EUC_TW encoding, but characters
> in CNS 11643-1992 Plane 2 should be converted into the 4-byte EUC_TW
> encoding instead.
>
> A way to repeat the problem:
> ----------------------------------------------------------------------
> When you initdb with -E EUC_TW and set PGCLIENTENCODING to BIG5,
> you will find all the characters in CNS 11643-1992 Plane 2 are
> incorrectly stored or output.
>
> This problem might be fixed by the solution in the attachment.
Thanks for pointing it out. Your fix seems correct.
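
For readers following along, here is a minimal sketch of the Plane 1 versus
Plane 2 layout described in the report. It is an illustration only, not the
code in conv.c: mic_char_to_euc_tw() is a hypothetical helper, and the
constant values are assumed to match pg_wchar.h, so check them against your
tree.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; verify against pg_wchar.h in your source tree. */
#define SS2            0x8e  /* EUC single-shift 2 */
#define LC_CNS11643_1  0x95  /* MIC leading byte for CNS 11643 Plane 1 */
#define LC_CNS11643_2  0x96  /* MIC leading byte for CNS 11643 Plane 2 */

/*
 * Hypothetical helper: convert one 3-byte MIC character (leading byte
 * plus two code bytes) to EUC_TW.  Returns the number of bytes written
 * to out, or 0 if the leading byte is not handled here.
 */
static size_t
mic_char_to_euc_tw(const unsigned char *mic, unsigned char *out)
{
    assert(mic != NULL && out != NULL);

    if (mic[0] == LC_CNS11643_1)
    {
        /* Plane 1: the two code bytes are copied as-is (2-byte form). */
        out[0] = mic[1];
        out[1] = mic[2];
        return 2;
    }
    if (mic[0] == LC_CNS11643_2)
    {
        /* Plane 2: SS2, plane byte 0xa2, then the two code bytes. */
        out[0] = SS2;
        out[1] = 0xa2;
        out[2] = mic[1];
        out[3] = mic[2];
        return 4;
    }
    return 0;
}
```

With these assumptions, a Plane 2 character always comes out as four bytes
starting with 0x8e 0xa2, which is the behaviour the report asks for.
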
BTW, I have found another bug in the EUC_TW support. Line 917 in conv.c:

    *p++ = c1 - LC_CNS11643_3 + 0xa3;

should be:

    *p++ = *mic++ - LC_CNS11643_3 + 0xa3;

Otherwise, CNS 11643-1992 Plane 3 and above won't work. Could you test
the fix with characters from CNS 11643-1992 Plane 3 and above?
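
A minimal sketch of why the plane byte has to come from the next MIC byte
rather than from c1. It assumes that Plane 3 and above are stored in MIC as a
prefix byte (LCPRV2_B) followed by a charset ID byte and two code bytes, and
that c1 still holds the prefix when the plane byte is computed.
mic_prv_to_euc_tw() is a hypothetical helper, and the constant values are
assumed to match pg_wchar.h.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; verify against pg_wchar.h in your source tree. */
#define SS2            0x8e  /* EUC single-shift 2 */
#define LCPRV2_B       0x9d  /* MIC prefix for private 2-byte charsets */
#define LC_CNS11643_3  0xf6  /* charset ID for CNS 11643 Plane 3 */
#define LC_CNS11643_7  0xfa  /* charset ID for CNS 11643 Plane 7 */

/*
 * Hypothetical helper: convert one 4-byte MIC character of the form
 * LCPRV2_B, charset ID, code byte 1, code byte 2 to 4-byte EUC_TW.
 * Returns the number of bytes written, or 0 if the input is not a
 * CNS 11643 Plane 3..7 character.
 */
static size_t
mic_prv_to_euc_tw(const unsigned char *mic, unsigned char *out)
{
    unsigned char *p = out;
    unsigned char c1;

    assert(mic != NULL && out != NULL);

    c1 = *mic++;                /* c1 is the prefix, not the plane */
    if (c1 != LCPRV2_B)
        return 0;
    if (*mic < LC_CNS11643_3 || *mic > LC_CNS11643_7)
        return 0;

    *p++ = SS2;
    /* Derive the plane byte from the charset ID: Plane 3 maps to 0xa3. */
    *p++ = (unsigned char) (*mic++ - LC_CNS11643_3 + 0xa3);
    *p++ = *mic++;              /* first code byte */
    *p++ = *mic++;              /* second code byte */
    return (size_t) (p - out);
}
```

Under the same assumptions, computing the plane byte from c1 would give
0x9d - 0xf6 + 0xa3, which truncates to 0x4a in an unsigned char and is not a
valid EUC_TW plane byte.
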
If they are ok, I will fix the current source and make a patch for
7.0.3 (I guess it's too late to back-patch the 7.0 tree).
--
Tatsuo Ishii