Re: [BUGS] Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: [BUGS] Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't
Date
Msg-id 20030412.105145.74752700.t-ishii@sra.co.jp
List pgsql-hackers
It turned out to be a bug in PostgreSQL's encoding conversion engine.
It failed to find the proper entry in an encoding conversion table
because of an integer overflow problem. Since only the maps for EUC_TW
have such huge code point values (for example 0x8eaee7aa), I believe
the conversion failure occurs only with this particular encoding. The
included patch should solve the problem (it is against PostgreSQL
7.3.2).

BTW, I'm surprised to find this bug, since it has been there since the
7.2 days.

I'm going to commit the fix to both current and 7.3-stable trees.
--
Tatsuo Ishii

> Short Description
> Server-Encoding from EUC_TW to UTF-8 doesn't work
> 
> Long Description
> System: SuSE Linux 8.1, kernel 2.4.19, glibc 2.2.5/glibc-locale 2.2.5
> the same error on RedHat 7.3, kernel 2.4.20, glibc2.2.5
> postgresql version 7.3.2
> description: I loaded Chinese (TW) characters, encoded as UTF-8 into a
> database which has UTF-8 encoding with "copy table from 'original'" with psql. Ok.
> Then I exited psql and exported PGCLIENTENCODING=EUC_TW.
> I started psql and made a "copy table to 'file.EUC_TW'". Ok.
> If I convert this file to UTF-8 with iconv -f EUC-TW -t UTF-8 file.EUC_TW file.UTF-8
> then file.UTF-8 looks exactly the same as the original.
> That means PostgreSQL converts from UTF-8 to EUC_TW correctly.
> Now I load the exported file 'file.EUC_TW' back into DB:
> "copy table2 from 'file.EUC_TW'". I had not yet exited psql, so
> PGCLIENTENCODING is the same as for "copy to".
> Now I get an error telling me: "copy: line 1,  LocalToUtf: could not convert (0xe5b5) EUC_TW to UTF-8" ... and the
> characters are missing in table2.
> 
> Sample Code
> UTF-8:
> 00000000: e795 b6e6 97a5 0ae5 959f e58b 95e4 b8ad
> 00000010: 2ce4 bd86 e69c 89e9 8caf e8aa a40a
> 
> EUC_TW as exported from PostgreSQL and not imported:
> 00000000: e5b5 c5ca 0ada f6d9 afc4 e32c c8fe c8b4
> 00000010: f2e3 eba8 0a

*** src/backend/utils/mb/conv.c.orig    2003-04-12 10:03:25.000000000 +0900
--- src/backend/utils/mb/conv.c    2003-04-12 10:16:04.000000000 +0900
***************
*** 313,319 ****
  
      v1 = *(unsigned int *) p1;
      v2 = ((pg_utf_to_local *) p2)->utf;
!     return (v1 - v2);
  }
  
  /*
--- 313,319 ----
  
      v1 = *(unsigned int *) p1;
      v2 = ((pg_utf_to_local *) p2)->utf;
!     return (v1 > v2)?1:((v1 == v2)?0:-1);
  }
  
  /*
***************
*** 328,334 ****
  
      v1 = *(unsigned int *) p1;
      v2 = ((pg_local_to_utf *) p2)->code;
!     return (v1 - v2);
  }
  
  /*
--- 328,334 ----
  
      v1 = *(unsigned int *) p1;
      v2 = ((pg_local_to_utf *) p2)->code;
!     return (v1 > v2)?1:((v1 == v2)?0:-1);
  }
  
  /*


