Re: [BUGS] Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't work - Mailing list pgsql-hackers

From Enke, Michael
Subject Re: [BUGS] Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't work
Date
Msg-id 3E9AA746.2E07B899@wincor-nixdorf.com
In response to Re: [BUGS] Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't  (Tatsuo Ishii <t-ishii@sra.co.jp>)
Responses Re: [BUGS] Bug #943: Server-Encoding from EUC_TW to  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-hackers
I also tried BIG5-encoded data (Traditional Chinese for Taiwan) and got warnings:
WARNING:  copy: line 4586, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
...
Is this also solved with this fix?

Michael


Tatsuo Ishii wrote:
> 
> It turned out to be a bug in the encoding conversion engine of
> PostgreSQL. It failed to find the proper entry in an encoding
> conversion table because of an integer overflow problem. Since only
> the maps for EUC_TW have such huge code point values (for example
> 0x8eaee7aa), I believe the conversion failure occurs only with that
> particular encoding. The included patches should solve the problem
> (they are against PostgreSQL 7.3.2).
> 
> BTW, I'm surprised to find this bug, since it has been there since
> the 7.2 days.
> 
> I'm going to commit the fix to both current and 7.3-stable trees.
> --
> Tatsuo Ishii
> 
> > Short Description
> > Server-Encoding from EUC_TW to UTF-8 doesn't work
> >
> > Long Description
> > System: SuSE Linux 8.1, kernel 2.4.19, glibc 2.2.5/glibc-locale 2.2.5
> > the same error on RedHat 7.3, kernel 2.4.20, glibc2.2.5
> > postgresql version 7.3.2
> > description: I loaded Chinese (TW) characters, encoded as UTF-8, into a
> > database which has UTF-8 encoding with "copy table from 'original'" with psql. Ok.
> > Then I exited from psql and exported PGCLIENTENCODING=EUC_TW.
> > I started psql and ran "copy table to 'file.EUC_TW'". Ok.
> > If I convert this file to UTF-8 with iconv -f EUC-TW -t UTF-8 file.EUC_TW file.UTF-8
> > then file.UTF-8 looks exactly the same as the original.
> > That means PostgreSQL converts from UTF-8 to EUC_TW correctly.
> > Now I load the exported file 'file.EUC_TW' back into the DB:
> > "copy table2 from 'file.EUC_TW'". I had not yet quit psql, so
> > PGCLIENTENCODING is the same as for "copy to".
> > Now I get an error telling me: "copy: line 1,  LocalToUtf: could not convert (0xe5b5) EUC_TW to UTF-8" ... and the
> > characters are missing in table2
> >
> > Sample Code
> > UTF-8:
> > 00000000: e795 b6e6 97a5 0ae5 959f e58b 95e4 b8ad
> > 00000010: 2ce4 bd86 e69c 89e9 8caf e8aa a40a
> >
> > EUC_TW as exported from PostgreSQL, which cannot be imported back:
> > 00000000: e5b5 c5ca 0ada f6d9 afc4 e32c c8fe c8b4
> > 00000010: f2e3 eba8 0a
> 
> *** src/backend/utils/mb/conv.c.orig    2003-04-12 10:03:25.000000000 +0900
> --- src/backend/utils/mb/conv.c 2003-04-12 10:16:04.000000000 +0900
> ***************
> *** 313,319 ****
> 
>         v1 = *(unsigned int *) p1;
>         v2 = ((pg_utf_to_local *) p2)->utf;
> !       return (v1 - v2);
>   }
> 
>   /*
> --- 313,319 ----
> 
>         v1 = *(unsigned int *) p1;
>         v2 = ((pg_utf_to_local *) p2)->utf;
> !       return (v1 > v2)?1:((v1 == v2)?0:-1);
>   }
> 
>   /*
> ***************
> *** 328,334 ****
> 
>         v1 = *(unsigned int *) p1;
>         v2 = ((pg_local_to_utf *) p2)->code;
> !       return (v1 - v2);
>   }
> 
>   /*
> --- 328,334 ----
> 
>         v1 = *(unsigned int *) p1;
>         v2 = ((pg_local_to_utf *) p2)->code;
> !       return (v1 > v2)?1:((v1 == v2)?0:-1);
>   }
> 
>   /*
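
To make the overflow described above concrete, here is a minimal sketch of why a subtraction-based comparator misorders large code values while the three-way comparison from the patch does not. The names map_entry, demo_map, compare_subtract, compare_threeway and lookup are hypothetical and do not appear in conv.c; the table contents are illustrative placeholders, not real conversion-map entries. Only the two return expressions mirror the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for pg_local_to_utf: a local code and its UTF-8 value. */
typedef struct
{
    unsigned int code;   /* local (e.g. EUC_TW) code, used as the sort key */
    unsigned int utf;    /* placeholder value, not a real mapping */
} map_entry;

/* Illustrative table, sorted by code in unsigned order. */
const map_entry demo_map[] = {
    {0x0000a4a1u, 1u},
    {0x0000e5b5u, 2u},
    {0x8ea2a1a1u, 3u},
    {0x8eaee7aau, 4u}
};
#define DEMO_MAP_SIZE (sizeof(demo_map) / sizeof(demo_map[0]))

/*
 * Pre-patch style: the unsigned difference wraps around modulo 2^32 and is
 * then returned as an int, so its sign no longer reflects the ordering when
 * the two values are far apart.
 */
int
compare_subtract(const void *p1, const void *p2)
{
    unsigned int v1 = *(const unsigned int *) p1;
    unsigned int v2 = ((const map_entry *) p2)->code;

    return (int) (v1 - v2);
}

/* Post-patch style: explicit three-way comparison, no arithmetic involved. */
int
compare_threeway(const void *p1, const void *p2)
{
    unsigned int v1 = *(const unsigned int *) p1;
    unsigned int v2 = ((const map_entry *) p2)->code;

    return (v1 > v2) ? 1 : ((v1 == v2) ? 0 : -1);
}

/* Look up a code in demo_map with the given comparator (key comes first). */
const map_entry *
lookup(unsigned int code, int (*cmp) (const void *, const void *))
{
    assert(cmp != NULL);
    return bsearch(&code, demo_map, DEMO_MAP_SIZE, sizeof(map_entry), cmp);
}
```

With a 32-bit unsigned int, comparing the key 0xe5b5 against the entry 0x8eaee7aa through compare_subtract yields 0x7151fe0b, a positive number, which tells the binary search that the key sorts after the entry even though it is smaller. compare_threeway returns -1 for the same pair, so lookup(0xe5b5, compare_threeway) finds the second table entry. What a search does with the inconsistent comparator depends on the bsearch implementation, so the sketch makes no claim about that result.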


