Re: again: Bug #943: Server-Encoding from EUC_TW to UTF-8 - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: again: Bug #943: Server-Encoding from EUC_TW to UTF-8
Date
Msg-id 20030624.164755.115907234.t-ishii@sra.co.jp
In response to again: Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't work  ("Enke, Michael" <michael.enke@wincor-nixdorf.com>)
List pgsql-hackers
> > > I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow.
> > > Now I upgraded to 7.3.3 and I'm not happy with this.
> > > The exact error as I described is fixed, but I found new errors in conversion UTF-8 <-> EUC_TW and BIG5:
> > >
> > > Copy to table (DB has UTF-8 encoding) from file:
> > > for PGCLIENTENCODING=BIG5:
> > > WARNING:  copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
> > > WARNING:  copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
> > > WARNING:  copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
> > > WARNING:  copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
> > 
> > I see no problem here. The only standard conversion map I could find
> > on-line so far (see the URL below) does not include entries 0xf9d6 or
> > above.
> > 
> > http://www.unicode.org/Public/UNIDATA/Unihan.txt
> 
> 
> I found in this file:
> U+F9D7 in line 604519
> U+F9D8 in line 219540
> U+F9D6...U+F9DB in lines 730707...730766.

No. U+F9D6 denotes a *Unicode* code point, not a BIG5 code point.
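
To make the distinction concrete, here is a minimal Python sketch (not PostgreSQL code; the helper names describe_unicode, split_double_byte and try_decode are made up for this illustration). It shows that the Unicode code point U+F9D6, which lies in the CJK Compatibility Ideographs block, has its own UTF-8 encoding, while the BIG5 value 0xf9d6 from the warning is just a lead byte and a trail byte in a separate code space. Whether a given codec accepts that byte pair depends on its mapping table, so the sketch only reports the result and assumes nothing about it.

```python
import unicodedata


def describe_unicode(code_point: int) -> dict:
    """Describe a Unicode code point: U+ notation, character name, UTF-8 bytes."""
    ch = chr(code_point)
    return {
        "notation": f"U+{code_point:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "utf8": ch.encode("utf-8").hex(),
    }


def split_double_byte(code: int) -> tuple:
    """Split a two-byte legacy code (e.g. BIG5 0xF9D6) into (lead, trail) bytes."""
    return (code >> 8) & 0xFF, code & 0xFF


def try_decode(raw: bytes, codec: str):
    """Return the decoded text, or None if the codec has no mapping for raw."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # The Unicode code point U+F9D6 and its UTF-8 form.
    print(describe_unicode(0xF9D6))

    # The BIG5 value 0xf9d6 from the COPY warning is a byte pair.
    lead, trail = split_double_byte(0xF9D6)
    print(f"BIG5 bytes: lead=0x{lead:02x} trail=0x{trail:02x}")

    # The outcome here depends on each codec's mapping table.
    for codec in ("big5", "cp950"):
        print(codec, "->", try_decode(bytes([lead, trail]), codec))
```

A hit for "U+F9D6" in Unihan.txt therefore says nothing about whether the BIG5 byte pair 0xf9 0xd6 has a mapping.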

> 
> > > for EUC_TW
> > > WARNING:  copy: line 1, LocalToUtf: could not convert (0x8ea3c3b7) EUC_TW to UTF-8. Ignored
> > > WARNING:  copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
> > > WARNING:  copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
> > > WARNING:  copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored
> > 
> > Hum. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL
> > supports only:
> > 
> > CNS 11643-1993, plane 0
> > CNS 11643-1993, plane 1
> > CNS 11643-1993, plane 2
> > CNS 11643-1993, plane 15
> > 
> > Would you like to have support for the rest of the CNS 11643-1993 planes:
> > 
> > CNS 11643-1993, plane 3
> > CNS 11643-1993, plane 4
> > CNS 11643-1993, plane 5
> > CNS 11643-1993, plane 6
> > CNS 11643-1993, plane 7
> > 
> > in the upcoming 7.4?
> > 
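
The plane number can be read directly off the failing byte sequences. The following is a rough Python sketch, assuming the usual EUC-TW layout: a two-byte form for plane 1, and a four-byte form that starts with the single-shift byte 0x8e followed by a plane byte (0xa1 for plane 1, 0xa2 for plane 2, and so on). The function name parse_euc_tw is hypothetical and this is not the conversion code in the PostgreSQL tree.

```python
SS2 = 0x8E  # single-shift byte that introduces the four-byte EUC-TW form


def parse_euc_tw(raw: bytes) -> tuple:
    """Return (plane, row, column) for one EUC-TW character.

    Assumes the standard layout: 2 bytes for CNS 11643 plane 1,
    or 0x8E + plane byte + 2 bytes for an explicitly selected plane.
    """
    if len(raw) == 2:
        plane, body = 1, raw
    elif len(raw) == 4 and raw[0] == SS2:
        if not 0xA1 <= raw[1] <= 0xB0:
            raise ValueError(f"invalid plane byte 0x{raw[1]:02x}")
        plane, body = raw[1] - 0xA0, raw[2:]
    else:
        raise ValueError(f"not a single EUC-TW character: {raw.hex()}")

    for b in body:
        if not 0xA1 <= b <= 0xFE:
            raise ValueError(f"invalid row/column byte 0x{b:02x}")
    return plane, body[0] - 0xA0, body[1] - 0xA0


if __name__ == "__main__":
    # Byte sequences copied from the COPY warnings quoted above.
    for hex_code in ("8ea3c3b7", "8ea3cfd0", "8ea3c4ce", "8ea3bdfe"):
        plane, row, col = parse_euc_tw(bytes.fromhex(hex_code))
        print(f"0x{hex_code}: plane {plane}, row {row}, column {col}")
```

All four sequences from the warnings carry the plane byte 0xa3, which is why they fall into plane 3.
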
> > > Copy out to file from table (UTF-8 data):
> > > to BIG5
> > > WARNING:  UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
> > > WARNING:  UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
> > > WARNING:  UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
> > > WARNING:  UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored
> > >
> > > to EUC_TW is ok!
> > 
> > BIG5 and EUC_TW have different code points. So this is not very strange.
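
To check a failing character against a mapping table, it helps to turn the UTF-8 bytes printed in the UtfToLocal warnings back into Unicode code points first. A small Python sketch of that step (the function name utf8_hex_to_code_points is only an illustration):

```python
def utf8_hex_to_code_points(hex_bytes: str) -> list:
    """Decode a hex string of UTF-8 bytes and return its code points as U+XXXX."""
    text = bytes.fromhex(hex_bytes).decode("utf-8")
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    # Hex values copied from the UtfToLocal warnings quoted above.
    for hex_bytes in ("e7a281", "e98ab9", "e8a38f", "e7b2a7"):
        print(f"0x{hex_bytes} ->", utf8_hex_to_code_points(hex_bytes))
```

The resulting code points can then be looked up separately in the BIG5 map and in the EUC_TW map, since each map covers its own set of characters.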
> 
> 
> But it is very strange that I can (for EUC_TW) copy to file without error but I can not copy from file without error.
> 
> Michael
> 

