Thread: UNICODE problem on 7.4 with COPY
When I try to import data from a Unicode file into PostgreSQL 7.4 under FreeBSD, it appears not to understand the Unicode file format.

To demonstrate, I export a set of integers into a Unicode file from MSSQL 2000. I copy the file to a FreeBSD box via Samba and try to import it from psql with COPY. It fails. WordPad and Notepad both read the file OK, even after I bounce the file via the FreeBSD box (to test that Samba didn't munge it).

FreeBSD 5.1-RELEASE #0
PGSql 7.4 (dl'd and compiled Fri 28th Nov 2003)
Dual 800MHz P3's

I create a database with encoding = UNICODE.
I create a table:

CREATE TABLE testunicode
(
  anum int4
) WITHOUT OIDS;

I then use psql to import the file, which is a single column of integers.

copy testunicode from '/home/toby/itxt/anum.txt';
ERROR: invalid input syntax for integer: "ÿþ1"
CONTEXT: COPY testunicode, line 1, column anum: "ÿþ1"

When viewing the file as hex I see:

FF FE 31 00 31 00 32 00 37 00 39 00 30 00 0D 00 0A 00
ÿ  þ  1  .  1  .  2  .  7  .  9  .  0  .  .  .  .  .

According to http://www.crispen.org/src/archive/0013.html

FF FE UTF-16/UCS-2, big endian

So, what is going wrong? Why can't I import this very simple Unicode file? I've searched the archives and Google, but to no avail.

Btw, the actual stuff I want to import is larger and more complex; this little table is just to demonstrate the problem.

Help would be muchly appreciated.

Toby
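As an aside for readers of the archive, the bytes in the hex dump can be decoded directly to see what the file contains. The sketch below is not part of the original post; it uses only Python's standard codecs, and the byte string is copied from the dump above. Note that FF FE is the byte order mark for little-endian UTF-16, which agrees with the 31 00 byte pairs that follow it.

```python
import codecs

# Bytes copied from the hex dump in the original post.
raw = bytes.fromhex("FF FE 31 00 31 00 32 00 37 00 39 00 30 00 0D 00 0A 00")

# FF FE is the UTF-16 byte order mark for little-endian data.
assert raw.startswith(codecs.BOM_UTF16_LE)

# The "utf-16" codec reads the BOM, picks the byte order, and drops the BOM.
text = raw.decode("utf-16")
print(repr(text))  # '112790\r\n'

# The first three bytes, viewed one byte per character (Latin-1),
# are the same characters that appear in the COPY error message.
first_three = raw[:3].decode("latin-1")
print(repr(first_three))  # 'ÿþ1'
```

So the file holds the single integer 112790 followed by a Windows line ending, but every character occupies two bytes and the data is prefixed by a two-byte marker.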
Toby Doig wrote:
...
> So, what is going wrong? Why can't I import this very simple unicode file?
> I've searched the archives and google, but to no avail.

Try converting the file to UTF-8:

iconv -t utf-8 -f utf-16 < unicode-file.txt > utf-8-file.txt
Same error as before.

Toby Doig

-----Original Message-----
From: pgsql-general-owner@postgresql.org On Behalf Of Gianni Mariani
Sent: 01 December 2003 16:37
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] UNICODE problem on 7.4 with COPY

> try converting the file to utf-8.
>
> iconv -t utf-8 -f utf-16 < unicode-file.txt > utf-8-file.txt
Toby Doig wrote:
...
> When viewing the file as hex I see:
> FF FE 31 00 31 00 32 00 37 00 39 00 30 00 0D 00 0A 00
> ÿ  þ  1  .  1  .  2  .  7  .  9  .  0  .  .  .  .  .
>
> According to http://www.crispen.org/src/archive/0013.html
>
> FF FE UTF-16/UCS-2, big endian

See also http://www.unicode.org/unicode/faq/utf_bom.html#22

> So, what is going wrong? Why can't I import this very simple unicode file?
> I've searched the archives and google, but to no avail.

PostgreSQL only accepts a stream of characters in the given client encoding. This defaults to "utf-8" when you set up your db as "unicode". psql does not read the BOM information in files, since it operates on streams rather than on files. The same, I fear, is true for PostgreSQL's COPY command.

I think a patch from you would be appreciated :-)

Regards
Tino
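Following on from the explanation above, one way to prepare such a file is to rewrite it as BOM-free UTF-8 before handing it to COPY. This is a minimal sketch, not something posted in the thread: the helper name utf16_to_utf8 and the temporary file names are made up for illustration, and it assumes the input starts with a UTF-16 byte order mark, as the file in the original post does.

```python
import os
import tempfile

def utf16_to_utf8(src_path, dst_path):
    """Rewrite a BOM-prefixed UTF-16 file as BOM-free UTF-8 with LF line ends."""
    # encoding="utf-16" consumes the BOM and uses it to choose the byte order;
    # newline="" disables newline translation on read so it is handled below.
    with open(src_path, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    text = text.replace("\r\n", "\n")
    # Plain "utf-8" (not "utf-8-sig") writes no BOM; newline="\n" keeps LF as-is.
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write(text)

# Demo with a throwaway file holding the same bytes as the hex dump
# in the original post (hypothetical paths inside a temp directory).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "anum.txt")
dst = os.path.join(workdir, "anum.utf8.txt")
with open(src, "wb") as f:
    f.write(bytes.fromhex("FF FE 31 00 31 00 32 00 37 00 39 00 30 00 0D 00 0A 00"))

utf16_to_utf8(src, dst)

with open(dst, "rb") as f:
    converted = f.read()
print(converted)  # b'112790\n'
```

The converted file contains only the ASCII digits and a newline, which is the kind of byte stream a database with a UTF-8 client encoding expects; the same COPY command could then be pointed at the converted file instead of the original.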