Thread: unicode error and problem
Hi

I received a unicode CSV file from someone (the file was created on a windows system) and I'm trying to import it into postgresql. When it gets to a line that isn't ascii it prints the following error and aborts: "ERROR: copy: line 33, Invalid UNICODE character sequence found (0xd956)". When I created the db cluster I used "-E unicode" (initdb was run with "-E unicode").

As I wrote above, the file was created on a Windows system. I'm trying to import it into PostgreSQL 7.3.5 on a Solaris 9 system. I compiled PostgreSQL myself with the following configure switches:

./configure --prefix=/usr/local --sysconfdir=/etc --sharedstatedir=/usr/local/share --localstatedir=/var --enable-locale --enable-recode --enable-multibyte --enable-nls --with-java --with-openssl=/usr/local --with-CXX --enable-syslog --with-includes=/usr/local/include --with-libraries=/usr/local/lib

Does anyone know how to solve this problem so that the file will be imported properly?
> I received a unicode CSV file from someone (the file was created on a
> windows system) and I'm trying to import it into postgresql. When it gets to
> a line that isn't ascii it prints the following error and aborts: "ERROR:
> copy: line 33, Invalid UNICODE character sequence found (0xd956)". When I

The error message says it all. 0xd956 cannot be a proper UNICODE (actually UTF-8, in the case of PostgreSQL) character at all.

-- Tatsuo Ishii
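To see why those bytes are rejected, here is a minimal Python sketch. It assumes the "0xd956" in the error message stands for the two bytes 0xD9 0x56 in that order, which the message itself does not confirm; `is_valid_utf8` is a helper name made up for this illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# Assumption: the error's "0xd956" means the bytes 0xD9 0x56, in that order.
suspect = bytes([0xD9, 0x56])

# 0xD9 = 0b11011001 opens a two-byte UTF-8 sequence, so the next byte must
# have the form 0b10xxxxxx. 0x56 = 0b01010110 does not, so decoding fails.
print(is_valid_utf8(suspect))

# For comparison, a well-formed two-byte sequence (U+00E9 encoded as UTF-8).
print(is_valid_utf8("\u00e9".encode("utf-8")))
```

A byte pair like this usually means the file is in some other encoding (UTF-16 or a Windows code page, for example) rather than UTF-8.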
On Wednesday 24 March 2004 14:15, Tatsuo Ishii wrote:
> > I received a unicode CSV file from someone (the file was created on a
> > windows system) and I'm trying to import it into postgresql. When it gets
> > to a line that isn't ascii it prints the following error and aborts:
> > "ERROR: copy: line 33, Invalid UNICODE character sequence found
> > (0xd956)". When I
>
> The error message says it all. 0xd956 cannot be a proper UNICODE (actually
> UTF-8, in the case of PostgreSQL) character at all.

I _think_ I've seen something very similar though, with one of the WIN9999 charsets. Can't remember for sure, but it's probably worth checking.

-- Richard Huxton
Archonet Ltd
On Wed, 24.03.2004, at 11:33, Paolo Supino wrote:
> Hi
>
> I received a unicode CSV file from someone (the file was created on a
> windows system) and I'm trying to import it into postgresql. When it gets to
> a line that isn't ascii it prints the following error and aborts: "ERROR:
> copy: line 33, Invalid UNICODE character sequence found (0xd956)".

Try converting the file from UTF-16 (which might be the encoding of the file) to UTF-8 with iconv:

iconv -f UTF-16 -t UTF-8 file > file.UTF-8

Maybe the file is not in UTF-16 but in some other encoding; in that case, convert accordingly.

By the way, Unicode is just a number -> glyph mapping; it doesn't say anything about the representation of that number in the byte stream. UTF-8 and UTF-16 are such representation specifications.

The encoding name in PostgreSQL should be changed from UNICODE to UTF-8, because UNICODE really just isn't an encoding.

-- Markus Bertheau <twanger@bluetwanger.de>
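If iconv is not at hand, the same conversion can be sketched in Python. This is only an illustration under the same guess as above, namely that the file is UTF-16 with a byte order mark; the function names `looks_like_utf16` and `utf16_to_utf8` are made up for this example.

```python
import codecs


def looks_like_utf16(path: str) -> bool:
    """Heuristic: True if the file starts with a UTF-16 byte order mark."""
    with open(path, "rb") as f:
        head = f.read(2)
    return head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def utf16_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a UTF-16 text file as UTF-8."""
    # The "utf-16" codec uses the byte order mark to pick the byte order
    # and drops it from the decoded text. newline="" keeps the original
    # line endings untouched.
    with open(src_path, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

The sketch reads the whole file into memory, which is fine for a typical CSV export. If `looks_like_utf16` returns False, the file is probably in some other encoding and the `encoding` argument would have to be changed to match.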
> By the way, Unicode is just a number -> glyph mapping, it doesn't say
> anything about the representation of that number in the byte stream.
> UTF-8 and UTF-16 are such representation specifications.
>
> The encoding name in PostgreSQL should be changed from UNICODE to UTF-8
> because UNICODE really just isn't an encoding.

Actually you can use "UTF-8" instead of "UNICODE" when using PostgreSQL. However the "primary" name is still UNICODE, and I agree it's better to change to UTF-8 for the primary name. Maybe for 7.5?

-- Tatsuo Ishii