Thread: unicode error and problem

From

Paolo Supino

Date:

24 March 2004, 06:33:42

Hi

  I received a unicode CSV file from someone (the file was created on a
windows system) and I'm trying to import it into postgresql. When it gets to
a line that isn't ascii it prints the following error and aborts: "ERROR:
copy: line 33, Invalid UNICODE character sequence found (0xd956)". I
created the db cluster with "-E unicode" and initdb was run with "-E
unicode". As I wrote above, the file was created on a windows system. I'm
trying to import it to postgresql 7.3.5 on a Solaris 9 system. Postgresql
was compiled by me with the following configure switches:
 ./configure --prefix=/usr/local --sysconfdir=/etc
--sharedstatedir=/usr/local/share --localstatedir=/var --enable-locale
--enable-recode --enable-multibyte --enable-nls
--with-java --with-openssl=/usr/local --with-CXX --enable-syslog
--with-includes=/usr/local/include --with-libraries=/usr/local/lib.

Does anyone know how to solve this problem so that the file will be imported
properly?

Re: unicode error and problem

From

Tatsuo Ishii

Date:

24 March 2004, 10:15:21

>   I received a unicode CSV file from someone (the file was created on a
> windows system) and I'm trying to import it into postgresql. When it gets to
> a line that isn't ascii it prints the following error and aborts: "ERROR:
> copy: line 33, Invalid UNICODE character sequence found (0xd956)". When I

The error message says it all. 0xd956 cannot be a proper UNICODE (actually
UTF-8 in the case of PostgreSQL) character at all.
--
Tatsuo Ishii
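
To illustrate why this byte pair is rejected, here is a minimal Python sketch. It assumes that the 0xd956 in the error message stands for two consecutive input bytes, 0xD9 followed by 0x56; the message itself does not spell that out. The helper names are made up for this example.

```python
# Assumption: the error's "0xd956" means the byte 0xD9 followed by the byte 0x56.
suspect = bytes([0xD9, 0x56])

def is_two_byte_lead(byte: int) -> bool:
    """A UTF-8 lead byte of a two-byte sequence has the bit pattern 110xxxxx."""
    return (byte & 0b1110_0000) == 0b1100_0000

def is_continuation(byte: int) -> bool:
    """A UTF-8 continuation byte has the bit pattern 10xxxxxx."""
    return (byte & 0b1100_0000) == 0b1000_0000

def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 0xD9 = 11011001 announces a two-byte sequence ...
lead_ok = is_two_byte_lead(suspect[0])
# ... but 0x56 = 01010110 (ASCII 'V') is not a continuation byte.
cont_ok = is_continuation(suspect[1])

print(lead_ok, cont_ok, decodes_as_utf8(suspect))
```

Under that assumption the first byte promises a second byte that never arrives in the required form, which is consistent with the input not being UTF-8 in the first place.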

Re: unicode error and problem

From

Richard Huxton

Date:

24 March 2004, 12:36:52

On Wednesday 24 March 2004 14:15, Tatsuo Ishii wrote:
> >   I received a unicode CSV file from someone (the file was created on a
> > windows system) and I'm trying to import it into postgresql. When it gets
> > to a line that isn't ascii it prints the following error and aborts:
> > "ERROR: copy: line 33, Invalid UNICODE character sequence found
> > (0xd956)". When I
>
> The error message says it all. 0xd956 cannot be a proper UNICODE (actually
> UTF-8 in the case of PostgreSQL) character at all.

I _think_ I've seen something very similar though, with one of the WIN9999
charsets. Can't remember for sure, but it's probably worth checking.
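
One way to do that check is to decode a sample of the offending line with a few candidate encodings and look at the results. The following is only a sketch: the sample bytes and the candidate list are invented for illustration and are not taken from the file discussed in this thread.

```python
def try_encodings(sample: bytes, candidates):
    """Decode `sample` with each candidate; None marks a failed decode."""
    results = {}
    for name in candidates:
        try:
            results[name] = sample.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# Hypothetical sample: "café" as a Windows-1252 editor would have saved it.
sample = "café".encode("cp1252")

results = try_encodings(sample, ["utf-8", "cp1252", "cp1251"])
for name, text in results.items():
    print(name, repr(text))
```

A successful decode does not prove the guess is right: single-byte Windows code pages accept almost any byte, so the decoded text still has to be read by someone who knows what the data should look like.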


--
  Richard Huxton
  Archonet Ltd

Re: unicode error and problem

From

Markus Bertheau

Date:

24 March 2004, 16:49:35

On Wed, 24.03.2004, at 11:33, Paolo Supino wrote:
> Hi
>
>   I received a unicode CSV file from someone (the file was created on a
> windows system) and I'm trying to import it into postgresql. When it gets to
> a line that isn't ascii it prints the following error and aborts: "ERROR:
> copy: line 33, Invalid UNICODE character sequence found (0xd956)".

Try to convert the file from UTF-16 (which might be the encoding of the
file) to UTF-8 with iconv:

iconv -f UTF-16 -t UTF-8 file > file.UTF-8

Maybe the file is not in UTF-16 but in some other encoding - convert
accordingly then.
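
Where iconv is not at hand, the same conversion can be sketched in Python. This is an illustration under assumptions, not a tested import procedure: it guesses the source encoding only from a byte order mark (BOM), ignores UTF-32, and the function names are made up for the example.

```python
import codecs

def sniff_bom(data: bytes) -> str:
    """Guess the encoding from a leading BOM; 'unknown' if there is none."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # this codec strips the UTF-8 BOM on decode
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"      # this codec reads the BOM to pick the byte order
    return "unknown"

def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode from the source encoding and re-encode as BOM-less UTF-8."""
    return data.decode(source_encoding).encode("utf-8")

# Hypothetical CSV line as a Windows tool might write it: UTF-16 with a BOM.
raw = "a,é\n".encode("utf-16")

encoding = sniff_bom(raw)
converted = to_utf8(raw, encoding)
print(encoding, converted)
```

For a real file, the bytes would come from open(path, "rb").read(). If sniff_bom returns "unknown", the encoding has to be determined some other way before converting.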

By the way, Unicode is just a number -> glyph mapping, it doesn't say
anything about the representation of that number in the byte stream.
UTF-8 and UTF-16 are such representation specifications.
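
A small Python illustration of that distinction: one code point, several byte representations. The character chosen is arbitrary.

```python
ch = "é"

# The Unicode code point is a single number, independent of any encoding.
code_point = ord(ch)                 # 0xE9, i.e. U+00E9

# The byte stream depends on which encoding is applied to that number.
utf8 = ch.encode("utf-8")            # two bytes
utf16le = ch.encode("utf-16-le")     # two bytes, low byte first
utf16be = ch.encode("utf-16-be")     # two bytes, high byte first

print(hex(code_point), utf8.hex(), utf16le.hex(), utf16be.hex())
```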

The encoding name in PostgreSQL should be changed from UNICODE to UTF-8
because UNICODE really just isn't an encoding.

--
Markus Bertheau <twanger@bluetwanger.de>

Re: [HACKERS] unicode error and problem

From

Tatsuo Ishii

Date:

25 March 2004, 00:17:44

> By the way, Unicode is just a number -> glyph mapping, it doesn't say
> anything about the representation of that number in the byte stream.
> UTF-8 and UTF-16 are such representation specifications.
>
> The encoding name in PostgreSQL should be changed from UNICODE to UTF-8
> because UNICODE really just isn't an encoding.

Actually you can use "UTF-8" instead of "UNICODE" when using
PostgreSQL. However the "primary" name is still UNICODE, and I agree
it's better to change to UTF-8 for the primary name. Maybe for 7.5?
--
Tatsuo Ishii