Thread: Re: [GENERAL] unicode error and problem

Re: [GENERAL] unicode error and problem

From: Markus Bertheau
Date: 24 March 2004, 16:49:35

On Wed, 24.03.2004 at 11:33, Paolo Supino wrote:
> Hi
>
>   I received a Unicode CSV file from someone (the file was created on a
> Windows system) and I'm trying to import it into PostgreSQL. When it gets
> to a line that isn't ASCII, it prints the following error and aborts:
> "ERROR: copy: line 33, Invalid UNICODE character sequence found (0xd956)".

Try converting the file from UTF-16 (which might be the file's encoding)
to UTF-8 with iconv:

iconv -f UTF-16 -t UTF-8 file > file.UTF-8

If the file is not in UTF-16 but in some other encoding, convert
accordingly.
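
If iconv is not at hand, the same conversion can be sketched in Python.
This is only a minimal sketch, assuming the file really is UTF-16 with a
byte order mark; the function name convert_utf16_to_utf8 and its parameters
are hypothetical and not part of the original advice.

```python
def convert_utf16_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Re-encode a text file as UTF-8 (hypothetical helper)."""
    # newline="" passes line endings through untouched, so CRLF endings
    # from a Windows-created file survive the conversion.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Calling convert_utf16_to_utf8("file", "file.UTF-8") would mirror the iconv
command. If the source turns out to be in another encoding, pass its name
as src_encoding instead.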

By the way, Unicode is just a number -> glyph mapping, it doesn't say
anything about the representation of that number in the byte stream.
UTF-8 and UTF-16 are such representation specifications.
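
To make the distinction concrete, here is a small Python sketch showing one
code point and two different byte representations of it. The sample
character is an arbitrary choice for illustration and is not taken from the
file in question.

```python
ch = "\u0416"  # CYRILLIC CAPITAL LETTER ZHE, code point U+0416

code_point = ord(ch)                  # the Unicode number
utf8_bytes = ch.encode("utf-8")       # its UTF-8 representation
utf16_bytes = ch.encode("utf-16-be")  # UTF-16, big-endian, no byte order mark

print(hex(code_point))    # 0x416
print(utf8_bytes.hex())   # d096
print(utf16_bytes.hex())  # 0416

# UTF-16 data with a byte order mark is not valid UTF-8, so a strict
# UTF-8 decoder rejects it.
try:
    ch.encode("utf-16").decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

The number is the same in both cases; only the bytes differ, which is why a
UTF-16 file fed to something that expects UTF-8 produces an "invalid
sequence" style of error.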

The encoding name in PostgreSQL should be changed from UNICODE to UTF-8
because UNICODE really just isn't an encoding.

--
Markus Bertheau <twanger@bluetwanger.de>

Re: [GENERAL] unicode error and problem

From: Tatsuo Ishii
Date: 25 March 2004, 00:17:42

> By the way, Unicode is just a number -> glyph mapping, it doesn't say
> anything about the representation of that number in the byte stream.
> UTF-8 and UTF-16 are such representation specifications.
>
> The encoding name in PostgreSQL should be changed from UNICODE to UTF-8
> because UNICODE really just isn't an encoding.

Actually, you can already use "UTF-8" instead of "UNICODE" when using
PostgreSQL. However, the "primary" name is still UNICODE, and I agree
it would be better to change the primary name to UTF-8. Maybe for 7.5?
--
Tatsuo Ishii