Re: MSSQL to PostgreSQL : Encoding problem - Mailing list pgsql-general

From Brandon Aiken
Subject Re: MSSQL to PostgreSQL : Encoding problem
Date
Msg-id F8E84F0F56445B4CB39E019EF67DACBA3C4BBD@exchsrvr.winemantech.com
In response to Re: MSSQL to PostgreSQL : Encoding problem  (Arnaud Lesauvage <thewild@freesurf.fr>)
Responses Re: MSSQL to PostgreSQL : Encoding problem  ("Tomi NA" <hefest@gmail.com>)
Re: MSSQL to PostgreSQL : Encoding problem  (Martijn van Oosterhout <kleptog@svana.org>)
Re: MSSQL to PostgreSQL : Encoding problem  (Arnaud Lesauvage <thewild@freesurf.fr>)
List pgsql-general
It also might be a big/little endian problem, although I always thought that was platform-specific, not
locale-specific.

Try the UCS-2-INTERNAL and UCS-4-INTERNAL codepages in iconv, which should use the two-byte or four-byte versions of
UCS encoding with the system's default byte order.

There are many Unicode codepage formats that iconv supports:
UTF-8
ISO-10646-UCS-2 UCS-2 CSUNICODE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-2LE UNICODELITTLE
ISO-10646-UCS-4 UCS-4 CSUCS4
UCS-4BE
UCS-4LE
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
UCS-2-INTERNAL
UCS-2-SWAPPED
UCS-4-INTERNAL
UCS-4-SWAPPED
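
To illustrate the byte-order point, here is a minimal Python sketch (not part of the original message). It encodes a short sample string with the Python codecs that correspond to a few of the iconv names above and reports which outputs contain 0x00 bytes. The sample string and the helper name encode_all are assumptions made for illustration. Python has no separate UCS-2 codec, so its UTF-16 codecs stand in; they produce the same bytes for characters in the Basic Multilingual Plane.

```python
# Sample text with an accented character, as might appear in a French CSV export.
sample = "\u00e9,1"

# Python codec names standing in for a few of the iconv names listed above.
codecs_to_try = ["utf-8", "utf-16-le", "utf-16-be", "utf-16", "utf-32-le", "utf-32-be"]


def encode_all(text, names):
    """Return a dict mapping each codec name to the encoded bytes of text."""
    return {name: text.encode(name) for name in names}


encoded = encode_all(sample, codecs_to_try)
for name, data in encoded.items():
    # Show the raw bytes in hex and whether any of them is a NUL byte.
    has_nul = b"\x00" in data
    print(f"{name:10} {data.hex()}  contains 0x00: {has_nul}")
```

Any two-byte or four-byte encoding of mostly Latin text contains 0x00 bytes, and the LE and BE variants differ only in where those bytes fall. That would be consistent with the 0x00 error quoted below if the export were really UCS-2 or UTF-16 rather than a single-byte codepage.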

Gee, didn't Unicode just so simplify this codepage mess?  Remember when it was just ASCII, EBCDIC, ANSI, and localized
codepages?

--
Brandon Aiken
CS/IT Systems Engineer
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Arnaud Lesauvage
Sent: Wednesday, November 22, 2006 12:38 PM
To: Arnaud Lesauvage; General
Subject: Re: [GENERAL] MSSQL to PostgreSQL : Encoding problem

Alvaro Herrera wrote:
> Arnaud Lesauvage wrote:
>> Alvaro Herrera wrote:
>> >Arnaud Lesauvage wrote:
>> >
>> >>mydb=# SET client_encoding TO LATIN9;
>> >>SET
>> >>mydb=# COPY statistiques.detailrecherche (log_gid,
>> >>champrecherche, valeurrecherche) FROM
>> >>'E:\\Production\\Temp\\detailrecherche_ansi.csv' CSV;
>> >>ERROR:  invalid byte sequence for encoding "LATIN9": 0x00
>> >>HINT:  This error can also happen if the byte sequence does
>> >>not match the encoding expected by the server, which is
>> >>controlled by "client_encoding".
>> >
>> >Huh, why do you have a "0x00" byte in there?  That's certainly not
>> >Latin9 (nor UTF8 as far as I know).
>> >
>> >Is the file actually Latin-something or did you convert it to something
>> >else at some point?
>>
>> This is the file generated by DTS with "ANSI" encoding. It
>> was not altered in any way after that!
>> The doc states that ANSI exports with the local codepage
>> (which is Win1252). That's all I know. :(
>
> I thought Win1252 was supposed to be almost the same as Latin1.  While
> I'd expect certain differences, I wouldn't expect it to use 0x00 as
> data!
>
> Maybe you could have DTS export Unicode, which would presumably be
> UTF-16, then recode that to something else (possibly UTF-8) with GNU
> iconv.

UTF-16! That's something I haven't tried!
I'll try an iconv conversion from UTF-16 to UTF-8 tomorrow!
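
The planned UTF-16 to UTF-8 conversion could also be sketched in Python, roughly equivalent to `iconv -f UTF-16 -t UTF-8`. This is a minimal sketch, not what was actually run in this thread. The function name recode_file and the file names in the usage note are hypothetical, and it assumes the export starts with a byte order mark so that the "utf-16" codec can detect the byte order.

```python
def recode_file(src_path, dst_path, src_encoding="utf-16", dst_encoding="utf-8"):
    """Rewrite src_path as dst_path, decoding with src_encoding and
    encoding with dst_encoding (hypothetical helper for illustration)."""
    # newline="" keeps the line endings exactly as they are in the source file.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        # Stream line by line so a large export does not have to fit in memory.
        for line in src:
            dst.write(line)
```

For example, recode_file("detailrecherche_unicode.csv", "detailrecherche_utf8.csv") with hypothetical file names. If the file has no byte order mark, passing src_encoding="utf-16-le" or "utf-16-be" explicitly is safer than relying on the default.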

--
Arnaud

