Re: MSSQL to PostgreSQL : Encoding problem - Mailing list pgsql-general

From Arnaud Lesauvage
Subject Re: MSSQL to PostgreSQL : Encoding problem
Msg-id 45645FFA.2040006@freesurf.fr
In response to Re: MSSQL to PostgreSQL : Encoding problem  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: MSSQL to PostgreSQL : Encoding problem  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-general
Alvaro Herrera wrote:
> Arnaud Lesauvage wrote:
>> Alvaro Herrera wrote:
>> >Arnaud Lesauvage wrote:
>> >>Tomi NA wrote:
>> >>>>I think I'll go this way... No other choice, actually!
>> >>>>The MSSQL database is in SQL_Latin1_General_CP1_CI_AS.
>> >>>>I don't really understand what this is. It supports the euro
>> >>>>symbol, so it is probably not pure LATIN1, right?
>> >>>
>> >>>I suppose you'd have to look at the latin1 codepage character table
>> >>>somewhere...I'm a UTF-8 guy so I'm not well suited to respond to the
>> >>>question. :)
>> >>
>> >>Yep, http://en.wikipedia.org/wiki/Latin-1 tells me that
>> >>LATIN1 is missing the euro sign...
>> >>Grrrrr I hate this !!!
>> >
>> >So use Latin9 ...
>>
>> Of course, but it doesn't work!!!
>> Whatever client encoding I choose in PostgreSQL before
>> COPYing, I get the 'invalid byte sequence' error.
>
> Humm ... how are you choosing the client encoding?  Is it actually
> working?  I don't see how choosing Latin1 or Latin9 and feeding whatever
> byte sequence would give you an "invalid byte sequence".  These charsets
> don't have any way to validate the bytes, as opposed to what UTF-8 can
> do.  So you could end up with invalid bytes if you choose the wrong
> client encoding, but that's a different error.
>
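
Alvaro's point, and the euro question above, can be checked outside the database. This is only a sketch: it assumes Python's `latin-1`, `iso8859_15` and `cp1252` codecs behave like PostgreSQL's LATIN1, LATIN9 and WIN1252 for the bytes involved, and it does not touch PostgreSQL at all. `CP1` in the MSSQL collation name refers to Windows code page 1252, which puts the euro sign at 0x80; `decodes` and `euro_byte` are hypothetical helper names.

```python
# Sketch: Python codecs used as stand-ins for PostgreSQL's LATIN1, LATIN9
# and UTF8 encodings (an assumption, not PostgreSQL's own validation code).

def decodes(data: bytes, codec: str) -> bool:
    """Return True if `data` decodes without error under `codec`."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True

def euro_byte(codec: str):
    """Return the encoded euro sign in `codec`, or None if it has none."""
    try:
        return "\u20ac".encode(codec)
    except UnicodeEncodeError:
        return None

# Every non-NUL byte maps to some character in a single-byte charset, so a
# wrong client_encoding gives wrong characters rather than an error.
sample = bytes(range(1, 256))
print(decodes(sample, "latin-1"))     # True
print(decodes(sample, "iso8859_15"))  # True (LATIN9)
print(decodes(sample, "utf-8"))       # False: UTF-8 has structural rules

# The euro sign exists in Windows-1252 and LATIN9, but not in LATIN1.
print(euro_byte("cp1252"))      # b'\x80'
print(euro_byte("iso8859_15"))  # b'\xa4'
print(euro_byte("latin-1"))     # None
```

Note that the same euro sign is a different byte in WIN1252 and LATIN9, so picking LATIN9 for a Windows-1252 file would load without error but store the wrong character.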

mydb=# SET client_encoding TO LATIN9;
SET
mydb=# COPY statistiques.detailrecherche (log_gid,
champrecherche, valeurrecherche) FROM
'E:\\Production\\Temp\\detailrecherche_ansi.csv' CSV;
ERROR:  invalid byte sequence for encoding "LATIN9": 0x00
HINT:  This error can also happen if the byte sequence does
not match the encoding expected by the server, which is
controlled by "client_encoding".
CONTEXT:  COPY detailrecherche, line 9212
mydb=# SET client_encoding TO WIN1252;
SET
mydb=# COPY statistiques.detailrecherche (log_gid,
champrecherche, valeurrecherche) FROM
'E:\\Production\\Temp\\detailrecherche_ansi.csv' CSV;
ERROR:  invalid byte sequence for encoding "WIN1252": 0x00
HINT:  This error can also happen if the byte sequence does
not match the encoding expected by the server, which is
controlled by "client_encoding".
CONTEXT:  COPY detailrecherche, line 9212
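
One possible reading of the 0x00 error: PostgreSQL does not allow the NUL character (code zero) in text values, so an embedded NUL in the export would be rejected whatever client_encoding is set, which would fit getting the same error at line 9212 for both LATIN9 and WIN1252. A minimal sketch for finding and removing such bytes, assuming the file fits in memory; `lines_with_nul` and `strip_nul` are hypothetical helpers, not part of any tool. The line numbers are physical lines and may not match COPY's count if quoted CSV fields contain newlines.

```python
# Sketch: locate and strip NUL (0x00) bytes in an export before COPY.
# Assumption: the whole file is read into memory as bytes.

def lines_with_nul(data: bytes) -> list[int]:
    """Return the 1-based physical line numbers that contain a NUL byte."""
    return [n for n, line in enumerate(data.split(b"\n"), start=1)
            if b"\x00" in line]

def strip_nul(data: bytes) -> bytes:
    """Return a copy of `data` with every NUL byte removed."""
    return data.replace(b"\x00", b"")

# In-memory example standing in for the real CSV file.
demo = b"1;foo;bar\n2;ab\x00c;x\n3;ok;ok\n"
print(lines_with_nul(demo))             # [2]
print(strip_nul(demo).count(b"\x00"))   # 0
```

Deleting the NUL bytes silently changes the data; replacing them with a space instead is an equally simple choice, and it is worth inspecting the reported lines first to see why MSSQL wrote them.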


Honestly, I'd rather get a different error, but this is all I
get.
That was with the "ANSI" export.
With the "UNICODE" export:

mydb=# SET client_encoding TO UTF8;
SET
mydb=# COPY statistiques.detailrecherche (log_gid,
champrecherche, valeurrecherche) FROM
'E:\\Production\\Temp\\detailrecherche_unicode.csv' CSV;
ERROR:  invalid byte sequence for encoding "UTF8": 0xff
HINT:  This error can also happen if the byte sequence does
not match the encoding expected by the server, which is
controlled by "client_encoding".
CONTEXT:  COPY detailrecherche, line 592680

So line 592680 is *a lot* better than line 9212, but it still fails!
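
For the "UNICODE" export, the byte 0xff can never occur in valid UTF-8, so some part of the file is probably in another encoding (for example a single-byte 'ÿ' from Windows-1252, or a UTF-16 byte-order mark FF FE). A minimal sketch for locating the first offending byte, assuming the file fits in memory; `first_invalid_utf8` is a hypothetical helper name.

```python
# Sketch: report where a byte string stops being valid UTF-8.
# Bytes 0xC0, 0xC1 and 0xF5..0xFF never appear in well-formed UTF-8.

def first_invalid_utf8(data: bytes):
    """Return (line, offset, byte) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Count newlines before the bad byte to get a 1-based line number.
        line = data.count(b"\n", 0, exc.start) + 1
        return line, exc.start, data[exc.start]
    return None

print(first_invalid_utf8("caf\u00e9\n".encode("utf-8")))  # None
print(first_invalid_utf8(b"ok\nok\n\xff\xfeX"))           # (3, 6, 255)
```

Looking at the bytes around the reported offset should show whether it is a stray Windows-1252 character or a UTF-16 fragment.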

--
Arnaud

