Re: Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1 - Mailing list pgsql-general

From Albe Laurenz
Subject Re: Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1
Date
Msg-id A737B7A37273E048B164557ADEF4A58B17D16B2D@ntex2010i.host.magwien.gv.at
In response to Re: Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1  (sunpeng <bluevaley@gmail.com>)
Responses Re: Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1  (sunpeng <bluevaley@gmail.com>)
List pgsql-general
sunpeng wrote:
>>> Loading the data into PostgreSQL from cmd (encoding is GBK) on Win8:
>>> 
>>> psql -h localhost  -d test -U postgres <  dbdata.sql
>>>
>>> I got the error:
>>> ERROR:  invalid byte sequence for encoding "UTF8": 0xff
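For reference: 0xFF never appears anywhere in well-formed UTF-8, so a byte like that is rejected whatever surrounds it. Text in another encoding such as GBK usually fails the same check, just at a different byte. A minimal Python sketch (the helper first_invalid_utf8_byte is hypothetical, for illustration only) that reports the first offending byte, much like the error message does:

```python
def first_invalid_utf8_byte(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset where decoding failed
        return exc.start, data[exc.start]
    return None


print(first_invalid_utf8_byte(b"plain ascii"))  # None
print(first_invalid_utf8_byte(b"abc\xffdef"))   # (3, 255)
```

The same check on GBK-encoded Chinese text fails at offset 0, because GBK lead bytes are not followed by valid UTF-8 continuation bytes.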

>> If the encoding is GBK then you will get errors (or incorrect
>> characters) if it is read as UTF8.  Try setting the environment
>> variable PGCLIENTENCODING.
>> 
>> http://www.postgresql.org/docs/9.1/static/app-psql.html

> I've switched cmd (on Win8) to UTF-8 with chcp 65001, but the error still occurs.
> I used the following command to dump the MySQL data:
> mysql> select Picture from personpicture where id = 'F2931306D1EE44ca82394CD3BC2404D4' into outfile "d:\\1.txt";
> I got an ANSI file, and UltraEdit shows these first 16 bytes:
> FF D8 FF E0 5C 30 10 4A 46 49 46 5C 30 01 01 5C
> 
> This differs from what MySQL Workbench shows:
> FF D8 FF E0 00 10 4a 46 49 46 00 01 01 00 00 01

Changing the terminal code page won't help; it's probably the data
that are in a different encoding.
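The two 16-byte dumps quoted above differ exactly where Workbench shows 00: the INTO OUTFILE file has 5C 30 there instead, a backslash followed by the digit 0. With the default FIELDS ESCAPED BY '\\', SELECT ... INTO OUTFILE writes a NUL byte that way. A Python sketch (the helper unescape_outfile is hypothetical and assumes that default escaping) that undoes it and compares the result with the Workbench bytes:

```python
def unescape_outfile(raw: bytes) -> bytes:
    """Undo MySQL's default INTO OUTFILE escaping (ESCAPED BY backslash).

    Assumption: backslash + '0' stands for a NUL byte, and a backslash
    followed by any other byte stands for that byte. A lone trailing
    backslash (an escape cut off by the end of the input) is kept as-is.
    """
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        if b == 0x5C and i + 1 < len(raw):  # backslash starts an escape
            nxt = raw[i + 1]
            out.append(0x00 if nxt == 0x30 else nxt)  # "\0" -> NUL byte
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)


outfile = bytes.fromhex("FF D8 FF E0 5C 30 10 4A 46 49 46 5C 30 01 01 5C")
workbench = bytes.fromhex("FF D8 FF E0 00 10 4a 46 49 46 00 01 01 00 00 01")

decoded = unescape_outfile(outfile)
# The final 5C in the UltraEdit view begins an escape cut off by the
# 16-byte window, so only the fully decoded prefix is compared.
print(decoded[:13] == workbench[:13])  # True
print(decoded[6:10])                   # b'JFIF' (bytes 4A 46 49 46)
```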

I don't know enough about MySQL to know which encoding it uses when dumping data,
but the man page of "mysqldump" tells me:

  --set-charset
  Add SET NAMES default_character_set to the output. This option is enabled by default.

So is there a SET NAMES command in the dump? If yes, what is the argument?
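One quick way to answer that is to scan the dump for the statement. A Python sketch; the helper find_set_names is hypothetical, and the regular expression assumes the statement has the form SET NAMES <charset>, possibly inside a MySQL version comment such as /*!40101 ... */. The file is read as raw bytes because its encoding is exactly what is unknown.

```python
import re
from typing import List

# Matches e.g. "SET NAMES utf8", "SET NAMES 'gbk'" or "/*!40101 SET NAMES utf8 */;"
SET_NAMES = re.compile(rb"SET\s+NAMES\s+'?([A-Za-z0-9_]+)", re.IGNORECASE)


def find_set_names(dump: bytes) -> List[str]:
    """Return every character set named in a SET NAMES statement in the dump."""
    # Charset names are plain ASCII, so decoding just the match is safe.
    return [m.group(1).decode("ascii") for m in SET_NAMES.finditer(dump)]


# Made-up sample input, only to show the shape of the result.
sample = b"/*!40101 SET NAMES utf8 */;\nINSERT INTO t VALUES (1);\n"
print(find_set_names(sample))  # ['utf8']
```

For the real file, pass it the bytes of the dump, e.g. find_set_names(open("dbdata.sql", "rb").read()).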

You will have to tell PostgreSQL the encoding of the data.
As Kevin pointed out, you can do that by setting the environment variable
PGCLIENTENCODING to the correct value. Then PostgreSQL will convert the
data automatically.
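In cmd that means running "set PGCLIENTENCODING=GBK" before psql, with GBK standing in for whatever encoding the dump really uses. The same thing as a Python sketch; the helper names psql_load_command and load_dump are hypothetical, the connection options are copied from the psql command quoted above, and psql's -f option replaces the < redirect.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def psql_load_command(dump_file: str, client_encoding: str) -> Tuple[List[str], Dict[str, str]]:
    """Build the psql command line and environment for loading a dump.

    PGCLIENTENCODING tells the server which encoding the incoming bytes
    are in, so it can convert them to the database encoding.
    """
    cmd = ["psql", "-h", "localhost", "-d", "test", "-U", "postgres", "-f", dump_file]
    env = dict(os.environ, PGCLIENTENCODING=client_encoding)
    return cmd, env


def load_dump(dump_file: str, client_encoding: str) -> None:
    """Run psql with the client encoding set (requires psql on the PATH)."""
    cmd, env = psql_load_command(dump_file, client_encoding)
    subprocess.run(cmd, env=env, check=True)
```

For example, load_dump("dbdata.sql", "GBK") runs the load with the client encoding set; the encoding name should match the SET NAMES value found in the dump.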

Yours,
Laurenz Albe
