Re: encoding confusion - Mailing list pgsql-general

From Albe Laurenz
Subject Re: encoding confusion
Date
Msg-id D960CB61B694CF459DCFB4B0128514C20230A1CE@exadv11.host.magwien.gv.at
In response to encoding confusion  (Sim Zacks <sim@compulab.co.il>)
List pgsql-general
Sim Zacks wrote:
> We originally tested it on mysql and now we are migrating it
> to postgresql.
>
> The messages are stored in a longblob field on mysql and a bytea field
> in postgresql.
>
> I set the database up as UTF-8, even though we get emails that are not
> UTF encoded, mostly because I didn't know what else to try that would
> incorporate all the possible encodings. Examples of 3 encodings we
> regularly receive are: UTF-8, Windows-1255, ISO-8859-8-I.

[...]

> It would not transfer through the dbi-link, so I wrote a python script
> (see below) to read a row from mysql and write a row to postgresql
> (using pygresql and mysqldb).
> When I used pygresql's escape_bytea function to copy the data, it went
> smoothly, but the data was corrupt.
> When I tried the escape_string function it died because the data it was
> moving was not UTF-8.
>
> I finally got it to work by defining a database as SQL-ASCII and then
> using escape_string worked. After the data was all in place, I pg_dumped
> and pg_restored into a UTF-8 database and it surprisingly works now.

It's very difficult to know exactly what happened unless you can provide
an example of a byte sequence that illustrates what you describe:
how it looked in MySQL, how it looked in your Python script, and what you
fed to escape_bytea.
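
One way to produce such an example is to dump the raw bytes as hex at each
stage and check which of the encodings mentioned above can decode them.
The following is only a minimal sketch: the sample bytes and the describe()
helper are hypothetical, not taken from the original script, and Python's
"iso8859_8" codec is assumed to be a reasonable stand-in for ISO-8859-8-I.

```python
# Hypothetical sample: a short Hebrew word encoded as Windows-1255.
# In practice this would be the value read from the MySQL longblob column.
raw = "\u05e9\u05dc\u05d5\u05dd".encode("cp1255")


def describe(data: bytes) -> dict:
    """Return a hex dump of data plus the result of decoding it
    with each candidate encoding."""
    result = {"hex": data.hex()}
    for enc in ("utf-8", "cp1255", "iso8859_8"):
        try:
            result[enc] = data.decode(enc)
        except UnicodeDecodeError as exc:
            # Record why the bytes are not valid in this encoding.
            result[enc] = "not valid: " + exc.reason
    return result


report = describe(raw)
for key, value in report.items():
    print(key, repr(value))
```

Comparing the hex line for the same row as read from MySQL, as held in the
script, and as read back from PostgreSQL should show at which step the bytes
change.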

What client encoding did you use in your Python script?

Yours,
Laurenz Albe
