encoding confusion - Mailing list pgsql-general

From Sim Zacks
Subject encoding confusion
Date
Msg-id g2lug4$oo3$1@news.hub.org
Whole thread Raw
Responses Re: encoding confusion  (Richard Huxton <dev@archonet.com>)
Re: encoding confusion  ("Albe Laurenz" <laurenz.albe@wien.gv.at>)
List pgsql-general
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

{BACKGROUND]
I am testing dbmail for our corporate email solution.

We originally tested it on mysql and now we are migrating it to postgresql.

The messages are stored in a longblob field on mysql and a bytea field
in postgresql.

I set the database up as UTF-8, even though we get emails that are not
UTF encoded, mostly because I didn't know what else to try that would
incorporate all the possible encodings. Examples of 3 encodings we
regularly receive are: UTF-8, Windows-1255, ISO-8859-8-I.

I transferred the entire database except for one table using the
dbi-link contrib module, connecting directly to MySQL and pulling the
data. The table that did not transfer was the Messageblks table which
has a field mentioned above.
It would not transfer through the dbi-link, so I wrote a python script
(see below) to read a row from mysql and write a row to postgresql
(using pygresql and mysqldb).
When I used pygresql's escape_bytea function to copy the data, it went
smoothly, but the data was corrupt.
When I tried the escape_string function it died because the data it was
moving was not UTF-8.

I finally got it to work by defining a database as SQL-ASCII and then
using escape_string worked. After the data was all in place, I pg_dumped
and pg_restored into a UTF-8 database and it surprisingly works now.

[CONFUSION]
What I don't understand, is that if that database can't handle the non
UTF characters, how does it allow them in when I receive an email
(tested, it works) and when I restored the backup.
I also don't understand why the data transfer didn't work to a UTF
database, but it worked to an ASCII database, if the data can go into a
UTF database from an ascii database.

Lastly, I wanted to know if anybody has experience moving data from
mysql to postgresql and if what I did is considered normal, or if there
is a better way of doing this.

Thank you
Sim

[Python script]:
import MySQLdb
from MySQLdb.cursors import *
import pg
import sys

pgdb=pg.connect(host="1.2.3.4",user="username",dbname="dbmail",
passwd="password")
mydb=MySQLdb.connect(host="localhost",user="mysqluser",passwd="mysqlpassword",
db="dbmail", cursorclass=MySQLdb.cursors.SSDictCursor,charset="utf8")
mycrs=mydb.cursor()
mycrs.execute("select * from dbmail_messageblks")
while 1:
        nextrow=mycrs.fetchone()
        nextrow["messageblk"]=pg.escape_string(nextrow["messageblk"])
        pgdb.query("insert into
dbmail_messageblks(messageblk,physmessage_id,is_header,messageblk_idnr,blocksize)
values($field$%(messageblk)s$field$,%(physmessage_id)s,%(is_header)s,%(messageblk_idnr)s,%(blocksize)s)"
% nextrow)


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkhOfUQACgkQjDX6szCBa+oyJwCghkG7XpcrHKY7ybeJgvjPA/XM
qLIAoMS4gOWpP4o2lKrRU2v0IdtaiSQj
=OhYp
-----END PGP SIGNATURE-----

pgsql-general by date:

Previous
From: "Leif B. Kristensen"
Date:
Subject: Re: REGEXP_REPLACE woes
Next
From: Michael Fuhr
Date:
Subject: Re: REGEXP_REPLACE woes