Re: [ADMIN] what's the efficient/safest way to convert database character set ? - Mailing list pgsql-general

From Huang, Suya
Subject Re: [ADMIN] what's the efficient/safest way to convert database character set ?
Date
Msg-id D83E55F5F4D99B4A9B4C4E259E6227CD9DF35C@AUX1EXC01.apac.experian.local
In response to Re: [ADMIN] what's the efficient/safest way to convert database character set ?  (John R Pierce <pierce@hogranch.com>)
Responses Re: [ADMIN] what's the efficient/safest way to convert database character set ?
List pgsql-general
Yes John, we will probably use a new database server here to accommodate the converted databases.

By export/import, do you mean:
1. pg_dump  (should I specify -E UTF8 to dump the data in UTF-8 encoding?)
2. create database xxx -E UTF8
3. pg_restore
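
As a rough sketch of the three steps above (not a tested procedure), the helper below only prints the commands so they can be reviewed first. The names olddb, newdb and olddb.dump are placeholders that do not come from this thread. Two choices are assumptions: -Fc (custom-format archive, the kind pg_restore reads) and -T template0 (so the new database may use an encoding different from template1's). One caveat: when the server encoding is SQL_ASCII, PostgreSQL performs no encoding conversion, so adding -E UTF8 to pg_dump would not by itself repair bytes that are not valid UTF-8.

```shell
#!/bin/sh
# Print (do not run) the dump / create / restore commands for review.
# olddb, newdb and olddb.dump are placeholder names, not from the thread.
print_migration_plan() {
    src_db=$1
    dst_db=$2
    dump_file=$3
    # -Fc: custom-format archive, the kind pg_restore reads
    echo "pg_dump -Fc -f $dump_file $src_db"
    # -T template0: allows an encoding that differs from template1's
    echo "createdb -E UTF8 -T template0 $dst_db"
    echo "pg_restore -d $dst_db $dump_file"
}

print_migration_plan olddb newdb olddb.dump
```

Once the printed commands look right for the environment, the output could be piped to sh.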

I have also seen someone do it the following way:
1. perform a plain text dump of database.
    pg_dump -f db.sql [dbname]
2. convert the character encodings.
    iconv db.sql -f ISO-8859-1 -t UTF-8 -o db.utf8.sql
3. create the UTF8 database
    createdb  utf8db  (I'm not sure why he's not specifying the DB encoding here; maybe it's better to use -E to specify
the encoding as UTF8)
4. restore the converted UTF8 dump.
    psql -d utf8db -f db.utf8.sql

Which method is better? As far as I can tell, the second approach generates a larger dump file, so it is better
to pipe it to bzip to get a compressed file. Other than that, are there any other considerations?
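
One way to avoid a large intermediate file is to convert and compress in a single pipeline. The following is a minimal sketch, assuming the data really is ISO-8859-1 as in the iconv example above. The function name and file names are hypothetical; gzip is the default only because it is almost always installed, and bzip2 can be passed as the third argument instead.

```shell
#!/bin/sh
# Read a plain-text dump on stdin, transcode it to UTF-8, and write it
# compressed. Hypothetical helper; the source encoding is an assumption.
convert_and_compress() {
    from_enc=$1             # e.g. ISO-8859-1
    out_file=$2             # e.g. db.utf8.sql.gz
    compressor=${3:-gzip}   # bzip2 also accepts -c
    iconv -f "$from_enc" -t UTF-8 | "$compressor" -c > "$out_file"
}

# Intended use (needs a running server, so shown only as comments):
#   pg_dump olddb | convert_and_compress ISO-8859-1 db.utf8.sql.gz
#   gunzip -c db.utf8.sql.gz | psql -d utf8db
```

Two things to watch with this sketch: a failure inside iconv is hidden by the pipeline's exit status, so the result should be checked separately, and a plain-text dump normally contains a SET client_encoding line that would presumably need to say UTF8 after the conversion.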

Thanks,
Suya
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John R Pierce
Sent: Friday, October 18, 2013 11:23 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

On 10/17/2013 3:13 PM, Huang, Suya wrote:
> I've got a question of converting database from ascii to UTF-8, what's
> the best approach to do so if the database size is very large?
> Detailed procedure or experience sharing are much appreciated!


I believe you will need to dump the whole database, and import it into a
new database that uses UTF8 encoding. As far as I know, there's no way
to convert encoding in place. As the other gentlemen pointed out, you
also will have to convert/sanitize all text data, as your current
SQL_ASCII fields could easily contain stuff that's not valid UTF8.
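
A quick way to see whether a plain-text dump is already valid UTF-8, and where it is not, is to run it through iconv with UTF-8 as both source and target, since iconv exits non-zero when it meets an invalid sequence. This is only a sketch: the function names are made up for illustration, and a line-by-line shell loop like this would be slow on a very large dump.

```shell
#!/bin/sh
# Succeeds only if every byte sequence in the file is valid UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Print the numbers of the lines whose content is not valid UTF-8.
invalid_utf8_lines() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1 \
            || echo "$n"
    done < "$1"
}
```

Running invalid_utf8_lines on a dump file gives a list of lines to inspect by hand before deciding which source encoding to convert from.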

For large databases, this is a major undertaking. I find it's often
easiest to do a major change like this between the old and a new
database server.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast




