Thread: Move data between two databases SQL-ASCII to UTF8

From
MargaretGillon@chromalloy.com
Date:

I need to convert my database from SQL_ASCII to UTF8. Is there a way to do a SELECT ... INSERT from the old database table to the new one? Would the INSERT correct encoding errors between the two? I only have 10 tables and the biggest has < 8000 rows.

I'm running PostgreSQL 8.1.4 on Red Hat 9.
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Margaret Gillon, IS Dept., Chromalloy Los Angeles, ext. 297

Re: Move data between two databases SQL-ASCII to UTF8

From
Clodoaldo
Date:
2007/2/8, MargaretGillon@chromalloy.com <MargaretGillon@chromalloy.com>:
>
> I need to convert my database to UTF8. Is there a way to do a SELECT ...
> INSERT from the old database table to the new one? Would the INSERT correct
> data errors between the two data types? I only have 10 tables and the
> biggest has < 8000 rows.

Use pg_dump to dump the db and use iconv on the generated file:

iconv -f ASCII -t UTF-8 mydb.dump -o mydb_utf8.dump

<GUESS>
If the characters are strictly ASCII (<= 127) then no conversion is
necessary. But if there are bytes above 127 then the data will have
to be converted, for example from iso-8859-1 to utf-8:

iconv -f ISO_8859-1 -t UTF-8 mydb.dump -o mydb_utf8.dump
</GUESS>
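
The same decision can be sketched in Python for anyone who wants to check the dump before converting it. This is only an illustration of the guess above: the function names are hypothetical, and ISO-8859-1 as the source encoding is an assumption that has to be confirmed against the real data.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True when every byte is in the ASCII range (0..127)."""
    return all(byte < 128 for byte in data)


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are ASSUMED to be ISO-8859-1 as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


def convert_dump(data: bytes) -> bytes:
    """Leave pure-ASCII dumps untouched; otherwise convert from ISO-8859-1.

    Pure ASCII is already valid UTF-8, so nothing needs to change in that case.
    """
    if is_pure_ascii(data):
        return data
    return latin1_to_utf8(data)
```

To use it on a file, read mydb.dump in binary mode, pass the bytes to convert_dump, and write the result to mydb_utf8.dump in binary mode.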

Regards,
--
Clodoaldo Pinto Neto

Re: Move data between two databases SQL-ASCII to UTF8

From
"Chad Wagner"
Date:
On 2/8/07, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:
> Use pg_dump to dump the db and use iconv on the generated file:
>
> iconv -f ASCII -t UTF-8 mydb.dump -o mydb_utf8.dump

Wouldn't it be adequate to set the client encoding to SQL_ASCII in the dump file (if that was in fact the encoding on the original database)?

SET client_encoding TO SQL_ASCII;

And then let the database do the conversion?  I would think since the db is UTF8 and the client is claiming SQL_ASCII then it would convert the data to UTF8.

I have done this in the past with SQL dumps that had characters that UTF8 didn't like: I just added "SET client_encoding TO LATIN1;" to the dump, since I knew the source encoding was LATIN1.
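
A rough Python sketch of that edit, for a plain-text SQL dump. It assumes the dump may already contain one "SET client_encoding ..." line near the top (that layout is an assumption, so check your own file), and the function name is hypothetical. It works on raw bytes so the data rows are never decoded or altered.

```python
import re

# Matches lines such as:
#   SET client_encoding = 'SQL_ASCII';
#   SET client_encoding TO SQL_ASCII;
_SET_LINE = re.compile(
    rb"^SET\s+client_encoding\s*(?:=|TO)\s*'?\w+'?\s*;[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def declare_client_encoding(dump: bytes, encoding: str) -> bytes:
    """Make the dump declare `encoding` as its client encoding.

    Replaces the first existing SET client_encoding line, or prepends
    a new one when the dump has none. Data rows are left byte-for-byte intact.
    """
    new_line = b"SET client_encoding TO " + encoding.encode("ascii") + b";"
    replaced, count = _SET_LINE.subn(lambda _match: new_line, dump, count=1)
    if count:
        return replaced
    return new_line + b"\n" + dump
```

The server then converts from the declared encoding to UTF8 when the dump is loaded, so the declared name has to match what the bytes really are.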


--
Chad
http://www.postgresqlforums.com/

Re: Move data between two databases SQL-ASCII to UTF8

From
Michael Fuhr
Date:
On Thu, Feb 08, 2007 at 08:22:40PM -0500, Chad Wagner wrote:
> On 2/8/07, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:
> >Use pg_dump to dump the db and use iconv on the generated file:
> >
> >iconv -f ASCII -t UTF-8 mydb.dump -o mydb_utf8.dump

Converting the data from ASCII to UTF-8 doesn't make much sense:
if the data is ASCII then it doesn't need conversion; if the data
needs conversion then it isn't ASCII.

> Wouldn't it be adequate to set the client encoding to SQL_ASCII in the dump
> file (if that was in fact the encoding on the original database)?

http://www.postgresql.org/docs/8.2/interactive/multibyte.html#AEN24118

"If the client character set is defined as SQL_ASCII, encoding
conversion is disabled, regardless of the server's character set."

As Clodoaldo mentioned, if the data is strictly ASCII then no
conversion is necessary because the UTF-8 representation will be
the same.  If you set client_encoding to SQL_ASCII and the data
contains non-ASCII characters that aren't valid UTF-8 then you'll
get the error 'invalid byte sequence for encoding "UTF8"'.  In that
case set client_encoding to whatever encoding the data is really
in; likely guesses for Western European languages are LATIN1, LATIN9,
or perhaps WIN1252.
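
To make the guessing step concrete, here is a small Python sketch. It locates the first byte that is not valid UTF-8 and shows how the same bytes read under each candidate encoding. The helper names are hypothetical, and the mapping from PostgreSQL encoding names to Python codec names is my own assumption for illustration.

```python
from typing import Dict, Optional

# PostgreSQL encoding name -> Python codec name (assumed equivalents).
CANDIDATES = {
    "LATIN1": "iso8859_1",
    "LATIN9": "iso8859_15",
    "WIN1252": "cp1252",
}


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def decode_with_candidates(data: bytes) -> Dict[str, Optional[str]]:
    """Decode the same bytes under each candidate; None means the decode failed."""
    results: Dict[str, Optional[str]] = {}
    for pg_name, codec in CANDIDATES.items():
        try:
            results[pg_name] = data.decode(codec)
        except UnicodeDecodeError:
            results[pg_name] = None
    return results
```

The candidates differ on only a few bytes. For example, byte 0xA4 is the currency sign in LATIN1 but the euro sign in LATIN9, while WIN1252 puts the euro sign at 0x80. Looking at how a known word or symbol from the data decodes is usually the quickest way to pick the right one.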

--
Michael Fuhr