Thread: Converting an ASCII database to an UTF-8 database

Converting an ASCII database to an UTF-8 database

From
kishore.sainath@gmail.com
Date:
Hi All,

I have a database in PostgreSQL which is ASCII.
Due to some internationalization issues, I need to convert the database
to the UTF-8 format.

So my question is:
How do I convert a database in the ASCII format into one of the UTF-8
format?

Thanks in advance
- Kishore


Re: Converting an ASCII database to an UTF-8 database

From
Ragnar
Date:
On fös, 2006-02-17 at 05:21 -0800, kishore.sainath@gmail.com wrote:
> Hi All,
>
> I have a database in PostgreSQL which is ASCII.
> Due to some internationalization issues, I need to convert the database
> to the UTF-8 format.
>
> So my question is:
> How do I convert a database in the ASCII format into one of the UTF-8
> format?

using pg_dump ?

gnari



Re: Converting an ASCII database to an UTF-8 database

From
Peter Eisentraut
Date:
kishore.sainath@gmail.com wrote:
> How do I convert a database in the ASCII format into one of the UTF-8
> format?

ASCII is a subset of UTF-8, so you don't need to do anything.  Just
change the encoding entry in the pg_database table.  Of course, using
pg_dump would be the official way to convert a database between any two
encodings.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Converting an ASCII database to an UTF-8 database

From
Ragnar
Date:
On fös, 2006-02-17 at 22:38 +0100, Peter Eisentraut wrote:
> kishore.sainath@gmail.com wrote:
> > How do I convert a database in the ASCII format into one of the UTF-8
> > format?
>
> ASCII is a subset of UTF-8, so you don't need to do anything.  Just
> change the encoding entry in the pg_database table.  Of course, using
> pg_dump would be the official way to convert a database between any two
> encodings.

This will only work correctly if the database
definitely does not contain non-ASCII characters.

Assuming by ASCII format we mean that the database was
created SQL_ASCII, then it is possible that it contains
invalid UTF-8 characters, as SQL_ASCII is a 8 bit
encoding.

consider:

template1=# create database test with encoding='SQL_ASCII';
CREATE DATABASE
template1=# \connect test
You are now connected to database "test".
test=# create table a (x text);
CREATE TABLE
test=# insert into a values ('á');
INSERT 33304378 1
test=# select * from a;
 x
---
 á
(1 row)

test=# update pg_database set encoding =
pg_catalog.pg_char_to_encoding('UTF8') where datname='test';
UPDATE 1
test=# select * from a;
 x
---
 á
(1 row)

test=# \connect template1
You are now connected to database "template1".
template1=# \connect test
You are now connected to database "test".
test=# select * from a;
 x
---

(1 row)

test=#


gnari



Re: Converting an ASCII database to an UTF-8 database

From
Rick Gigger
Date:
I have this exact problem.  I have dumped and reloaded other
databases and set the client encoding to convert them to UTF-8 but I
have one database with values that still cause it to fail, even if I
specify that the client encoding is SQL_ASCII.  How do I fix that?

On Feb 17, 2006, at 4:08 PM, Ragnar wrote:

> On fös, 2006-02-17 at 22:38 +0100, Peter Eisentraut wrote:
>> kishore.sainath@gmail.com wrote:
>>> How do I convert a database in the ASCII format into one of the
>>> UTF-8
>>> format?
>>
>> ASCII is a subset of UTF-8, so you don't need to do anything.  Just
>> change the encoding entry in the pg_database table.  Of course, using
>> pg_dump would be the official way to convert a database between
>> any two
>> encodings.
>
> This will only work correctly if the database
> definitely does not contain non-ASCII characters.
>
> Assuming by ASCII format we mean that the database was
> created SQL_ASCII, then it is possible that it contains
> invalid UTF-8 characters, as SQL_ASCII is a 8 bit
> encoding.
>
> consider:
>
> template1=# create database test with encoding='SQL_ASCII';
> CREATE DATABASE
> template1=# \connect test
> You are now connected to database "test".
> test=# create table a (x text);
> CREATE TABLE
> test=# insert into a values ('á');
> INSERT 33304378 1
> test=# select * from a;
>  x
> ---
>  á
> (1 row)
>
> test=# update pg_database set encoding =
> pg_catalog.pg_char_to_encoding('UTF8') where datname='test';
> UPDATE 1
> test=# select * from a;
>  x
> ---
>  á
> (1 row)
>
> test=# \connect template1
> You are now connected to database "template1".
> template1=# \connect test
> You are now connected to database "test".
> test=# select * from a;
>  x
> ---
>
> (1 row)
>
> test=#
>
>
> gnari
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


Re: Converting an ASCII database to an UTF-8 database

From
"Jan Cruz"
Date:
I believe PostgreSQL treat UTF-8 and LATIN9 Differently.
When I tried to dump a db to a UTF-8 encoding and restore it
with UTF-8 encoding (also) it encountered problems with fields
that have unicoded values thus it stop from restoring the whole dump.

So I tried using LATIN9 encoding for both dump and restore.

I believe that LATIN9=UTF-8 encoding base on the DOCS.

I wonder why it is like that.

Re: Converting an ASCII database to an UTF-8 database

From
Peter Eisentraut
Date:
Jan Cruz wrote:
> I believe PostgreSQL treat UTF-8 and LATIN9 Differently.

Certainly, considering that they are different encodings.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Converting an ASCII database to an UTF-8 database

From
Peter Eisentraut
Date:
Rick Gigger wrote:
> I have this exact problem.  I have dumped and reloaded other
> databases and set the client encoding to convert them to UTF-8 but I
> have one database with values that still cause it to fail, even if I
> specify that the client encoding is SQL_ASCII.  How do I fix that?

Well then you need to figure out what encoding those non-ASCII
characters actually are and set the client encoding to that.  Trying
LATIN1 might be a good start, but if you users/applications have
inserted random bytes then you will have to sort it out by hand.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/