Thread: Converting an ASCII database to an UTF-8 database
Hi All, I have a database in PostgreSQL which is ASCII. Due to some internationalization issues, I need to convert the database to the UTF-8 format. So my question is: How do I convert a database in the ASCII format into one of the UTF-8 format? Thanks in advance - Kishore
On fös, 2006-02-17 at 05:21 -0800, kishore.sainath@gmail.com wrote: > Hi All, > > I have a database in PostgreSQL which is ASCII. > Due to some internationalization issues, I need to convert the database > to the UTF-8 format. > > So my question is: > How do I convert a database in the ASCII format into one of the UTF-8 > format? using pg_dump ? gnari
kishore.sainath@gmail.com wrote: > How do I convert a database in the ASCII format into one of the UTF-8 > format? ASCII is a subset of UTF-8, so you don't need to do anything. Just change the encoding entry in the pg_database table. Of course, using pg_dump would be the official way to convert a database between any two encodings. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On fös, 2006-02-17 at 22:38 +0100, Peter Eisentraut wrote: > kishore.sainath@gmail.com wrote: > > How do I convert a database in the ASCII format into one of the UTF-8 > > format? > > ASCII is a subset of UTF-8, so you don't need to do anything. Just > change the encoding entry in the pg_database table. Of course, using > pg_dump would be the official way to convert a database between any two > encodings. This will only work correctly if the database definitely does not contain non-ASCII characters. Assuming by ASCII format we mean that the database was created SQL_ASCII, then it is possible that it contains invalid UTF-8 characters, as SQL_ASCII is a 8 bit encoding. consider: template1=# create database test with encoding='SQL_ASCII'; CREATE DATABASE template1=# \connect test You are now connected to database "test". test=# create table a (x text); CREATE TABLE test=# insert into a values ('á'); INSERT 33304378 1 test=# select * from a; x --- á (1 row) test=# update pg_database set encoding = pg_catalog.pg_char_to_encoding('UTF8') where datname='test'; UPDATE 1 test=# select * from a; x --- á (1 row) test=# \connect template1 You are now connected to database "template1". template1=# \connect test You are now connected to database "test". test=# select * from a; x --- (1 row) test=# gnari
I have this exact problem. I have dumped and reloaded other databases and set the client encoding to convert them to UTF-8 but I have one database with values that still cause it to fail, even if I specify that the client encoding is SQL_ASCII. How do I fix that? On Feb 17, 2006, at 4:08 PM, Ragnar wrote: > On fös, 2006-02-17 at 22:38 +0100, Peter Eisentraut wrote: >> kishore.sainath@gmail.com wrote: >>> How do I convert a database in the ASCII format into one of the >>> UTF-8 >>> format? >> >> ASCII is a subset of UTF-8, so you don't need to do anything. Just >> change the encoding entry in the pg_database table. Of course, using >> pg_dump would be the official way to convert a database between >> any two >> encodings. > > This will only work correctly if the database > definitely does not contain non-ASCII characters. > > Assuming by ASCII format we mean that the database was > created SQL_ASCII, then it is possible that it contains > invalid UTF-8 characters, as SQL_ASCII is a 8 bit > encoding. > > consider: > > template1=# create database test with encoding='SQL_ASCII'; > CREATE DATABASE > template1=# \connect test > You are now connected to database "test". > test=# create table a (x text); > CREATE TABLE > test=# insert into a values ('á'); > INSERT 33304378 1 > test=# select * from a; > x > --- > á > (1 row) > > test=# update pg_database set encoding = > pg_catalog.pg_char_to_encoding('UTF8') where datname='test'; > UPDATE 1 > test=# select * from a; > x > --- > á > (1 row) > > test=# \connect template1 > You are now connected to database "template1". > template1=# \connect test > You are now connected to database "test". > test=# select * from a; > x > --- > > (1 row) > > test=# > > > gnari > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
I believe PostgreSQL treat UTF-8 and LATIN9 Differently.
When I tried to dump a db to a UTF-8 encoding and restore it
with UTF-8 encoding (also) it encountered problems with fields
that have unicoded values thus it stop from restoring the whole dump.
So I tried using LATIN9 encoding for both dump and restore.
I believe that LATIN9=UTF-8 encoding base on the DOCS.
I wonder why it is like that.
When I tried to dump a db to a UTF-8 encoding and restore it
with UTF-8 encoding (also) it encountered problems with fields
that have unicoded values thus it stop from restoring the whole dump.
So I tried using LATIN9 encoding for both dump and restore.
I believe that LATIN9=UTF-8 encoding base on the DOCS.
I wonder why it is like that.
Jan Cruz wrote: > I believe PostgreSQL treat UTF-8 and LATIN9 Differently. Certainly, considering that they are different encodings. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Rick Gigger wrote: > I have this exact problem. I have dumped and reloaded other > databases and set the client encoding to convert them to UTF-8 but I > have one database with values that still cause it to fail, even if I > specify that the client encoding is SQL_ASCII. How do I fix that? Well then you need to figure out what encoding those non-ASCII characters actually are and set the client encoding to that. Trying LATIN1 might be a good start, but if you users/applications have inserted random bytes then you will have to sort it out by hand. -- Peter Eisentraut http://developer.postgresql.org/~petere/