Thread: Re: Migrate postgres databases from SQL_ASCII to UNICODE
Tom Lane wrote:
Yes you are right , the original data come from a DB2 with CodePage IBM-850 and was inserted without complains in a Postgres 7.3.6 with SQL_ASCII.
Now we are in a Jail , because IBM-850 , isn't WIN, isn't ISO-xx , isn't no one postgresql's encoding.
So when in change via pg_databases the encoding , 8 bits characters become garbage.
More even if we accept this garbage chars and we set encoding to e.g. ISO-8859-1 it's impossible go to a UNICODE because this garbage chars are invalid in client's encoding , so they are reject (in translation process as invalid unicode chars).
We are in a big problem, and the only way out I can imagine is fix the data by hand :-! .
Dario,
"Dario V. Fassi" <software@sistemat.com.ar> writes:A simple question, we need to migrate many (>20) postgres databases from SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.SQL_ASCII is not an encoding (it's more like the absence of knowledge about an encoding). What is the data actually stored as?With Dump/Restore , we get an error (Invalid Unicode) in any field that has a 8 bits character coming from the SQL_ASCII , even setting the client_encoding to WIN, ISO-8859-1, and others encodings.It might work to just UPDATE pg_database to set datencoding to the correct value reflecting what you have actually stored. You might then need to REINDEX any indexes on textual columns, but I don't think anything else would go wrong. If you have a mishmash of different encodings in a single database, then of course there is no simple solution; you are in for some pain while you try to fix the data.
Yes you are right , the original data come from a DB2 with CodePage IBM-850 and was inserted without complains in a Postgres 7.3.6 with SQL_ASCII.
Now we are in a Jail , because IBM-850 , isn't WIN, isn't ISO-xx , isn't no one postgresql's encoding.
So when in change via pg_databases the encoding , 8 bits characters become garbage.
More even if we accept this garbage chars and we set encoding to e.g. ISO-8859-1 it's impossible go to a UNICODE because this garbage chars are invalid in client's encoding , so they are reject (in translation process as invalid unicode chars).
We are in a big problem, and the only way out I can imagine is fix the data by hand :-! .
Dario,
В Вск, 18.07.2004, в 01:34, Dario V. Fassi пишет: > Tom Lane wrote: > > "Dario V. Fassi" <software@sistemat.com.ar> writes: > > > > > A simple question, we need to migrate many (>20) postgres databases from > > > SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server. > > > > > SQL_ASCII is not an encoding (it's more like the absence of knowledge > > about an encoding). What is the data actually stored as? > > > > > With Dump/Restore , we get an error (Invalid Unicode) in any field that > > > has a 8 bits character coming from the SQL_ASCII , even setting the > > > client_encoding to WIN, ISO-8859-1, and others encodings. > > > > > It might work to just UPDATE pg_database to set datencoding to the > > correct value reflecting what you have actually stored. You might then > > need to REINDEX any indexes on textual columns, but I don't think > > anything else would go wrong. > > > > If you have a mishmash of different encodings in a single database, then > > of course there is no simple solution; you are in for some pain while > > you try to fix the data. > > > > Yes you are right , the original data come from a DB2 with CodePage > IBM-850 and was inserted without complains in a Postgres 7.3.6 with > SQL_ASCII. > > Now we are in a Jail , because IBM-850 , isn't WIN, isn't ISO-xx , > isn't no one postgresql's encoding. > So when in change via pg_databases the encoding , 8 bits characters > become garbage. > More even if we accept this garbage chars and we set encoding to e.g. > ISO-8859-1 it's impossible go to a UNICODE because this garbage > chars are invalid in client's encoding , so they are reject (in > translation process as invalid unicode chars). You can dump the data, convert it with for example iconv and then load the data again. -- Markus Bertheau <twanger@bluetwanger.de>
Hi, If you have blobs you must split your SQL_ASCII to convert it with iconv. I export my full database. I create a new SQL_ASCII database with only tables that don't contains blobs. (db_without_blobs) I create a new SQL_ASCII database with only tables that contains blobs. (db_with_blobs) Export your db_without_blobs. convert it with iconv Create your UNICODE database. import db_without_blobs. import db_with_blobs. It's done. 1) export your database. /usr/local/postgres/bin/pg_dump -Fc -v -b -d -U YOUR_LOGIN YOUR_DATABASE > full_database.dump 2) create work database, without blobs. /usr/local/postgres/bin/createdb db_without_blobs 3) import all datas /usr/local/postgres/bin/pg_restore -d db_without_blobs -Fc -v -U YOUR_LOGIN full_database.dump 4) create work database, with blobs. /usr/local/postgres/bin/createdb db_with_blobs 5) import all datas. /usr/local/postgres/bin/pg_restore -d db_with_blobs -Fc -v -U YOUR_LOGIN full_database.dump 6) Connect to db_without_blobs and drop all tables with blobs. 6') Connect to db_with_blobs and drop all tables without blobs. 7) export datas in text mode : /usr/local/postgres/bin/pg_dump -Fp -v -i -D -U YOUR_LOGIN db_without_blobs > db_without_blobs.dump 8) convert into UTF-8 iconv --from-code=ISO-8859-1 --to-code=UTF-8 -o db_without_blobs-utf.dump db_without_blobs.dump 9) Create unicode database. /usr/local/postgres/bin/createdb -E UNICODE db_unicode 10) importat UTF-8 datas /usr/local/postgres/bin/psql db_unicode < db_without_blobs.dump 11) export datas with blobs in binary format. 12) import this dump info db_unicode . Best Regards, Patrice Trognon. > Ð ÐÑк, 18.07.2004, в 01:34, Dario V. Fassi пиÑеÑ: > > Tom Lane wrote: > > > "Dario V. Fassi" <software@sistemat.com.ar> writes: > > > > > > > A simple question, we need to migrate many (>20) postgres databases from > > > > SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server. > > > > > > > SQL_ASCII is not an encoding (it's more like the absence of knowledge > > > about an encoding). What is the data actually stored as? > > > > > > > With Dump/Restore , we get an error (Invalid Unicode) in any field that > > > > has a 8 bits character coming from the SQL_ASCII , even setting the > > > > client_encoding to WIN, ISO-8859-1, and others encodings. > > > > > > > It might work to just UPDATE pg_database to set datencoding to the > > > correct value reflecting what you have actually stored. You might then > > > need to REINDEX any indexes on textual columns, but I don't think > > > anything else would go wrong. > > > > > > If you have a mishmash of different encodings in a single database, then > > > of course there is no simple solution; you are in for some pain while > > > you try to fix the data. > > > > > > > Yes you are right , the original data come from a DB2 with CodePage > > IBM-850 and was inserted without complains in a Postgres 7.3.6 with > > SQL_ASCII. > > > > Now we are in a Jail , because IBM-850 , isn't WIN, isn't ISO-xx , > > isn't no one postgresql's encoding. > > So when in change via pg_databases the encoding , 8 bits characters > > become garbage. > > More even if we accept this garbage chars and we set encoding to e.g. > > ISO-8859-1 it's impossible go to a UNICODE because this garbage > > chars are invalid in client's encoding , so they are reject (in > > translation process as invalid unicode chars). > > You can dump the data, convert it with for example iconv and then load > the data again. > > -- > Markus Bertheau <twanger@bluetwanger.de> > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
Thanks very much to all. Dario. Patrice Trognon wrote: >Hi, > >If you have blobs you must split your SQL_ASCII to convert it >with iconv. >I export my full database. >I create a new SQL_ASCII database with only tables that don't >contains blobs. (db_without_blobs) >I create a new SQL_ASCII database with only tables that contains >blobs. (db_with_blobs) >Export your db_without_blobs. >convert it with iconv >Create your UNICODE database. >import db_without_blobs. >import db_with_blobs. > >It's done. > >1) export your database. >/usr/local/postgres/bin/pg_dump -Fc -v -b -d -U YOUR_LOGIN YOUR_DATABASE > full_database.dump > >2) create work database, without blobs. >/usr/local/postgres/bin/createdb db_without_blobs > >3) import all datas >/usr/local/postgres/bin/pg_restore -d db_without_blobs -Fc -v -U YOUR_LOGIN full_database.dump > >4) create work database, with blobs. >/usr/local/postgres/bin/createdb db_with_blobs > >5) import all datas. >/usr/local/postgres/bin/pg_restore -d db_with_blobs -Fc -v -U YOUR_LOGIN full_database.dump > >6) Connect to db_without_blobs and drop all tables with blobs. >6') Connect to db_with_blobs and drop all tables without blobs. > >7) export datas in text mode : >/usr/local/postgres/bin/pg_dump -Fp -v -i -D -U YOUR_LOGIN db_without_blobs > db_without_blobs.dump > >8) convert into UTF-8 >iconv --from-code=ISO-8859-1 --to-code=UTF-8 -o db_without_blobs-utf.dump db_without_blobs.dump > >9) Create unicode database. >/usr/local/postgres/bin/createdb -E UNICODE db_unicode > >10) importat UTF-8 datas >/usr/local/postgres/bin/psql db_unicode < db_without_blobs.dump > >11) export datas with blobs in binary format. >12) import this dump info db_unicode . > >Best Regards, > >Patrice Trognon. > > > > > > > > >>В В�к, 18.07.2004, в 01:34, Dario V. Fassi пишет: >> >> >>>Tom Lane wrote: >>> >>> >>>>"Dario V. Fassi" <software@sistemat.com.ar> writes: >>>> >>>> >>>> >>>>>A simple question, we need to migrate many (>20) postgres databases from >>>>>SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server. >>>>> >>>>> >>>>> >>>>SQL_ASCII is not an encoding (it's more like the absence of knowledge >>>>about an encoding). What is the data actually stored as? >>>> >>>> >>>> >>>>>With Dump/Restore , we get an error (Invalid Unicode) in any field that >>>>>has a 8 bits character coming from the SQL_ASCII , even setting the >>>>>client_encoding to WIN, ISO-8859-1, and others encodings. >>>>> >>>>> >>>>> >>>>It might work to just UPDATE pg_database to set datencoding to the >>>>correct value reflecting what you have actually stored. You might then >>>>need to REINDEX any indexes on textual columns, but I don't think >>>>anything else would go wrong. >>>> >>>>If you have a mishmash of different encodings in a single database, then >>>>of course there is no simple solution; you are in for some pain while >>>>you try to fix the data. >>>> >>>> >>>> >>>Yes you are right , the original data come from a DB2 with CodePage >>>IBM-850 and was inserted without complains in a Postgres 7.3.6 with >>>SQL_ASCII. >>> >>>Now we are in a Jail , because IBM-850 , isn't WIN, isn't ISO-xx , >>>isn't no one postgresql's encoding. >>>So when in change via pg_databases the encoding , 8 bits characters >>>become garbage. >>>More even if we accept this garbage chars and we set encoding to e.g. >>>ISO-8859-1 it's impossible go to a UNICODE because this garbage >>>chars are invalid in client's encoding , so they are reject (in >>>translation process as invalid unicode chars). >>> >>> >>You can dump the data, convert it with for example iconv and then load >>the data again. >> >>-- >>Markus Bertheau <twanger@bluetwanger.de> >> >>