Thread: Migrate postgres databases from SQL_ASCII to UNICODE encoding
A simple question, we need to migrate many (>20) postgres databases from SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.
With Dump/Restore , we get an error (Invalid Unicode) in any field that has a 8 bits character coming from the SQL_ASCII , even setting the client_encoding to WIN, ISO-8859-1, and others encodings.
I'm really worried about this problem since , we have many DBs to migrate and even with a ODBC program we can't port tables from Postgresql/SQL_ASCII to Postgresql/UNICODE.
Any suggestion will be *very* appreciated.
Regards,
Dario.
"Dario V. Fassi" <software@sistemat.com.ar> writes: > A simple question, we need to migrate many (>20) postgres databases from > SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server. SQL_ASCII is not an encoding (it's more like the absence of knowledge about an encoding). What is the data actually stored as? > With Dump/Restore , we get an error (Invalid Unicode) in any field that > has a 8 bits character coming from the SQL_ASCII , even setting the > client_encoding to WIN, ISO-8859-1, and others encodings. It might work to just UPDATE pg_database to set datencoding to the correct value reflecting what you have actually stored. You might then need to REINDEX any indexes on textual columns, but I don't think anything else would go wrong. If you have a mishmash of different encodings in a single database, then of course there is no simple solution; you are in for some pain while you try to fix the data. regards, tom lane
Tom Lane wrote:
Yes you are right , the original data come from a DB2 with CodePage IBM-850 and was inserted without complains in a Postgres 7.3.6 with SQL_ASCII.
Now we are in a Jail , because IBM-850 , isn't WIN, isn't ISO-xx , isn't no one postgresql's encoding.
So when in change via pg_databases the encoding , 8 bits characters become garbage.
More even if we accept this garbage chars and we set encoding to e.g. ISO-8859-1 it's impossible go to a UNICODE because this garbage chars are invalid in client's encoding , so they are reject (in translation process as invalid unicode chars).
We are in a big problem, and the only way out I can imagine is fix the data by hand :-! .
Dario,
"Dario V. Fassi" <software@sistemat.com.ar> writes:A simple question, we need to migrate many (>20) postgres databases from SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.SQL_ASCII is not an encoding (it's more like the absence of knowledge about an encoding). What is the data actually stored as?With Dump/Restore , we get an error (Invalid Unicode) in any field that has a 8 bits character coming from the SQL_ASCII , even setting the client_encoding to WIN, ISO-8859-1, and others encodings.It might work to just UPDATE pg_database to set datencoding to the correct value reflecting what you have actually stored. You might then need to REINDEX any indexes on textual columns, but I don't think anything else would go wrong. If you have a mishmash of different encodings in a single database, then of course there is no simple solution; you are in for some pain while you try to fix the data.
Yes you are right , the original data come from a DB2 with CodePage IBM-850 and was inserted without complains in a Postgres 7.3.6 with SQL_ASCII.
Now we are in a Jail , because IBM-850 , isn't WIN, isn't ISO-xx , isn't no one postgresql's encoding.
So when in change via pg_databases the encoding , 8 bits characters become garbage.
More even if we accept this garbage chars and we set encoding to e.g. ISO-8859-1 it's impossible go to a UNICODE because this garbage chars are invalid in client's encoding , so they are reject (in translation process as invalid unicode chars).
We are in a big problem, and the only way out I can imagine is fix the data by hand :-! .
Dario,
If you pg_dump in ASCII format, could you figure out what has happened to the 8-bit characters, if they are recognizable garbage (not truncated to some genuine character) you then write a sed script. It by hand, but it doesn't sound terrible.
"Dario V. Fassi" <software@sistemat.com.ar> writes: >> SQL_ASCII is not an encoding (it's more like the absence of knowledge >> about an encoding). What is the data actually stored as? > Yes you are right , the original data come from a DB2 with CodePage > IBM-850 and was inserted without complains in a Postgres 7.3.6 with > SQL_ASCII. Ugh. You'll have to work out how to convert that codepage to one of the encodings that PG supports. Or else add it as a supported encoding (I'm not sure how hard that is, but it's not out of the question). regards, tom lane
Tom Lane wrote:
Ok, I'm interested is do that , but someone is the development group would give me some assistence to get a start point to work with."Dario V. Fassi" <software@sistemat.com.ar> writes:SQL_ASCII is not an encoding (it's more like the absence of knowledge about an encoding). What is the data actually stored as?Yes you are right , the original data come from a DB2 with CodePage IBM-850 and was inserted without complains in a Postgres 7.3.6 with SQL_ASCII.Ugh. You'll have to work out how to convert that codepage to one of the encodings that PG supports. Or else add it as a supported encoding (I'm not sure how hard that is, but it's not out of the question). regards, tom lane
I program in Java, C, and less frequently in C++.
Dario.
Dario V. Fassi a écrit : > > Tom Lane wrote: > >>"Dario V. Fassi" <software@sistemat.com.ar> writes: >> >> >>>>SQL_ASCII is not an encoding (it's more like the absence of knowledge >>>>about an encoding). What is the data actually stored as? >>>> >>>> >>>Yes you are right , the original data come from a DB2 with CodePage >>>IBM-850 and was inserted without complains in a Postgres 7.3.6 with >>>SQL_ASCII. >>> >>> >> >>Ugh. You'll have to work out how to convert that codepage to one of the >>encodings that PG supports. Or else add it as a supported encoding >>(I'm not sure how hard that is, but it's not out of the question). >> did you try the recode utility ?
What we've done in the past is just run the dump file through iconv - this has support for IBM-850. http://www.gnu.org/software/libiconv/documentation/libiconv/iconv.1.html Hope this helps, Michael Dario V. Fassi wrote: > > Tom Lane wrote: > >>"Dario V. Fassi" <software@sistemat.com.ar> writes: >> >> >>>>SQL_ASCII is not an encoding (it's more like the absence of knowledge >>>>about an encoding). What is the data actually stored as? >>>> >>>> >>>Yes you are right , the original data come from a DB2 with CodePage >>>IBM-850 and was inserted without complains in a Postgres 7.3.6 with >>>SQL_ASCII. >>> >>> >> >>Ugh. You'll have to work out how to convert that codepage to one of the >>encodings that PG supports. Or else add it as a supported encoding >>(I'm not sure how hard that is, but it's not out of the question). >> >> regards, tom lane >> >> > Ok, I'm interested is do that , but someone is the development group > would give me some assistence to get a start point to work with. > I program in Java, C, and less frequently in C++. > > Dario. > > -- Web Applications Developer Open World Ltd, 11 Riverside Court, Riverside Road, Bath, BA2 3DZ. Tel: +44 1225 444950 Fax: +44 1225 336738 http://www.openworld.co.uk/ CONFIDENTIALITY NOTICE The information contained in this message is confidential, intended only for the use of the individual or the entity named as recipient. If the reader of this message is not that recipient, you are notified that any dissemination, distribution or copy of this message is strictly prohibited. If you have received this message in error, please immediately notify us by telephone on the number above. Your co-operation is appreciated.