Thread: Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE
Tom Lane wrote:
> "Dario V. Fassi" <software@sistemat.com.ar> writes:
> > A simple question, we need to migrate many (>20) postgres databases from SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.
>
> SQL_ASCII is not an encoding (it's more like the absence of knowledge about an encoding). What is the data actually stored as?
>
> > With Dump/Restore, we get an error (Invalid Unicode) in any field that has a 8 bits character coming from the SQL_ASCII, even setting the client_encoding to WIN, ISO-8859-1, and others encodings.
>
> It might work to just UPDATE pg_database to set datencoding to the correct value reflecting what you have actually stored. You might then need to REINDEX any indexes on textual columns, but I don't think anything else would go wrong. If you have a mishmash of different encodings in a single database, then of course there is no simple solution; you are in for some pain while you try to fix the data.
Yes, you are right: the original data came from a DB2 database with code page IBM-850 and was inserted without complaint into a Postgres 7.3.6 database with SQL_ASCII.
Now we are stuck, because IBM-850 isn't WIN, isn't ISO-xx, and isn't any of PostgreSQL's encodings.
So when I change the encoding via pg_database, the 8-bit characters become garbage.
Moreover, even if we accept these garbage characters and set the encoding to e.g. ISO-8859-1, it's impossible to go to UNICODE, because the garbage characters are invalid in the client's encoding, so they are rejected (in the translation process, as invalid Unicode characters).
We have a big problem, and the only way out I can imagine is to fix the data by hand :-! .
Dario,
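
For illustration only, here is a minimal Python sketch of the byte-level problem described above. It assumes every 8-bit byte in the data really is IBM-850 (no mix of encodings) and that the bytes are handled outside the database, for example in a plain-text dump. The helper names `cp850_to_utf8` and `is_valid_utf8` and the sample string are hypothetical, and this is not a tested migration procedure for 7.3.6.

```python
def cp850_to_utf8(raw: bytes) -> bytes:
    """Interpret raw bytes as IBM-850 (Python codec "cp850") and re-encode as UTF-8."""
    return raw.decode("cp850").encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes can be read as UTF-8 without error."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: the bytes DB2 would have produced for "España" in IBM-850.
stored = "España".encode("cp850")        # b'Espa\xa4a'

# Labelling the same bytes as ISO-8859-1 changes their meaning: 0xA4 is the
# currency sign there, so the text turns into garbage.
as_latin1 = stored.decode("iso-8859-1")  # 'Espa¤a'

# The untouched bytes are not valid UTF-8 either, which is why a Unicode
# conversion rejects them.
print(is_valid_utf8(stored))             # False

# Decoding with the encoding the data was actually written in recovers the text.
repaired = cp850_to_utf8(stored)
print(repaired.decode("utf-8"))          # España
```

The point of the sketch is that the stored bytes are not damaged. They are only mislabelled, so a conversion that starts from the true source encoding can recover them, provided the data is uniformly IBM-850.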