Thread: Migrate postgres databases from SQL_ASCII to UNICODE encoding

Migrate postgres databases from SQL_ASCII to UNICODE encoding

From
"Dario V. Fassi"
Date:

A simple question, we need to migrate many (>20) postgres databases from SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.

With  Dump/Restore , we get an error (Invalid Unicode) in any field that has a 8 bits character coming from the SQL_ASCII , even setting the client_encoding to WIN, ISO-8859-1,  and others encodings.

I'm really worried about this problem since , we have many DBs to migrate and even with a ODBC program we can't port tables from Postgresql/SQL_ASCII to Postgresql/UNICODE.

Any suggestion will be *very* appreciated.

Regards,
Dario.


"Dario V. Fassi" <software@sistemat.com.ar> writes:
> A simple question, we need to migrate many (>20) postgres databases from
> SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.

SQL_ASCII is not an encoding (it's more like the absence of knowledge
about an encoding).  What is the data actually stored as?

> With  Dump/Restore , we get an error (Invalid Unicode) in any field that
> has a 8 bits character coming from the SQL_ASCII , even setting the
> client_encoding to WIN, ISO-8859-1,  and others encodings.

It might work to just UPDATE pg_database to set datencoding to the
correct value reflecting what you have actually stored.  You might then
need to REINDEX any indexes on textual columns, but I don't think
anything else would go wrong.

If you have a mishmash of different encodings in a single database, then
of course there is no simple solution; you are in for some pain while
you try to fix the data.

            regards, tom lane

Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE

From
"Dario V. Fassi"
Date:
Tom Lane wrote:
"Dario V. Fassi" <software@sistemat.com.ar> writes: 
A simple question, we need to migrate many (>20) postgres databases from 
SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.   
SQL_ASCII is not an encoding (it's more like the absence of knowledge
about an encoding).  What is the data actually stored as? 
With  Dump/Restore , we get an error (Invalid Unicode) in any field that 
has a 8 bits character coming from the SQL_ASCII , even setting the 
client_encoding to WIN, ISO-8859-1,  and others encodings.   
It might work to just UPDATE pg_database to set datencoding to the
correct value reflecting what you have actually stored.  You might then
need to REINDEX any indexes on textual columns, but I don't think
anything else would go wrong.

If you have a mishmash of different encodings in a single database, then
of course there is no simple solution; you are in for some pain while
you try to fix the data. 

Yes you are right , the original data come from a DB2 with CodePage IBM-850  and was inserted  without complains in a Postgres 7.3.6 with SQL_ASCII.

Now we are in a Jail , because IBM-850 , isn't WIN, isn't  ISO-xx , isn't no one postgresql's encoding.
So when in change via pg_databases the encoding , 8 bits characters become garbage.
More even if we accept this garbage chars and we set encoding to e.g. ISO-8859-1   it's impossible go to a UNICODE because this garbage chars are invalid in client's encoding , so they are reject (in translation process as invalid unicode chars).

We are in a big problem, and the only way out I can imagine is fix the data by hand  :-! .

Dario,

Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE

From
"Andrew Lazarus"
Date:
If you pg_dump in ASCII format, could you figure out what has happened to
the 8-bit characters, if they are recognizable garbage (not truncated to
some genuine character) you then write a sed script. It by hand, but it
doesn't sound terrible.


Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE

From
Tom Lane
Date:
"Dario V. Fassi" <software@sistemat.com.ar> writes:
>> SQL_ASCII is not an encoding (it's more like the absence of knowledge
>> about an encoding).  What is the data actually stored as?

> Yes you are right , the original data come from a DB2 with CodePage
> IBM-850  and was inserted  without complains in a Postgres 7.3.6 with
> SQL_ASCII.

Ugh.  You'll have to work out how to convert that codepage to one of the
encodings that PG supports.  Or else add it as a supported encoding
(I'm not sure how hard that is, but it's not out of the question).

            regards, tom lane

Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE

From
"Dario V. Fassi"
Date:

Tom Lane wrote:
"Dario V. Fassi" <software@sistemat.com.ar> writes: 
SQL_ASCII is not an encoding (it's more like the absence of knowledge
about an encoding).  What is the data actually stored as?     
Yes you are right , the original data come from a DB2 with CodePage 
IBM-850  and was inserted  without complains in a Postgres 7.3.6 with 
SQL_ASCII.   
Ugh.  You'll have to work out how to convert that codepage to one of the
encodings that PG supports.  Or else add it as a supported encoding
(I'm not sure how hard that is, but it's not out of the question).
		regards, tom lane 
Ok, I'm interested is do that , but someone is the development group would give me some assistence to get a start point to work with.
I program in Java, C, and less frequently in C++.

Dario.


Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE

From
Xavier Poinsard
Date:
Dario V. Fassi a écrit :
>
> Tom Lane wrote:
>
>>"Dario V. Fassi" <software@sistemat.com.ar> writes:
>>
>>
>>>>SQL_ASCII is not an encoding (it's more like the absence of knowledge
>>>>about an encoding).  What is the data actually stored as?
>>>>
>>>>
>>>Yes you are right , the original data come from a DB2 with CodePage
>>>IBM-850  and was inserted  without complains in a Postgres 7.3.6 with
>>>SQL_ASCII.
>>>
>>>
>>
>>Ugh.  You'll have to work out how to convert that codepage to one of the
>>encodings that PG supports.  Or else add it as a supported encoding
>>(I'm not sure how hard that is, but it's not out of the question).
>>
did you try the recode utility ?

Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE

From
Michael Stephenson
Date:
What we've done in the past is just run the dump file through iconv - this has
support for IBM-850.

http://www.gnu.org/software/libiconv/documentation/libiconv/iconv.1.html

Hope this helps,

Michael

Dario V. Fassi wrote:
>
> Tom Lane wrote:
>
>>"Dario V. Fassi" <software@sistemat.com.ar> writes:
>>
>>
>>>>SQL_ASCII is not an encoding (it's more like the absence of knowledge
>>>>about an encoding).  What is the data actually stored as?
>>>>
>>>>
>>>Yes you are right , the original data come from a DB2 with CodePage
>>>IBM-850  and was inserted  without complains in a Postgres 7.3.6 with
>>>SQL_ASCII.
>>>
>>>
>>
>>Ugh.  You'll have to work out how to convert that codepage to one of the
>>encodings that PG supports.  Or else add it as a supported encoding
>>(I'm not sure how hard that is, but it's not out of the question).
>>
>>            regards, tom lane
>>
>>
> Ok, I'm interested is do that , but someone is the development group
> would give me some assistence to get a start point to work with.
> I program in Java, C, and less frequently in C++.
>
> Dario.
>
>


--
Web Applications Developer
Open World Ltd, 11 Riverside Court, Riverside Road, Bath, BA2 3DZ.
Tel: +44 1225 444950  Fax: +44 1225 336738  http://www.openworld.co.uk/

CONFIDENTIALITY NOTICE
The information contained in this message is confidential, intended only for
the use of the individual or the entity named as recipient. If the reader of
this message is not that recipient, you are notified that any dissemination,
distribution or copy of this message is strictly prohibited. If you have
received this message in error, please immediately notify us by telephone on
the number above. Your co-operation is appreciated.