Thread: Unicode Corruption and upgrading to 8.0.4. to 8.1

Unicode Corruption and upgrading to 8.0.4. to 8.1

From
Howard Cole
Date:
Hi everyone, I have a problem with corrupt UTF-8 sequences in my 8.0.4
dump which is preventing me from upgrading to 8.1 - which spots the
errors and refuses to import the data. Is there some SQL command that I
can use to fix or cauterise the sequences in the 8.0.4 database before
dumping to 8.1?

I think the problem arose using invalid client encodings - which were
not rejected prior to 8.1.

Regards,

Howard Cole
www.selestial.com

Re: Unicode Corruption and upgrading to 8.0.4. to 8.1

From
"Zlatko Matic"
Date:
Have you tried to restore just schema first, then data?
Greetings,

Zlatko

----- Original Message -----
From: "Howard Cole" <howardnews@selestial.com>
To: "'PgSql General'" <pgsql-general@postgresql.org>
Sent: Friday, December 02, 2005 3:02 PM
Subject: [GENERAL] Unicode Corruption and upgrading to 8.0.4. to 8.1


> Hi everyone, I have a problem with corrupt UTF-8 sequences in my 8.0.4
> dump which is preventing me from upgrading to 8.1 - which spots the
> errors and refuses to import the data. Is there some SQL command that I
> can use to fix or cauterise the sequences in the 8.0.4 database before
> dumping to 8.1?
>
> I think the problem arose using invalid client encodings - which were
> not rejected prior to 8.1.
>
> Regards,
>
> Howard Cole
> www.selestial.com
>

Re: Unicode Corruption and upgrading to 8.0.4. to 8.1

From
Howard Cole
Date:
Hi Zlatko,

I shall give this a try later and let you know how I get on. Thank you
for responding.

Howard.

Zlatko Matic wrote:

> Have you tried to restore just schema first, then data?
> Greetings,
>
> Zlatko
>
>> Hi everyone, I have a problem with corrupt UTF-8 sequences in my
>> 8.0.4 dump which is preventing me from upgrading to 8.1 - which spots
>> the errors and refuses to import the data. Is there some SQL command
>> that I can use to fix or cauterise the sequences in the 8.0.4
>> database before dumping to 8.1?
>>
>> I think the problem arose using invalid client encodings - which were
>> not rejected prior to 8.1.
>>


Re: Unicode Corruption and upgrading to 8.0.4. to 8.1

From
"Markus Wollny"
Date:
Hello!

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Howard Cole
> Sent: Tuesday, December 6, 2005 13:41
> To: 'PgSql General'
> Subject: Re: [GENERAL] Unicode Corruption and upgrading to
> 8.0.4. to 8.1

> >> Hi everyone, I have a problem with corrupt UTF-8 sequences in my
> >> 8.0.4 dump which is preventing me from upgrading to 8.1 - which spots
> >> the errors and refuses to import the data. Is there some SQL command
> >> that I can use to fix or cauterise the sequences in the 8.0.4
> >> database before dumping to 8.1?
> >>
> >> I think the problem arose using invalid client encodings - which were
> >> not rejected prior to 8.1.


We experienced the exact same problems. You may solve the problem by feeding the dump through iconv. See my earlier
message on this issue:

http://archives.postgresql.org/pgsql-general/2005-11/msg00799.php

On top of that, you'd be well advised to create the dump using the pg_dump from PostgreSQL 8.1.

Kind regards

   Markus
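
[Editorial sketch] The iconv approach above amounts to re-reading the dump as UTF-8 and dropping whatever does not decode. The exact invocation from the linked message is not reproduced here; a common form of this cleanup is iconv's -c flag, which omits invalid characters from the output. The Python below is a minimal sketch of the same idea, not Markus's method. The function name and chunk size are hypothetical, and Python's notion of valid UTF-8 may not match the 8.1 server check in every corner case.

```python
import codecs
import io


def strip_invalid_utf8(src, dst, chunk_size=64 * 1024):
    """Copy the binary stream src to dst, dropping bytes that are not valid UTF-8."""
    # The incremental decoder keeps a multi-byte sequence that is split
    # across two chunks intact instead of treating each half as invalid.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush the decoder; a truncated sequence at the very end is dropped too.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


# Small in-memory demonstration: one valid two-byte sequence (C3 A9)
# and one stray byte (E9) that is not followed by continuation bytes.
sample = io.BytesIO(b"caf\xc3\xa9 ok\nbad \xe9 byte\n")
cleaned = io.BytesIO()
strip_invalid_utf8(sample, cleaned, chunk_size=5)
print(cleaned.getvalue())
```

For a real dump the two streams would be files opened in binary mode. Dropping bytes silently loses data, so it is worth comparing the input and output (for example with diff) before loading the result into 8.1.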

Re: Unicode Corruption and upgrading to 8.0.4. to 8.1

From
Howard Cole
Date:
Thanks Markus,

I am avoiding this solution at the moment since the database contains
binary (ByteA) fields as well as text fields and I am unsure what iconv
would do to this data. If Zlatko's method does not work then I shall see
if I can programmatically use libiconv for all the relevant data.

Regards,

Howard Cole

Markus Wollny wrote:

>message on this issue
>
>http://archives.postgresql.org/pgsql-general/2005-11/msg00799.php
>
>On top of that you'd be well advised to try dumping using pg_dump of postgresql 8.1.
>
>
>
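
[Editorial sketch] Howard's fallback plan, converting only the relevant text data programmatically, could look roughly like the following. This is a sketch under stated assumptions rather than anything from the thread: it uses Python's built-in codecs instead of libiconv, the function names and sample rows are hypothetical, and it assumes the invalid values were originally written in Latin-1, which the thread does not confirm. Binary (bytea) columns are simply never passed to it.

```python
def first_invalid_utf8(value: bytes):
    """Return (offset, offending bytes) for the first invalid sequence, or None."""
    try:
        value.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, value[exc.start:exc.end]
    return None


def repair_text(value: bytes, assumed_encoding: str = "latin-1") -> bytes:
    """Re-encode a value that fails UTF-8 validation.

    ASSUMPTION: a value that is not valid UTF-8 was really written in
    `assumed_encoding`. Values that already validate are returned unchanged.
    """
    if first_invalid_utf8(value) is None:
        return value
    return value.decode(assumed_encoding).encode("utf-8")


# Hypothetical (id, raw text bytes) pairs standing in for text columns.
rows = [
    (1, b"plain ascii"),
    (2, b"na\xefve"),                     # Latin-1 byte stored as if it were UTF-8
    (3, "na\u00efve".encode("utf-8")),    # already valid UTF-8
]
for row_id, raw in rows:
    problem = first_invalid_utf8(raw)
    if problem is not None:
        print(row_id, problem, repair_text(raw))
```

Reporting the offset first makes it possible to review the affected rows by hand before deciding whether re-encoding or dropping the bytes is the right repair.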

Re: Unicode Corruption and upgrading to 8.0.4. to 8.1

From
"Markus Wollny"
Date:
Hi!

> -----Original Message-----
> From: Howard Cole [mailto:howardnews@selestial.com]
> Sent: Tuesday, December 6, 2005 15:38
> To: Markus Wollny
> Cc: PgSql General
> Subject: Re: [GENERAL] Unicode Corruption and upgrading to
> 8.0.4. to 8.1

> I am avoiding this solution at the moment since the database
> contains binary (ByteA) fields as well as text fields and I am
> unsure what iconv would do to this data.

Bytea data in a plain-text dump should be quite safe from iconv, as all the problematic characters (decimal value <32
or >126) in the binary string are represented as SQL-escaped octets like \###.

Kind regards

   Markus
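
[Editorial sketch] The point about escaped octets can be checked with a small model. The function below is a simplified stand-in for the escaping described above, not pg_dump's actual code: it only rewrites bytes below 32 or above 126 as a backslash followed by three octal digits, and ignores the extra escaping a real dump applies to characters such as the backslash itself. The names are hypothetical.

```python
def escape_octets(data: bytes) -> bytes:
    """Simplified model: bytes <32 or >126 become a backslash plus three octal digits."""
    out = bytearray()
    for b in data:
        if b < 32 or b > 126:
            out += ("\\%03o" % b).encode("ascii")
        else:
            out.append(b)
    return bytes(out)


# Arbitrary binary payload, including bytes that are invalid as UTF-8.
blob = bytes([0, 255, 65, 200, 10])
escaped = escape_octets(blob)

# The escaped form contains only printable ASCII ...
assert all(32 <= b <= 126 for b in escaped)

# ... so a "drop invalid UTF-8" pass has nothing to remove from it.
cleaned = escaped.decode("utf-8", errors="ignore").encode("utf-8")
assert cleaned == escaped

print(escaped.decode("ascii"))
```

Because printable ASCII is valid UTF-8 byte for byte, a cleanup pass over the dump should leave data escaped this way untouched, which is the reasoning behind the reply above.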