Thread: Unicode Corruption and upgrading to 8.0.4. to 8.1
Hi everyone, I have a problem with corrupt UTF-8 sequences in my 8.0.4 dump which is preventing me from upgrading to 8.1 - which spots the errors and refuses to import the data. Is there some SQL command that I can use to fix or cauterise the sequences in the 8.0.4 database before dumping to 8.1? I think the problem arose using invalid client encodings - which were not rejected prior to 8.1. Regards, Howard Cole www.selestial.com
Have you tried to restore just schema first, then data? Greetings, Zlatko ----- Original Message ----- From: "Howard Cole" <howardnews@selestial.com> To: "'PgSql General'" <pgsql-general@postgresql.org> Sent: Friday, December 02, 2005 3:02 PM Subject: [GENERAL] Unicode Corruption and upgrading to 8.0.4. to 8.1
Hi Zlatko, I shall give this a try later and let you know how I get on. Thank you for responding. Howard.
Hello! > -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Howard Cole > Sent: Tuesday, 6 December 2005 13:41 > To: 'PgSql General' > Subject: Re: [GENERAL] Unicode Corruption and upgrading to 8.0.4. to 8.1 We experienced exactly the same problem. You may be able to solve it by feeding the dump through iconv. See my earlier message on this issue: http://archives.postgresql.org/pgsql-general/2005-11/msg00799.php On top of that, you would be well advised to create the dump with the pg_dump that ships with PostgreSQL 8.1. Kind regards, Markus
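The exact iconv invocation from the linked message is not reproduced in this thread. As a rough sketch only, assuming the goal is simply to drop bytes that are not valid UTF-8 (similar in spirit to what `iconv -c` does), the same clean-up could look like this in Python. The function name and the file names are placeholders, not anything from the original posts.

```python
def strip_invalid_utf8(data: bytes) -> bytes:
    """Return data with every byte that is not part of valid UTF-8 removed."""
    # errors="ignore" skips undecodable bytes instead of raising an error.
    return data.decode("utf-8", errors="ignore").encode("utf-8")


# Hypothetical usage; the file names are placeholders:
# with open("dump-8.0.4.sql", "rb") as src, open("dump-clean.sql", "wb") as dst:
#     for line in src:
#         dst.write(strip_invalid_utf8(line))
```

Dropping bytes loses information, so it is worth keeping the original dump and comparing it with the cleaned one before loading it into 8.1.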
Thanks Markus, I am avoiding this solution at the moment since the database contains binary (ByteA) fields as well as text fields, and I am unsure what iconv would do to this data. If Zlatko's method does not work, I shall see if I can programmatically use libiconv on all the relevant data. Regards, Howard Cole
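Before converting anything, it may help to see where the invalid sequences actually are. The following is a minimal sketch, not something proposed in the thread: it scans a dump line by line and reports each line that fails strict UTF-8 decoding, so the affected rows can be inspected or fixed selectively. The function name and the file name in the usage comment are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def invalid_utf8_lines(lines: Iterable[bytes]) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, byte_offset) for each line that is not valid UTF-8.

    byte_offset is the position of the first undecodable byte in that line.
    """
    for number, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            yield number, exc.start


# Hypothetical usage; the file name is a placeholder:
# with open("dump-8.0.4.sql", "rb") as dump:
#     for number, offset in invalid_utf8_lines(dump):
#         print(f"line {number}: invalid byte at offset {offset}")
```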
Hi! > -----Original Message----- > From: Howard Cole [mailto:howardnews@selestial.com] > Sent: Tuesday, 6 December 2005 15:38 > To: Markus Wollny > Cc: PgSql General > Subject: Re: [GENERAL] Unicode Corruption and upgrading to 8.0.4. to 8.1 Bytea data in a plain-text dump should be quite safe from iconv, as all the problematic bytes (decimal value < 32 or > 126) in the binary string are represented as SQL-escaped octets of the form \### (three octal digits). Kind regards, Markus
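To illustrate why escaped binary data is unaffected by an encoding filter, here is a simplified model of that escaping. It is a sketch under stated assumptions, not PostgreSQL's actual code: the function name is made up, and a real dump adds a further layer of backslash doubling that is left out here. The point is only that the escaped form contains printable ASCII and nothing else.

```python
def bytea_escape(data: bytes) -> str:
    """Simplified model of escaped bytea output (assumption, see lead-in)."""
    out = []
    for b in data:
        if b == 0x5C:
            # A literal backslash is doubled.
            out.append("\\\\")
        elif b < 32 or b > 126:
            # Bytes outside printable ASCII become a backslash
            # followed by three octal digits.
            out.append("\\%03o" % b)
        else:
            out.append(chr(b))
    return "".join(out)
```

Since every byte in the result lies in the range 32 to 126, the escaped text is valid in ASCII and in UTF-8 alike, so a filter that only removes invalid UTF-8 sequences has nothing to change in it.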