Thread: Problem while restoring a database from SQL_ASCII to UTF-8

Problem while restoring a database from SQL_ASCII to UTF-8

From
"Manoj Agarwal"
Date:

Hi,

 

I have a database in SQL_ASCII Encoding format from Postgresql-7.4.19.  I have migrated to Postgresql-8.4.9 and to UTF-8 and want to restore this database in UTF-8 encoding.  I am facing problem in restoring it as UTF-8.  I have 29 tables in the database, out of which it is unable to restore 3 tables.  It gives the following error:

 

pg_restore: [archiver (db)] Error while PROCESSING TOC:

pg_restore: [archiver (db)] Error from TOC entry 83; 0 40257 TABLE DATA document postgres

pg_restore: [archiver (db)] COPY failed for table "document": ERROR:  invalid byte sequence for encoding "UTF8": 0xe92034

HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

CONTEXT:  COPY document, line 15

pg_restore: [archiver (db)] Error from TOC entry 88; 0 40271 TABLE DATA attribute postgres

pg_restore: [archiver (db)] COPY failed for table "attribute": ERROR:  invalid byte sequence for encoding "UTF8": 0xe90931

HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

CONTEXT:  COPY attribute, line 909

pg_restore: [archiver (db)] Error from TOC entry 90; 0 40277 TABLE DATA revision_state postgres

pg_restore: [archiver (db)] COPY failed for table "revision_state": ERROR:  invalid byte sequence for encoding "UTF8": 0xe9e96e

HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

CONTEXT:  COPY revision_state, line 363

 

I have checked these three tables and I found that these tables contain special characters as values for some attributes for some records.  I tried to use the procedure mentioned in the following link, which didn’t worked for me:

 

http://archives.postgresql.org/pgsql-admin/2004-07/msg00206.php

 

I also have blobs in one table in this database, but I don’t think/see any issues with them.  Is there a way to restore the database from SQL_ASCII to UTF-8?  Is there a way to restore the database as SQL_ASCII and then convert the Encoding to UTF-8?  What is the consistency level?  Or are there chances of getting garbage?

 

Kind Regards,

Manoj Agarwal

Re: Problem while restoring a database from SQL_ASCII to UTF-8

From
Adrian Klaver
Date:
On 06/06/2012 04:30 AM, Manoj Agarwal wrote:
> Hi,
>
> I have a database in SQL_ASCII Encoding format from Postgresql-7.4.19. I
> have migrated to Postgresql-8.4.9 and to UTF-8 and want to restore this
> database in UTF-8 encoding. I am facing problem in restoring it as
> UTF-8. I have 29 tables in the database, out of which it is unable to
> restore 3 tables. It gives the following error:
>


>
>
> I have checked these three tables and I found that these tables contain
> special characters as values for some attributes for some records. I
> tried to use the procedure mentioned in the following link, which didn’t
> worked for me:
>
> http://archives.postgresql.org/pgsql-admin/2004-07/msg00206.php
>
> I also have blobs in one table in this database, but I don’t think/see
> any issues with them. Is there a way to restore the database from
> SQL_ASCII to UTF-8? Is there a way to restore the database as SQL_ASCII
> and then convert the Encoding to UTF-8? What is the consistency level?
> Or are there chances of getting garbage?

SQL_ASCII is not an encoding, it is basically a way of saying encoding
ignored. Given that, outside knowledge of the encoding used for data
inserted into the database is needed to make the conversion to UTF8. In
other words do you have any idea of what encoding was used by the
clients that supplied data to the database?

>
> Kind Regards,
>
> Manoj Agarwal
>


--
Adrian Klaver
adrian.klaver@gmail.com

Re: Problem while restoring a database from SQL_ASCII to UTF-8

From
Peter Geoghegan
Date:
On 6 June 2012 14:12, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> SQL_ASCII is not an encoding, it is basically a way of saying encoding
> ignored. Given that, outside knowledge of the encoding used for data
> inserted into the database is needed to make the conversion to UTF8. In
> other words do you have any idea of what encoding was used by the clients
> that supplied data to the database?

You're going to have to manage this process carefully. I believe the
best tool for the job is pgloader:

http://pgfoundry.org/projects/pgloader/

It's certainly not inconceivable that this sort of tool could
automagically deduce encoding (albeit often inconclusively) in much
the same manor as modern web browsers attempt to when they have no
alternative. I don't see much demand for that though.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Re: Problem while restoring a database from SQL_ASCII to UTF-8

From
"Manoj Agarwal"
Date:
Hi,

The encoding that is used by the clients that supplied data to the database
is ISO-8859-1.

Kind Regards,
Manoj Agarwal

-----Original Message-----
From: Adrian Klaver [mailto:adrian.klaver@gmail.com]
Sent: Wednesday, June 06, 2012 6:43 PM
To: Manoj Agarwal
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Problem while restoring a database from SQL_ASCII to
UTF-8

On 06/06/2012 04:30 AM, Manoj Agarwal wrote:
> Hi,
>
> I have a database in SQL_ASCII Encoding format from Postgresql-7.4.19.
> I have migrated to Postgresql-8.4.9 and to UTF-8 and want to restore
> this database in UTF-8 encoding. I am facing problem in restoring it
> as UTF-8. I have 29 tables in the database, out of which it is unable
> to restore 3 tables. It gives the following error:
>


>
>
> I have checked these three tables and I found that these tables
> contain special characters as values for some attributes for some
> records. I tried to use the procedure mentioned in the following link,
> which didn't worked for me:
>
> http://archives.postgresql.org/pgsql-admin/2004-07/msg00206.php
>
> I also have blobs in one table in this database, but I don't think/see
> any issues with them. Is there a way to restore the database from
> SQL_ASCII to UTF-8? Is there a way to restore the database as
> SQL_ASCII and then convert the Encoding to UTF-8? What is the consistency
level?
> Or are there chances of getting garbage?

SQL_ASCII is not an encoding, it is basically a way of saying encoding
ignored. Given that, outside knowledge of the encoding used for data
inserted into the database is needed to make the conversion to UTF8. In
other words do you have any idea of what encoding was used by the clients
that supplied data to the database?

>
> Kind Regards,
>
> Manoj Agarwal
>


--
Adrian Klaver
adrian.klaver@gmail.com


Re: Problem while restoring a database from SQL_ASCII to UTF-8

From
Adrian Klaver
Date:
On 06/06/2012 08:30 PM, Manoj Agarwal wrote:
> Hi,
>
> The encoding that is used by the clients that supplied data to the database
> is ISO-8859-1.

You said previously that you had 29 tables in the 7.4 database and three
of those had errors when upgrading to the 8.4 database. At what state is
the 8.4 database?:

1) None of the tables from the 7.4 database are in it.

or

2) 26 tables are in it and
  a) The schema for the three problem tables are there but not the data.
  b) Neither the table schema or the data for the three tables is there.


Where I am going with this is using client_encoding to temporarily set
the client_encoding to ISO88591 while restoring the data for those
tables/entire database. The easiest way I can think of to do that is for
a whole database restore is set the client_encoding in postgresql.conf,
do a pg_ctl reload, restore, unset the client_encoding, pg_ctl reload.
This would entail excluding other clients from the db for the duration.
There are other ways to do this depending on the answers above.

>
> Kind Regards,
> Manoj Agarwal


--
Adrian Klaver
adrian.klaver@gmail.com
.

Re: Problem while restoring a database from SQL_ASCII to UTF-8

From
Adrian Klaver
Date:
On 06/06/2012 08:30 PM, Manoj Agarwal wrote:
> Hi,
>
> The encoding that is used by the clients that supplied data to the database
> is ISO-8859-1.

I meant to ask in my previous post:

1) What are the special characters causing the problem?

2) What is the data type for the fields holding these characters?

>
> Kind Regards,
> Manoj Agarwal


--
Adrian Klaver
adrian.klaver@gmail.com

Re: Problem while restoring a database from SQL_ASCII to UTF-8

From
"Manoj Agarwal"
Date:
Hi,

I took a slightly different approach to fix the issue.  There is a CPAN
module called Encoding::FixLatin, that takes mixed encoding input and
produces UTF-8 output.  It can be downloaded from the following link:

http://search.cpan.org/dist/Encoding-FixLatin/lib/Encoding/FixLatin.pm

This module takes input which may contain characters in more than one
encoding and makes a best effort to convert them all to UTF-8 output.  I
took the dump from Postgresql 7.4.19 database where encoding was SQL_ASCII.
Then I created a new database in UTF-8 in Postgresql 8.4.9.  Then I
installed the above Perl module.  After installing the module, I used the
following command to restore SQL_ASCII database to UTF-8:
pg_restore -O db_dump_sql_ascii | fix_latin | psql -d db_utf

The conversion was done successfully without any errors.   The database was
restored as UTF-8.

Kind Regards,
Manoj Agarwal

-----Original Message-----
From: Adrian Klaver [mailto:adrian.klaver@gmail.com]
Sent: Friday, June 08, 2012 2:19 AM
To: Manoj Agarwal
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Problem while restoring a database from SQL_ASCII to
UTF-8

On 06/06/2012 08:30 PM, Manoj Agarwal wrote:
> Hi,
>
> The encoding that is used by the clients that supplied data to the
> database is ISO-8859-1.

I meant to ask in my previous post:

1) What are the special characters causing the problem?

2) What is the data type for the fields holding these characters?

>
> Kind Regards,
> Manoj Agarwal


--
Adrian Klaver
adrian.klaver@gmail.com