Thread: Can't dump and restore

Can't dump and restore

From
"Peter Darley"
Date:
Folks,

    I'm having some trouble restoring data that was dumped from my database.
When I dump out a table (pg_dump -d neo -t question > question.tbl) and try
to restore it (psql -d temp -f question.tbl) I get errors on certain rows:
"psql:question.tbl:15861: ERROR:  invalid byte sequence for encoding
"UNICODE": 0xe96520".  This happens when I use SQL_ASCII or UNICODE as the
encoding.  I didn't try any other encodings.

    When I do this with a table, it's missing the rows that throw the error.
When I do it with a full database dump, the tables that have rows that throw
the errors are empty (I assume it's just canceling the transaction).

    What is confounding me is that this is data that is currently in the
database, unless the dump is creating errors as it goes, which doesn't seem
likely.  It is always the same rows as well.  Why would it be rejecting data
that it obviously accepted once?

    The computer is running FC4 64 SMP.  The version() of PostgreSQL is: "Linux
version 2.6.13-1.1526_FC4smp (bhcompile@hs20-bc1-1.build.redhat.com) (gcc
version 4.0.1 20050727 (Red Hat 4.0.1-5)) #1 SMP Wed Sep 28 19:28:24 EDT
2005"

    Does anyone have any idea of what might be going on here?  Is this likely
to be something that I can fix, or should I start tarring up the directory
when I need to do backups?

Thanks,
Peter Darley


Re: Can't dump and restore

From
Tom Lane
Date:
"Peter Darley" <pdarley@kinesis-cem.com> writes:
>     I'm having some trouble restoring data that was dumped from my database.
> When I dump out a table (pg_dump -d neo -t question > question.tbl) and try
> to restore it (psql -d temp -f question.tbl) I get errors on certain rows:
> "psql:question.tbl:15861: ERROR:  invalid byte sequence for encoding
> "UNICODE": 0xe96520".  This happens when I use SQL_ASCII or UNICODE as the
> encoding.  I didn't try any other encodings.

Er, what is the encoding of the source database, exactly?  Is it the
same as the encoding of the destination database?

SQL_ASCII disables all encoding checks, so it's entirely plausible that
you would have byte sequences that are not legal Unicode in a SQL_ASCII
database.  If so, there's not much to do except manually clean up the
bogus data.

Also, if you're trying to restore into PG 8.1 from an older version,
we've fixed some mistakes in the Unicode encoding checker, so there
actually are differences in what the code will accept :-(

            regards, tom lane
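
To make the point about SQL_ASCII concrete, here is a minimal Python sketch (not part of the original exchange) that checks the three bytes from the error message, 0xe9 0x65 0x20, against UTF-8. In UTF-8, 0xe9 opens a three-byte sequence whose next two bytes must be continuation bytes, and 0x65 and 0x20 are not. The Latin-1 decoding is only a guess at where the bytes came from; the thread never states the original client encoding. The helper names `is_valid_utf8` and `find_invalid_utf8_lines` are hypothetical, and Python's decoder is not guaranteed to agree with PostgreSQL's checker in every edge case.

```python
# The three bytes reported in the error message: 0xe9 0x65 0x20.
BAD_BYTES = bytes([0xE9, 0x65, 0x20])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_invalid_utf8_lines(dump: bytes) -> list:
    """Return the 1-based numbers of lines in a dump that are not valid UTF-8."""
    return [
        number
        for number, line in enumerate(dump.split(b"\n"), start=1)
        if not is_valid_utf8(line)
    ]


if __name__ == "__main__":
    # 0xe9 announces a three-byte sequence, but no continuation bytes follow.
    print(is_valid_utf8(BAD_BYTES))           # False
    # Assumption: the data may have been Latin-1, where 0xe9 is "é".
    print(repr(BAD_BYTES.decode("latin-1")))  # 'ée '
    # Made-up sample rows; only the second contains the offending bytes.
    sample = b"1\tplain ascii\n2\tcaf\xe9e \n3\tcaf\xc3\xa9\n"
    print(find_invalid_utf8_lines(sample))    # [2]
```

Reading a dump file in binary mode and passing its contents to `find_invalid_utf8_lines` would give a list of candidate lines to inspect by hand before restoring into a UNICODE database.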

Re: Can't dump and restore

From
"Peter Darley"
Date:
Tom,
    As far as I can tell the source and destination dbs are SQL_ASCII, but I
have to admit I'm not sure how to find out.  When I dump just the schema for
the cluster, I get the following create database statements in the dump:

CREATE DATABASE neo WITH TEMPLATE = template0 OWNER = postgres ENCODING =
'SQL_ASCII';

    So it looks to me like it's using 'SQL_ASCII'.

    The dump I got from 7.? when I upgraded restored into 8.1 just fine.  This
is a dump from 8.1 itself.  I've done some tests this morning where I just dump
a table, create a temp database and try to restore the table into it.  I've
tried creating the db with different encodings and tried changing the line
"SET client_encoding = 'SQL_ASCII';" in the dump to "SET client_encoding =
'UNICODE';", and they have the same results.

    So, long and short, I think that everything should be SQL_ASCII, but even
when it is, I get the error.

Thanks,
Peter Darley

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, November 01, 2005 1:22 PM
To: Peter Darley
Cc: Pgsql-Admin
Subject: Re: [ADMIN] Can't dump and restore




Re: Can't dump and restore

From
Tom Lane
Date:
"Peter Darley" <pdarley@kinesis-cem.com> writes:
>     So, long and short, I think that everything should be SQL_ASCII, but even
> when it is, I get the error.

There is no way you're going to get that error if the encoding setting
is SQL_ASCII, so better look again.  "SHOW server_encoding" and "SHOW
client_encoding" might be revealing.

            regards, tom lane
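
The check suggested above can be scripted. The following is a rough Python sketch, assuming `psql` is on the PATH and that the databases are reachable without extra connection options. The function names `show_setting_command` and `report_encodings` are hypothetical; the `psql` options used (`-d`, `-A`, `-t`, `-c`) are the standard ones for database name, unaligned output, tuples-only output, and running a single command.

```python
import subprocess


def show_setting_command(dbname: str, setting: str) -> list:
    """Build a psql invocation that prints one run-time setting for dbname."""
    # -A: unaligned output, -t: tuples only, -c: run a single command and exit.
    return ["psql", "-d", dbname, "-A", "-t", "-c", f"SHOW {setting}"]


def report_encodings(dbname: str) -> dict:
    """Run SHOW for both encoding settings.

    Requires psql and a running server; raises CalledProcessError on failure.
    """
    result = {}
    for setting in ("server_encoding", "client_encoding"):
        completed = subprocess.run(
            show_setting_command(dbname, setting),
            capture_output=True,
            text=True,
            check=True,
        )
        result[setting] = completed.stdout.strip()
    return result
```

Calling `report_encodings("neo")` and `report_encodings("temp")` (the two database names used in the thread) and comparing the results would show whether the source and destination really share an encoding. Note that `client_encoding` reported this way is the session default; a dump file may override it with its own `SET client_encoding` line.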

Re: Can't dump and restore

From
"Peter Darley"
Date:
Tom,

    You're correct, the new database was being created as UNICODE instead of
SQL_ASCII.  I'd never messed with encoding before, but I guess that the
default was changed some time after I originally made the database on 6.x.
Stupid mistake. :)

Thanks a bundle for your help!
Peter Darley

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, November 01, 2005 1:38 PM
To: Peter Darley
Cc: Pgsql-Admin
Subject: Re: [ADMIN] Can't dump and restore

