Thread: Can't dump and restore
Folks, I'm having some trouble restoring data that was dumped from my database. When I dump out a table (pg_dump -d neo -t question > question.tbl) and try to restore it (psql -d temp -f question.tbl) I get errors on certain rows: "psql:question.tbl:15861: ERROR: invalid byte sequence for encoding "UNICODE": 0xe96520". This happens when I use SQL_ASCII or UNICODE as the encoding. I didn't try any other encodings. When I do this with a table, it's missing the rows that throw the error. When I do it with a full database dump, the tables that have rows that throw the errors are empty (I assume it's just cancleing the transaction). What is confounding me is that this is data that is currently in the database, unless the dump is creating errors as it goes, which doesn't seem likely. It is always the same rows as well. Why would it be rejecting data that it obvously accepted once? The computer is running FC4 64 SMP. The version() of PostgreSQL is: "Linux version 2.6.13-1.1526_FC4smp (bhcompile@hs20-bc1-1.build.redhat.com) (gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)) #1 SMP Wed Sep 28 19:28:24 EDT 2005" Does anyone have any idea of what might be going on here? Is this likely to be something that I can fix, or should I start taring up the directory when I need to do backups? Thanks, Peter Darley
"Peter Darley" <pdarley@kinesis-cem.com> writes:
> I'm having some trouble restoring data that was dumped from my database.
> When I dump out a table (pg_dump -d neo -t question > question.tbl) and try
> to restore it (psql -d temp -f question.tbl) I get errors on certain rows:
> "psql:question.tbl:15861: ERROR: invalid byte sequence for encoding
> "UNICODE": 0xe96520". This happens when I use SQL_ASCII or UNICODE as the
> encoding. I didn't try any other encodings.

Er, what is the encoding of the source database, exactly? Is it the same as the encoding of the destination database?

SQL_ASCII disables all encoding checks, so it's entirely plausible that you would have byte sequences that are not legal Unicode in a SQL_ASCII database. If so, there's not much to do except manually clean up the bogus data.

Also, if you're trying to restore into PG 8.1 from an older version, we've fixed some mistakes in the Unicode encoding checker, so there actually are differences in what the code will accept :-(

regards, tom lane
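Tom's point that SQL_ASCII stores bytes unchecked can be illustrated with the bytes from the error message. The sketch below, a hypothetical helper rather than anything PostgreSQL ships, shows why 0xe9 0x65 0x20 fails UTF-8 validation and lists the lines of a dump file that a UNICODE database would reject, which is one way to find the "bogus data" to clean up. Reading the bytes as Latin-1 is an assumption about how the data was originally entered; it could have been another single-byte encoding.

```python
import sys


def utf8_problem_lines(dump):
    """Yield (line_number, raw_line) for each line of a binary stream that is
    not valid UTF-8, i.e. bytes a UNICODE database would reject on restore."""
    for lineno, raw in enumerate(dump, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            yield lineno, raw


# The bytes from the error message: 0xe9 starts a three-byte UTF-8 sequence,
# but 0x65 ('e') is not a continuation byte (10xxxxxx), so decoding fails.
BAD = bytes([0xE9, 0x65, 0x20])

# Assumption: if the data was really Latin-1, these bytes meant 'é', 'e', ' '.
LATIN1_READING = BAD.decode("latin-1")

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_bad_utf8.py question.tbl
    with open(sys.argv[1], "rb") as f:  # binary mode: no decoding on read
        for lineno, raw in utf8_problem_lines(f):
            print(lineno, raw[:80])
```

The file is opened in binary mode on purpose: decoding it as text would raise on the first bad byte instead of reporting every offending line.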
Tom,

As far as I can tell the source and destination dbs are SQL_ASCII, but I have to admit I'm not sure how to find out. When I dump just the schema for the cluster, I get the following CREATE DATABASE statement in the dump:

CREATE DATABASE neo WITH TEMPLATE = template0 OWNER = postgres ENCODING = 'SQL_ASCII';

So it looks to me like it's using 'SQL_ASCII'. The dump I got from 7.? when I upgraded restored into 8.1 just fine. This one is a dump from 8.1.

I've done some tests this morning where I just dump a table, create a temp database and try to restore the table into it. I've tried creating the db with different encodings and tried changing the line "SET client_encoding = 'SQL_ASCII';" in the dump to "SET client_encoding = 'UNICODE';", and both give the same result.

So, long and short, I think that everything should be SQL_ASCII, but even when it is, I get the error.

Thanks,
Peter Darley

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, November 01, 2005 1:22 PM
To: Peter Darley
Cc: Pgsql-Admin
Subject: Re: [ADMIN] Can't dump and restore
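On the question of how to find out which encoding each database actually uses: `psql -l` should list an Encoding column, and the same information is in the `pg_database` catalog. The sketch below queries it through a generic DB-API 2.0 connection; the choice of driver (for example psycopg2) is an assumption, and the function name is hypothetical.

```python
def database_encodings(conn):
    """Return {database_name: encoding_name} for every database in the cluster.

    `conn` is assumed to be any DB-API 2.0 connection to the cluster, for
    example one opened with psycopg2.connect(dbname="neo").
    """
    cur = conn.cursor()
    try:
        # pg_encoding_to_char turns the numeric encoding id into its name.
        cur.execute(
            "SELECT datname, pg_encoding_to_char(encoding) FROM pg_database"
        )
        return dict(cur.fetchall())
    finally:
        cur.close()
```

Comparing the entry for the source database (neo) with the entry for the freshly created restore target would have shown the mismatch directly.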
"Peter Darley" <pdarley@kinesis-cem.com> writes:
> So, long and short, I think that everything should be SQL_ASCII, but even
> when it is, I get the error.

There is no way you're going to get that error if the encoding setting is SQL_ASCII, so better look again. "SHOW server_encoding" and "SHOW client_encoding" might be revealing.

regards, tom lane
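Tom's two SHOW checks can be scripted against the destination database before a restore. This is a minimal sketch assuming a DB-API 2.0 connection; `session_encodings` and `differs_from_source` are hypothetical names, and "SQL_ASCII" as the source encoding is taken from the schema dump quoted earlier in the thread.

```python
def session_encodings(conn):
    """Run the two checks Tom suggests and return (server, client) encodings.

    `conn` is assumed to be a DB-API 2.0 connection to the destination
    database, i.e. the one being restored into.
    """
    cur = conn.cursor()
    try:
        cur.execute("SHOW server_encoding")
        server = cur.fetchone()[0]
        cur.execute("SHOW client_encoding")
        client = cur.fetchone()[0]
    finally:
        cur.close()
    return server, client


def differs_from_source(conn, source_encoding="SQL_ASCII"):
    """True if the destination's server encoding is not the source's, the
    situation in which unchecked SQL_ASCII bytes can be rejected on restore."""
    server, _client = session_encodings(conn)
    return server.upper() != source_encoding.upper()
```

Only the server encoding is compared, because it determines what the database will store; the client encoding is returned so it can be inspected alongside it.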
Tom,

You're right: the new database was being created as UNICODE instead of SQL_ASCII. I'd never messed with encoding before, but I guess the default was changed some time after I originally made the database on 6.x. Stupid mistake. :)

Thanks a bundle for your help!

Peter Darley

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, November 01, 2005 1:38 PM
To: Peter Darley
Cc: Pgsql-Admin
Subject: Re: [ADMIN] Can't dump and restore
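The fix, then, is to create the restore target with the encoding stated explicitly instead of inheriting a default. The sketch below builds the commands with `createdb`'s `-T`/`-E` options, mirroring the `TEMPLATE = template0 ... ENCODING = 'SQL_ASCII'` line from the schema dump, and runs them only when asked. The database and file names are the ones from the thread; the helper names are hypothetical. Note that SQL_ASCII keeps the bytes unchecked, exactly as in the source; Tom's alternative is to clean the data before restoring into a UNICODE database.

```python
import shlex
import subprocess


def restore_commands(dbname, dump_file, encoding="SQL_ASCII"):
    """Commands to recreate the target with an explicit encoding, then restore.

    Passing -E pins the encoding instead of inheriting it from the template,
    which is how the new database ended up as UNICODE here.
    """
    return [
        ["createdb", "-T", "template0", "-E", encoding, dbname],
        ["psql", "-d", dbname, "-f", dump_file],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    run(restore_commands("temp", "question.tbl"))  # dry run: prints only
```

Defaulting to a dry run makes it possible to check the exact commands before anything touches the cluster.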