Thread: encoding problem at restore

encoding problem at restore

From

Bob Hunter

Date:

18 February 2007, 11:21:31

Hello,

I have just updated to PostgreSQL 8.1 and have the
following problem. The first line of the "PostgreSQL
database dump" says:

SET client_encoding = 'SQL_ASCII';

which is correct. However, the restore says:

ERROR:  invalid byte sequence for encoding "UTF8":
0xe02031
HINT:  This error can also happen if the byte sequence
does not match the encoding expected by the server,
which is controlled by "client_encoding".
CONTEXT:  COPY <tablename>, line 1270


There are two problems. The first is, why UTF8 at all,
given that the dump specifies SQL_ASCII? The second
is that line 1270 contains (unsurprisingly) only
ASCII characters, so why is psql complaining at all?

Thank you.

P.S. I am not subscribed, so please Cc the answers to me.




Re: encoding problem at restore

From

Michael Fuhr

Date:

18 February 2007, 15:22:31

On Sat, Feb 17, 2007 at 03:12:44AM -0800, Bob Hunter wrote:
> ERROR:  invalid byte sequence for encoding "UTF8":
> 0xe02031
> HINT:  This error can also happen if the byte sequence
> does not match the encoding expected by the server,
> which is controlled by "client_encoding".
> CONTEXT:  COPY <tablename>, line 1270
>
> There are two problems. The first is, why UTF8 at all,
> given that the dump specifies SQL_ASCII?

Probably because the database encoding is UTF-8.  You can check with
"SHOW server_encoding", or with \l in psql, or by running "psql -l"
from a shell prompt, etc.  With a client_encoding of SQL_ASCII no
conversion will be made, so if the data isn't already UTF-8 then you
get an error such as the above.
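
To illustrate why those bytes are rejected, here is a minimal Python
sketch (Python is only an illustration here, it is not part of the dump
or of PostgreSQL). It checks the byte sequence from the error message
against UTF-8, then decodes it as LATIN1, which is just a guess at the
real encoding of the data:

```python
# Bytes reported in the error message: 0xe02031
raw = bytes.fromhex("e02031")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xe0 opens a three-byte UTF-8 sequence, but 0x20 (space) is not a
# valid continuation byte, so the sequence is rejected.
print(is_valid_utf8(raw))

# Assumption: the data is really LATIN1. Decoded that way and re-encoded,
# the same text becomes valid UTF-8.
text = raw.decode("latin-1")
print(text.encode("utf-8"))
```

If the guess is right, the second print shows the bytes the server
would have accepted for the same text.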

> The second is, that at line 1270 there are (unsurprisingly) only
> ASCII  characters, so why is psql complaining at all?

Are you sure you're looking at the right line?  The line number in
the error refers to the line of the COPY data, not to the line of
the input file or stream.  For example, if the COPY begins on line
67 of the dump file then line 1270 of the data would be line 1337
of the file.  If you look at the correct line you might find a
string like "à 1" (latin small letter a with grave, space, digit
one).
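
The line arithmetic above can be sketched in Python. This is a rough
helper under stated assumptions, not a tool that ships with PostgreSQL:
the function name is made up, and it assumes the COPY statement in the
dump starts with "COPY <tablename> " at the beginning of a line, as in
a plain-text pg_dump file.

```python
def dump_line_for_copy_row(dump_lines, table, row):
    """Map row number `row` of a COPY block to a line number in the dump.

    dump_lines is an iterable of bytes lines. Reading the dump in binary
    mode avoids decode errors on bytes of unknown encoding.
    """
    # Assumed shape of the statement: COPY tablename (col, ...) FROM stdin;
    prefix = b"COPY " + table.encode("ascii") + b" "
    for lineno, line in enumerate(dump_lines, start=1):
        if line.startswith(prefix):
            # Data row 1 is the line right after the COPY statement.
            return lineno + row
    raise ValueError("no COPY statement found for table %r" % table)


# Hypothetical usage ("dump.sql" and "mytable" are placeholder names):
#
#     with open("dump.sql", "rb") as f:
#         print(dump_line_for_copy_row(f, "mytable", 1270))
```

With the COPY statement on line 67 and row 1270 in the error, this
returns 1337, matching the example above.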

Try editing the client_encoding line to specify whatever encoding
the data is really in.  For Western European languages likely guesses
are LATIN1, LATIN9, or WIN1252 (especially the last if the data
originated on Windows).  Alternatively, you could use a converter
like iconv or uconv to convert the file to UTF-8 before feeding
it to psql.
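
For anyone without iconv or uconv at hand, the same conversion can be
sketched in a few lines of Python. The function name is hypothetical
and the default source encoding is only a guess. Python's codec names
for the encodings mentioned above are "latin-1" (LATIN1), "iso8859-15"
(LATIN9) and "cp1252" (WIN1252).

```python
def transcode_to_utf8(data: bytes, source_encoding: str = "latin-1") -> bytes:
    """Re-encode bytes from a guessed single-byte encoding to UTF-8.

    "latin-1" accepts every byte value, so a wrong guess yields wrong
    characters rather than an error. "cp1252" raises UnicodeDecodeError
    on the few byte values it leaves undefined.
    """
    return data.decode(source_encoding).encode("utf-8")


# Hypothetical usage ("dump.sql" and "dump.utf8.sql" are placeholder names):
#
#     with open("dump.sql", "rb") as src, open("dump.utf8.sql", "wb") as dst:
#         dst.write(transcode_to_utf8(src.read(), "latin-1"))
```

Since a client_encoding of SQL_ASCII means no conversion is made, the
converted file should load as long as its bytes are valid UTF-8.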

--
Michael Fuhr