Re: UTF8 encoding problem - Mailing list pgsql-general

From Garry Saddington
Subject Re: UTF8 encoding problem
Date
Msg-id 200806181653.15484.garry@schoolteachers.co.uk
In response to Re: UTF8 encoding problem  (Michael Fuhr <mike@fuhr.org>)
List pgsql-general
On Wednesday 18 June 2008 14:00, Michael Fuhr wrote:
> On Wed, Jun 18, 2008 at 08:25:07AM +0200, Giorgio Valoti wrote:
> > On 18/giu/08, at 03:04, Michael Fuhr wrote:
> > > Is the data UTF-8?  If the error is 'invalid byte sequence for
> > > encoding "UTF8": 0xa3' then you probably need to set client_encoding
> > > to latin1, latin9, or win1252.
> >
> > Why?
>
> UTF-8 has rules about what byte values can occur in sequence;
> violations of those rules mean that the data isn't valid UTF-8.
> This particular error says that the database received a byte with
> the value 0xa3 (163) in a sequence of bytes that wasn't valid UTF-8.
>
> The UTF-8 byte sequence for the pound sign (£) is 0xc2 0xa3.  If
> Garry got this error (I don't know if he did; I was asking) then
> the byte 0xa3 must have appeared in some other sequence that wasn't
> valid UTF-8.  The usual reason for that is that the data is in some
> encoding other than UTF-8.
>
> Common encodings for Western European languages are Latin-1
> (ISO-8859-1), Latin-9 (ISO-8859-15), and Windows-1252.  All three
> of these encodings use a lone 0xa3 to represent the pound sign.  If
> the data has a pound sign as 0xa3 and the database complains that
> it isn't part of a valid UTF-8 sequence then the data is likely to
> be in one of these other encodings.
>
Thanks. I traced it to a client_encoding problem and set client_encoding to
latin1, which has cured it.
regards
garry
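
For anyone reading this in the archives, the byte-level point Michael makes above can be checked with a short Python sketch. This is only an illustration of the encodings involved, not something PostgreSQL itself runs; the helper names is_valid_utf8 and candidate_decodings are made up for this example.

```python
# The pound sign (U+00A3) as a single Latin-1 byte and as a UTF-8 sequence.
latin1_bytes = "\u00a3".encode("latin-1")  # one byte: 0xa3
utf8_bytes = "\u00a3".encode("utf-8")      # two bytes: 0xc2 0xa3


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def candidate_decodings(data: bytes) -> dict:
    """Decode data under the three encodings named in the thread."""
    return {enc: data.decode(enc) for enc in ("latin-1", "iso8859-15", "cp1252")}


if __name__ == "__main__":
    print(latin1_bytes.hex(), utf8_bytes.hex())
    print(is_valid_utf8(b"cost: \xa35"))      # lone 0xa3, as in the error
    print(is_valid_utf8(b"cost: \xc2\xa35"))  # proper UTF-8 pound sign
    print(candidate_decodings(b"\xa3"))
```

A lone 0xa3 fails the UTF-8 check but decodes to the pound sign under all three single-byte encodings, which is why telling the server the client is sending latin1 makes the error go away.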
