Thread: UTF8 encoding problem
I am getting illegal UTF-8 encoding errors, and I have traced them to the £ sign. I have set lc_monetary = 'en_GB.UTF-8' in postgresql.conf, but this has no effect. How can I sort out this problem? client_encoding = UTF8. Regards, Garry
On Tue, Jun 17, 2008 at 10:48:34PM +0100, Garry Saddington wrote: > I am getting illegal UTF8 encoding errors and I have traced it to the £ sign. What's the exact error message? > I have set lc_monetary to "lc_monetary = 'en_GB.UTF-8'" in postgresql.conf but > this has no effect. How can I sort this problem? Client_encoding =UTF8. Is the data UTF-8? If the error is 'invalid byte sequence for encoding "UTF8": 0xa3' then you probably need to set client_encoding to latin1, latin9, or win1252. -- Michael Fuhr
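A rough Python illustration of Michael's suggestion. It assumes the stray byte is 0xa3, as in the error message he quotes. A strict UTF-8 decoder rejects that byte, but Latin-1 reads it as a pound sign, and re-encoding the decoded text to UTF-8 gives the two-byte form. This only mimics on the client side the conversion the server performs once client_encoding matches the data. The helper name `to_utf8` is hypothetical.

```python
def to_utf8(raw: bytes, client_encoding: str) -> bytes:
    """Hypothetical helper: interpret raw bytes in the declared client
    encoding and re-encode them as UTF-8, roughly what the server does
    when client_encoding correctly describes the incoming data."""
    return raw.decode(client_encoding).encode("utf-8")

data = b"Price: \xa35"  # pound sign stored as a single Latin-1 byte

# Declaring the data as UTF-8 fails on the lone 0xa3 byte.
try:
    to_utf8(data, "utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)

# Declaring it as Latin-1 succeeds and yields the UTF-8 pair 0xc2 0xa3.
print(to_utf8(data, "latin-1"))  # b'Price: \xc2\xa35'
```

Declaring the wrong client encoding does not corrupt anything by itself. It only determines how the incoming bytes are interpreted before conversion.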
On Wednesday 18 June 2008 02:04, Michael Fuhr wrote: > Is the data UTF-8? If the error is 'invalid byte sequence for encoding > "UTF8": 0xa3' then you probably need to set client_encoding to latin1, > latin9, or win1252. Thanks, that's fixed it. Garry
On 18 Jun 2008, at 03:04, Michael Fuhr wrote: > Is the data UTF-8? If the error is 'invalid byte sequence for encoding > "UTF8": 0xa3' then you probably need to set client_encoding to latin1, > latin9, or win1252. Why? -- Giorgio Valoti
On Wed, Jun 18, 2008 at 08:25:07AM +0200, Giorgio Valoti wrote: > On 18/giu/08, at 03:04, Michael Fuhr wrote: > > Is the data UTF-8? If the error is 'invalid byte sequence for > > encoding "UTF8": 0xa3' then you probably need to set client_encoding > > to latin1, latin9, or win1252. > > Why? UTF-8 has rules about what byte values can occur in sequence; violations of those rules mean that the data isn't valid UTF-8. This particular error says that the database received a byte with the value 0xa3 (163) in a sequence of bytes that wasn't valid UTF-8. The UTF-8 byte sequence for the pound sign (£) is 0xc2 0xa3. If Garry got this error (I don't know if he did; I was asking) then the byte 0xa3 must have appeared in some other sequence that wasn't valid UTF-8. The usual reason for that is that the data is in some encoding other than UTF-8. Common encodings for Western European languages are Latin-1 (ISO-8859-1), Latin-9 (ISO-8859-15), and Windows-1252. All three of these encodings use a lone 0xa3 to represent the pound sign. If the data has a pound sign as 0xa3 and the database complains that it isn't part of a valid UTF-8 sequence then the data is likely to be in one of these other encodings. -- Michael Fuhr
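Python's standard codecs can confirm the byte values Michael describes. This sketch assumes `latin-1`, `iso8859-15` and `cp1252` as the codec names for the three encodings he lists. It shows that each encodes £ as the single byte 0xa3, while UTF-8 uses 0xc2 0xa3. It also shows why a lone 0xa3 is illegal in UTF-8: its bit pattern 10xxxxxx marks a continuation byte, which may only follow a lead byte. The `is_continuation_byte` helper is illustrative and not part of any library.

```python
POUND = "\u00a3"  # the pound sign, £

# UTF-8 needs two bytes for U+00A3: lead byte 0xc2, continuation byte 0xa3.
print(POUND.encode("utf-8").hex(" "))  # c2 a3

# The three single-byte Western European encodings all use a lone 0xa3.
for codec in ("latin-1", "iso8859-15", "cp1252"):
    print(codec, POUND.encode(codec).hex())  # prints "<codec> a3" each time

def is_continuation_byte(b: int) -> bool:
    """True if b has the bit pattern 10xxxxxx (0x80-0xbf).
    In UTF-8 such a byte can never start a character."""
    return (b & 0b1100_0000) == 0b1000_0000

print(is_continuation_byte(0xA3))  # True: 0xa3 = 0b10100011
print(is_continuation_byte(0xC2))  # False: 0xc2 = 0b11000010 is a lead byte

# A strict UTF-8 decoder therefore rejects 0xa3 unless a lead byte precedes it.
try:
    b"\xa3".decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid:", exc.reason)
print(b"\xc2\xa3".decode("utf-8"))  # £
```

The same 0xa3 byte is the last byte of the valid UTF-8 sequence, which is why the error message names it. What matters is whether a lead byte comes before it.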
On Wednesday 18 June 2008 14:00, Michael Fuhr wrote: > If the data has a pound sign as 0xa3 and the database complains that > it isn't part of a valid UTF-8 sequence then the data is likely to > be in one of these other encodings. Thanks. I traced it to a client_encoding problem, and setting client_encoding to latin1 has cured it. Regards, Garry
On 18 Jun 2008, at 15:00, Michael Fuhr wrote: > Common encodings for Western European languages are Latin-1 > (ISO-8859-1), Latin-9 (ISO-8859-15), and Windows-1252. All three > of these encodings use a lone 0xa3 to represent the pound sign. Much clearer now, thank you, Michael. -- Giorgio Valoti