Re: Unicode problem again - Mailing list pgsql-general

From	Michael Fuhr
Subject	Re: Unicode problem again
Date	June 26, 2008 11:41:26
Msg-id	20080626144107.GA75907@winnie.fuhr.org
In response to	Re: Unicode problem again ("Albe Laurenz" <laurenz.albe@wien.gv.at>)
Responses	Re: Unicode problem again
List	pgsql-general

On Thu, Jun 26, 2008 at 03:31:01PM +0200, Albe Laurenz wrote:
> Michael Fuhr wrote:
> > Your input data seems to have a mix of encodings: sometimes you're
> > getting pound signs in a non-UTF-8 encoding, but if characters like
> > <U+2019 RIGHT SINGLE QUOTATION MARK> got into the database when
> > client_encoding was set to UTF8 then at least some data must have
> > been in UTF-8.
>
> Sorry, but that's not true.
> That character is 0x9s in WINDOWS-1252.

I think you mean 0x92.

> So it could have been that client_encoding was (correctly) set to WIN1252
> and the quotation mark was entered as a single byte character.

Yes, *if* client_encoding was set to win1252.  However, in the
following thread Garry said that he was getting encoding errors
when entering the pound sign that were resolved by changing
client_encoding (I suggested latin1, latin9, or win1252; he doesn't
say which he used):

http://archives.postgresql.org/pgsql-general/2008-06/msg00526.php

If client_encoding had been set to win1252 then Garry wouldn't have
gotten encoding errors when entering the pound sign because that
character is 0xa3 in win1252 (also in latin1 and latin9). So either
applications are setting client_encoding to different values,
sometimes correctly and sometimes incorrectly (Garry, do you know
if that could be happening?), or the data is sometimes in different
encodings.  If the data is being entered via a web application then
the latter seems more likely, at least in my experience (I've had
to deal with exactly this problem recently).
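
The byte values mentioned above can be checked with a short Python sketch. It assumes that Python's codecs cp1252, latin_1, and iso8859_15 correspond to what PostgreSQL calls WIN1252, LATIN1, and LATIN9. The helper is_valid_utf8 is a hypothetical name and only approximates the check the server applies to input when client_encoding is UTF8.

```python
# Characters discussed in this thread.
POUND = "\u00a3"   # POUND SIGN
RSQUO = "\u2019"   # RIGHT SINGLE QUOTATION MARK

# Python codec names assumed to match PostgreSQL's WIN1252, LATIN1, LATIN9.
SINGLE_BYTE_CODECS = ("cp1252", "latin_1", "iso8859_15")


def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The pound sign is the single byte 0xa3 in all three single-byte encodings.
pound_bytes = {codec: POUND.encode(codec) for codec in SINGLE_BYTE_CODECS}

# The right single quotation mark is the single byte 0x92 in WIN1252.
rsquo_win1252 = RSQUO.encode("cp1252")

# In UTF-8 both characters are multi-byte sequences.
pound_utf8 = POUND.encode("utf-8")   # 0xc2 0xa3
rsquo_utf8 = RSQUO.encode("utf-8")   # 0xe2 0x80 0x99

for codec, raw in pound_bytes.items():
    print(codec, raw.hex())
print("cp1252 U+2019:", rsquo_win1252.hex())
print("utf-8 pound:", pound_utf8.hex(), "utf-8 U+2019:", rsquo_utf8.hex())

# A lone 0xa3 or 0x92 byte is not valid UTF-8, so single-byte data sent
# while the client claims UTF-8 would be expected to fail validation.
print(is_valid_utf8(b"\xa3"), is_valid_utf8(b"\x92"), is_valid_utf8(pound_utf8))
```

If the single-byte forms fail UTF-8 validation while the multi-byte forms pass, that is consistent with the two explanations above: either client_encoding varies between sessions or the incoming data arrives in more than one encoding.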

--
Michael Fuhr
