Re: 8.0, UTF8, and CLIENT_ENCODING - Mailing list pgsql-general

From Paul Ramsey
Subject Re: 8.0, UTF8, and CLIENT_ENCODING
Msg-id D84BEF92-179D-4197-A686-FA80DA8B7961@refractions.net
In response to Re: 8.0, UTF8, and CLIENT_ENCODING  (Michael Glaesemann <grzm@seespotcode.net>)
List pgsql-general
Thanks all for the information. Summary is:

- 8.0 wasn't very strict, and allowed the illegal values in, instead
of mapping them over into UTF-8 space
- the values can be stripped with iconv -c
- 8.2 should be more strict
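For reference, `iconv -c -f UTF-8 -t UTF-8` copies its input and omits anything that is not valid UTF-8. Below is a rough Python sketch of the same idea. The helper name is hypothetical, and Python's "ignore" error handler may not discard exactly the same bytes as iconv in every edge case:

```python
def strip_invalid_utf8(raw: bytes) -> bytes:
    """Drop byte sequences that are not valid UTF-8, similar in spirit to `iconv -c`."""
    # errors="ignore" discards undecodable bytes instead of raising UnicodeDecodeError
    text = raw.decode("utf-8", errors="ignore")
    return text.encode("utf-8")


if __name__ == "__main__":
    # Latin-1 'é' (0xE9) and 'è' (0xE8) stored as single bytes are not valid UTF-8
    bad = b"caf\xe9 cr\xe8me"
    print(strip_invalid_utf8(bad))  # b'caf crme'
```

Stripping loses the accented characters entirely. If the stray bytes are known to be Latin-1, re-encoding them preserves the text instead of discarding it.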

I'm in the midst of my upgrade to 8.2 now; hopefully the LATIN1->UTF8
conversion will now map the odd characters cleanly into UTF-8 space.
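A correct LATIN1->UTF8 conversion turns each single high-bit byte (> 127) into a two-byte UTF-8 sequence. Here is a minimal Python sketch of that mapping. It assumes the input really is Latin-1, and the function name is hypothetical:

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8."""
    # Every byte 0x00-0xFF is a valid Latin-1 character, so this decode never fails
    return raw.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    original = b"caf\xe9"             # 'café' in Latin-1: 'é' is the single byte 0xE9
    converted = latin1_to_utf8(original)
    print(converted)                  # b'caf\xc3\xa9'
    print(len(original), len(converted))  # 4 5
```

ASCII bytes pass through unchanged. Only the high-bit byte grows, from 0xE9 to the pair 0xC3 0xA9.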

On 17-May-07, at 3:25 PM, Michael Glaesemann wrote:

>
> On May 17, 2007, at 16:47 , PFC wrote:
>
>>> and put that in the form. Instead of being mapped to 2-byte UTF8
>>> high-bit equivalents, they are going into the database directly
>>> as one-byte values > 127. That is, as illegal UTF8 values.
>>
>>     Sometimes you also get HTML entities in the mix. Who knows.
>>     All my web forms are UTF-8 back to back; it just works. Was I
>> lucky?
>>     Normally postgres rejects illegal UTF8 values, you wouldn't be
>> able to insert them...
>
> 8.0 and earlier weren't quite as strict as they should have been. See
> the note at the end of the migration instructions in the release
> notes for 8.1 [1]. That may have been part of the issue here.
>
> Michael Glaesemann
> grzm seespotcode net
>
> [1] http://www.postgresql.org/docs/8.2/interactive/release-8-1.html#AEN80196

