Thanks, all, for the information. The summary is:
- 8.0 wasn't very strict, and allowed the illegal values in, instead
of mapping them over into UTF-8 space
- the values can be stripped with iconv -c
- 8.2 should be more strict
I'm in the midst of my upgrade to 8.2 now; hopefully the LATIN1->UTF8
conversion will map the odd characters cleanly into UTF-8 space.
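
For anyone finding this thread later, here is a rough Python sketch of the
three behaviours summarized above. It is not what PostgreSQL or iconv do
internally, only an illustration. The sample bytes and the helper names
(is_valid_utf8, strip_invalid_utf8, latin1_to_utf8) are made up for the
example, and errors="ignore" is only an approximation of iconv -c.

```python
# Hypothetical sample: "café" as LATIN1, where é is the single byte 0xE9.
latin1_bytes = b"caf\xe9"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop byte sequences that are not valid UTF-8 (similar in spirit to iconv -c)."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret the bytes as LATIN1 and re-encode them as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    print(is_valid_utf8(latin1_bytes))       # a lone 0xE9 is not valid UTF-8
    print(strip_invalid_utf8(latin1_bytes))  # the offending byte is dropped
    print(latin1_to_utf8(latin1_bytes))      # é becomes the two bytes 0xC3 0xA9
```

Stripping loses the character entirely, while the LATIN1->UTF8 conversion
keeps it, which is why I'm hoping the conversion path works after the upgrade.
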
On 17-May-07, at 3:25 PM, Michael Glaesemann wrote:
>
> On May 17, 2007, at 16:47 , PFC wrote:
>
>>> and put that in the form. Instead of being mapped to 2-byte UTF8
>>> high-bit equivalents, they are going into the database directly
>>> as one-byte values > 127. That is, as illegal UTF8 values.
>>
>> Sometimes you also get HTML entities in the mix. Who knows.
>> All my web forms are UTF-8 end to end, and it just works. Was I
>> lucky?
>> Normally postgres rejects illegal UTF8 values, so you wouldn't be
>> able to insert them...
>
> 8.0 and earlier weren't quite as strict as they should have been. See
> the note at the end of the migration instructions in the release
> notes for 8.1[1]. That may have been part of the issue here.
>
> Michael Glaesemann
> grzm seespotcode net
>
> [1](http://www.postgresql.org/docs/8.2/interactive/release-8-1.html#AEN80196)