Re: character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2" - Mailing list pgsql-general

From Andreas Kalsch
Subject Re: character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"
Date
Msg-id 4A783164.9040804@gmx.de
In response to Re: character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"  (Alban Hertroys <dalroi@solfertje.student.utwente.nl>)
Responses Re: character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"  (Alban Hertroys <dalroi@solfertje.student.utwente.nl>)
List pgsql-general
Alban,

Here is what I do to keep the data chain simple:

HTTP encoding > PHP string encoding > client connection > server: all
of it is UTF8, plus a check for invalid bytes in PHP (or on the server).
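The invalid byte check could look roughly like this on the client side. This is only a sketch in Python rather than PHP, and the function name `is_valid_utf8` is made up for illustration:

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        # Strict decoding raises on truncated or otherwise malformed sequences.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_utf8(b"\xe2\x99\x86"))  # the complete 3-byte sequence from the subject line
    print(is_valid_utf8(b"\xe2\x99"))      # the same sequence with its last byte cut off
```

Input that fails the check would be rejected before it ever reaches the client connection.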

What I tested inside Postgres was passing a 3-byte UTF8 character
to this function, and I got an error. This is a character I will
not filter out if some Unicode artist enters it. It is an
international website, and the simplification is just for indexing.
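To show why the character in the subject line fails, here is a small Python sketch. It assumes Python's `iso8859_2` codec is a fair stand-in for what the server calls LATIN2, and `fits_latin2` is a hypothetical helper name:

```python
import unicodedata

# The three bytes from the error message, 0xe29986.
raw = b"\xe2\x99\x86"
char = raw.decode("utf-8")  # valid UTF-8, decodes to a single code point


def fits_latin2(text: str) -> bool:
    """Return True if every character of text has an ISO 8859-2 equivalent."""
    try:
        text.encode("iso8859_2")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(hex(ord(char)), unicodedata.name(char))
    print(fits_latin2(char))
    print(fits_latin2("\u015b"))  # s with acute, which Latin-2 does contain
```

The bytes are perfectly valid UTF8; the target character set simply has no slot for that code point, so any conversion to it has to fail or drop the character.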

But I think this will not solve the problem, and I will have to use
Python or Perl to get it done.
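In Python the simplification might be sketched as below. This is one possible approach (Unicode decomposition plus stripping combining marks), not something tested against the real data, and `simplify_for_index` is a made-up name:

```python
import unicodedata


def simplify_for_index(text: str) -> str:
    """Strip combining marks after decomposition; keep everything else as-is."""
    # NFKD splits e.g. an accented letter into base letter + combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks; characters without a decomposition pass through.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(simplify_for_index("Krak\u00f3w"))
    print(simplify_for_index("\u2646"))
```

Characters that have no decomposition, such as the symbol from the error message or the Polish L with stroke, are left untouched instead of raising an error, so nothing the user entered is lost from the stored text.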


Alban Hertroys schrieb:
> On 4 Aug 2009, at 24:57, Andreas Kalsch wrote:
>
>>> I think the real problem is: Where do you lose the original encoding
>>> the users input their data with? If you specify that encoding on the
>>> connection and send it to a database that can handle UTF-8 then you
>>> shouldn't be getting any conversion problems in the first place.
>> Nowhere - I will validate input data on the client side (PHP or
>> Python) and send it to the server. Of course the only encoding I will
>> use on any side is UTF8. I just wanted to use this Latin thing for
>> simplification of characters.
>
> Yes you are. How could your users input invalid characters in the
> first place if that were not the case? You're not suggesting they
> managed to enter characters in an encoding for which they weren't
> valid on their own systems, are you?[1]
>
> You say your client is using PHP or Python, which suggests it's a
> website. That means the input goes like this: web browser -> website
> -> database. All three of those steps use some encoding and you can
> take them into account. That should prevent this problem altogether.
>
> You have control over which encoding your client and the database use,
> and the web browser tells what encoding it used in the POST request so
> you can pass that along to the database when storing data or convert
> it in your client.
>
> [1] There exists of course a small group of people who enjoy posting
> raw byte data to a web-form, but would it matter whether they'd get an
> error about their encoding or not? They do not intend to enter valid
> data after all ;)
>
> Alban Hertroys
>
> --
> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.

