On 8/15/07, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> On 15/08/07, Ivan Zolotukhin <ivan.zolotukhin@gmail.com> wrote:
> > Hello,
> >
> > Actually, I tried something like $str = @iconv("UTF-8", "UTF-8//IGNORE",
> > $str); when preparing the string for an SQL query, and it worked. There's
> > probably a better way in PHP to achieve this: simply change the default
> > values of these parameters in php.ini:
> >
> > mbstring.encoding_translation = On
> > mbstring.substitute_character = none
> >
> > and invalid characters will be stripped automatically from the input
> > and output.
>
>
> Sadly, they don't always do that, not with Asian scripts.
>
> And I do not completely agree with the concept of GIGO (garbage in,
> garbage out) that the other poster suggested. Sometimes you want the
> end user's experience to be seamless. For example, on one of our web
> sites, we allow users to submit text through a bookmarklet, where the
> title of the web page comes in rawurlencoded format. We try to
> rawurldecode() it on our end, but most of the time the Asian characters
> are misinterpreted. We have all the usual mbstring settings in php.ini.
> In this scenario, the user did not enter any garbage. Our application
> should be able to recognize the text. We do what we can with
> mb_convert... etc., but the database just throws an error.
>
> PGSQL really needs to get with the program when it comes to UTF-8 input.
What, exactly, does that mean?
That PostgreSQL should accept input in invalid UTF-8 and just store it?
Or that PostgreSQL should automatically convert invalid UTF-8 into valid
UTF-8, guessing the intended characters?
Seriously, what do you want pgsql to do with these invalid inputs?
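One answer on the application side is to make Ivan's iconv() trick explicit, so it doesn't depend on how the underlying iconv library treats //IGNORE. Below is a minimal PHP sketch that assumes only the bundled PCRE functions; is_valid_utf8() and strip_invalid_utf8() are hypothetical helper names, not built-in PHP functions. The first lets you reject bad input outright, the second drops every byte that is not part of a well-formed UTF-8 sequence (as defined in RFC 3629) before the string is sent to PostgreSQL.

```php
<?php
// Sketch only: is_valid_utf8() and strip_invalid_utf8() are hypothetical
// helpers, not part of PHP or of any PostgreSQL client library.

// preg_match() with the /u modifier fails (returns false) on malformed
// UTF-8, so an empty pattern doubles as a UTF-8 validity check.
function is_valid_utf8(string $str): bool
{
    return preg_match('//u', $str) === 1;
}

// Works byte by byte (no /u): each well-formed UTF-8 sequence is captured
// in group 1 and kept; any other single byte falls through to "." and is
// replaced by the empty group, i.e. removed. Overlong forms, UTF-16
// surrogates and code points above U+10FFFF are treated as invalid.
function strip_invalid_utf8(string $str): string
{
    $pattern = '/(
          [\x00-\x7F]
        | [\xC2-\xDF][\x80-\xBF]
        | \xE0[\xA0-\xBF][\x80-\xBF]
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
        | \xED[\x80-\x9F][\x80-\xBF]
        | \xF0[\x90-\xBF][\x80-\xBF]{2}
        | [\xF1-\xF3][\x80-\xBF]{3}
        | \xF4[\x80-\x8F][\x80-\xBF]{2}
    )|./sx';

    $clean = preg_replace($pattern, '$1', $str);
    if ($clean === null) {
        // e.g. pcre.backtrack_limit hit on a very large string
        throw new RuntimeException('UTF-8 cleanup failed');
    }
    return $clean;
}
```

Neither helper guesses anything: if a page title actually arrives in another encoding (Shift_JIS, GBK, ...), it has to be converted to UTF-8 first, for example with mb_convert_encoding() and the correct source encoding, or the Asian text will simply be stripped away. Rejecting with is_valid_utf8() is often the safer choice, since stripping loses data silently.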