Re: Best practice for: ERROR: invalid byte sequence for encoding "UTF8" - Mailing list pgsql-general

From Scott Marlowe
Subject Re: Best practice for: ERROR: invalid byte sequence for encoding "UTF8"
Date
Msg-id dcc563d10708151031j6b128165ic72256e99cf0c916@mail.gmail.com
In response to Re: Best practice for: ERROR: invalid byte sequence for encoding "UTF8"  ("Phoenix Kiula" <phoenix.kiula@gmail.com>)
Responses Re: Best practice for: ERROR: invalid byte sequence for encoding "UTF8"  ("Phoenix Kiula" <phoenix.kiula@gmail.com>)
List pgsql-general
On 8/15/07, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> On 15/08/07, Ivan Zolotukhin <ivan.zolotukhin@gmail.com> wrote:
> > Hello,
> >
> > Actually, I tried something like $str = @iconv("UTF-8",
> > "UTF-8//IGNORE", $str); when preparing the string for the SQL query,
> > and it worked. There's probably a better way in PHP to achieve this:
> > simply change the default values in php.ini for these parameters:
> >
> > mbstring.encoding_translation = On
> > mbstring.substitute_character = none
> >
> > and broken symbols will be automatically stripped from the input
> > and output.
>
>
> Sadly, they don't always do that, not with Asian scripts.
>
> And I do not completely agree with the concept of GIGO that the other
> poster suggested. Sometimes you want the end-user's experience to
> be seamless. For example, in one of our web sites, we allow users to
> submit text through a bookmarklet, where the title of the webpage
> comes in rawurlencoded format. We try to rawurldecode() it on our end,
> but most of the time the Asian interpretation is wrong. We have all
> the usual mbstring settings in php.ini. In this scenario, the user did
> not enter any garbage. Our application should have the ability to
> recognize the text. We do what we can with mb_convert...etc, but the
> database just throws an error.
>
> PGSQL really needs to get with the program when it comes to utf-8 input.

What, exactly, does that mean?

That PostgreSQL should take things in invalid utf-8 format and just store them?
Or that PostgreSQL should autoconvert from invalid utf-8 to valid
utf-8, guessing the proper codes?

Seriously, what do you want pgsql to do with these invalid inputs?
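
To make the choices above concrete, here is a minimal client-side sketch of the three options: reject invalid input, strip the bad bytes, or guess the real encoding. It is written in Python rather than PHP purely for illustration; the function names, the candidate encoding list, and the sample title are all assumptions and do not come from anyone's application in this thread.

```python
from urllib.parse import unquote_to_bytes


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8 (the check the server applies)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def strip_invalid_utf8(raw: bytes) -> str:
    """Drop undecodable bytes, similar in spirit to iconv's //IGNORE."""
    return raw.decode("utf-8", errors="ignore")


def decode_with_candidates(raw: bytes,
                           candidates=("utf-8", "shift_jis", "euc_jp")) -> str:
    """Try each candidate encoding in order (an assumed list); raise if none fits."""
    for enc in candidates:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("input does not match any candidate encoding")


# Hypothetical sample: a page title percent-encoded by a bookmarklet running
# on a Shift_JIS page (these four bytes are the two characters of "日本").
raw_title = unquote_to_bytes("%93%FA%96%7B")

print(is_valid_utf8(raw_title))           # strict check, as the database does
print(strip_invalid_utf8(raw_title))      # what survives after stripping
print(decode_with_candidates(raw_title))  # result of guessing the encoding
```

On this sample, stripping leaves almost nothing of the title, and guessing only succeeds because the right encoding happens to be in the list and is tried first. With a different list order, or with bytes that are valid in more than one legacy encoding, the guess can silently produce the wrong text, which is why the conversion belongs in the application that knows where the bytes came from.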
