Hello,
Well, PostgreSQL's behaviour is entirely correct; otherwise I would
have posted this message to the -hackers list :) The question was about
how the application should process user input, not about changing how
the database reacts to a broken UTF-8 string. But I am 100% sure the
input should be fixed in this case, since otherwise the web site user
may see an ugly error (even if the application catches the SQL
exception, for instance).
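
A minimal sketch of fixing the input on the application side, assuming
a Python application that has the raw query bytes at hand. The names
sanitize_utf8 and is_valid_utf8 are hypothetical, chosen only for
illustration. Dropping the bad bytes is similar in spirit to
"iconv -c" from UTF-8 to UTF-8.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 string."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def sanitize_utf8(raw: bytes, drop: bool = True) -> str:
    """Decode bytes as UTF-8, dropping or replacing invalid sequences.

    drop=True  silently removes invalid bytes.
    drop=False substitutes U+FFFD so the damage stays visible.
    """
    text = raw.decode("utf-8", errors="ignore" if drop else "replace")
    # PostgreSQL text values cannot contain NUL characters, so strip them too.
    return text.replace("\x00", "")


if __name__ == "__main__":
    query = b"text search \xff\xfe query"
    print(is_valid_utf8(query))   # validity check before touching the database
    print(sanitize_utf8(query))   # cleaned string, safe to pass as a parameter
```

Whether to drop the bad bytes or to reject the request outright with a
friendly message is a design choice; the validity check alone is enough
for the second option.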
--
Regards,
Ivan
On 8/15/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Wed, Aug 15, 2007 at 03:41:30PM +0400, Ivan Zolotukhin wrote:
> > Hello,
> >
> > Imagine a web application that processes text search queries from
> > clients. If one types a text search query in a browser, it sends
> > proper UTF-8 characters, and the application, after all needed
> > processing (escaping, checks, etc.), passes it to the database. But
> > if one modifies the URL of the query, adding some trash non-UTF-8
> > characters, the database raises an error: invalid byte sequence for
> > encoding "UTF8".
> >
> > What is the best practice for processing such broken strings before
> > passing them to PostgreSQL? Iconv from utf-8 to utf-8, dropping bad
> > characters?
>
> Well, the query as given by the user is invalid, so returning an error
> message complaining about the invalid byte sequence seems entirely
> reasonable.
>
> I don't see any reason to try and be smart. There's no way you can
> "fix" the query.
>
> Have a nice day,
> --
> Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> > From each according to his ability. To each according to his ability to litigate.