On Wed, Aug 10, 2005 at 11:17:40AM -0400, Alvaro Herrera wrote:
> On Wed, Aug 10, 2005 at 10:04:23AM +0200, Martijn van Oosterhout wrote:
>
> > Comments welcome. I can write more, if people can suggest things to
> > write about. I was thinking something about collation and locales but
> > I'm not sure I understand them myself.
>
> I'd really love to see a Q&A for encodings, recoding, and "I see strange
> characters." Not sure how to phrase the question though.

I think you could write a whole section just on those, covering all the
issues on the various platforms. But since I've never dealt with a
system using multiple languages or encodings, I'm not sure I really
understand the issues. Something like:

Encoding / character set gotchas and recommendations:
  Languages:
    Asian
    European
  Programming:
    Perl
    Python
    Java
    ODBC
    Regular expressions
    Full text indexing
    etc.
  Platforms:
    Windows
    UNIX
    etc.

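One classic gotcha behind "I see strange characters" is text stored as
UTF-8 being read by something that assumes Latin-1 (for example, a
client whose encoding setting doesn't match the bytes it actually
gets). A minimal sketch of that mix-up and how to undo it, assuming
the mismatch is exactly UTF-8 vs. Latin-1; the function names are
hypothetical, made up for illustration:

```python
def utf8_read_as_latin1(text: str) -> str:
    """Simulate the mix-up: UTF-8 bytes interpreted as Latin-1 (ISO 8859-1)."""
    return text.encode("utf-8").decode("latin-1")


def undo_latin1_misread(garbled: str) -> str:
    """Reverse the mix-up: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "café"
garbled = utf8_read_as_latin1(original)
print(garbled)                       # cafÃ©
print(undo_latin1_misread(garbled))  # café
```

The repair only works because Latin-1 maps every byte value to a
character, so no information is lost; other mismatches may not be
reversible.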
The main thing I wonder about is whether UTF-8 can handle every
character anybody would want to use. I've been told it can't for Asian
languages, in which case I don't see how this problem is solvable
anyway.
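
A quick check, not a full answer: UTF-8 can encode every Unicode code
point (up to U+10FFFF) in 1 to 4 bytes, including CJK characters, so
objections for Asian languages are more likely about Unicode's
character repertoire (e.g. Han unification) than about UTF-8 itself.
The sketch below just shows CJK text round-tripping through UTF-8; the
helper name and sample strings are my own, chosen for illustration:

```python
def utf8_byte_counts(text: str) -> list[int]:
    """Return how many bytes UTF-8 uses for each code point in text."""
    return [len(ch.encode("utf-8")) for ch in text]


samples = {
    "English": "abc",
    "French": "é",
    "Japanese": "日本語",
    "Korean": "한국어",
    "CJK Extension B (outside the BMP)": "\U00020000",
}

for label, text in samples.items():
    encoded = text.encode("utf-8")
    # Every sample must survive an encode/decode round trip unchanged.
    assert encoded.decode("utf-8") == text
    print(f"{label}: {utf8_byte_counts(text)} bytes per character")
```

Most CJK characters take 3 bytes and the rarer supplementary ones take
4, which matters for storage size but not for whether they fit.
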
I've collected quite a few comments from other people, so I'll post a
slightly revised patch later.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.