Re: Mule internal code ? - Mailing list pgsql-hackers
From | Patrice Hédé |
---|---|
Subject | Re: Mule internal code ? |
Date | |
Msg-id | 20011010194614.K14587@idf.net |
In response to | Re: Mule internal code ? (Tatsuo Ishii <t-ishii@sra.co.jp>) |
Responses | Re: Mule internal code ? |
List | pgsql-hackers |
* Tatsuo Ishii <t-ishii@sra.co.jp> [011010 18:20]:
> > As I said in another mail, I have tried to add ISO-8859-15 (Latin 9) and
> > ISO-8859-16 (Latin 10) to PostgreSQL, and I think I have done almost
> > everything that's necessary. But two things are missing:
>
> ISO-8859-15 and 16! I don't know anything beyond ISO-8859-10. Can you
> give me any pointer (URL) explaining what they are?

http://www.evertype.com/sc2wg3.html

It links to files describing ISO-8859-14 to 16.

14 adds Gaelic support. I've never seen it used (of course, I don't speak Irish, so that's probably why :) ), and it has nothing to do with the euro.

15 is a "modernised" version of ISO-8859-1. It replaces eight rarely used characters (the generic currency sign, the three fraction characters, the broken bar, and the spacing diaeresis, acute accent, and cedilla) with the euro sign, the French oe/OE ligatures and Y diaeresis, and the Finnish/Estonian s/S caron and z/Z caron. It is now the official 8-bit charset for Western Europe (its other names are Latin 9 and Latin 0, as it is meant to replace ISO-8859-1, which should now be considered a legacy encoding).

16 is quite new. It is meant to do for Central European countries what ISO-8859-15 does for Western Europe. It supports the euro sign and Romanian (t comma below, s comma below), but I've read somewhere that it has lost support for the languages of two or three other Central European countries... go figure...

> > - latin92mic/mic2latin9/latin102mic/mic2latin10 in conv.c
> > - the leading character value in pg_wchar.h
> >
> > I don't know anything about MULE except that it's some Emacs standard
> > (why they didn't go for Unicode is beyond my understanding, is
> > off-topic on this list, and there was probably some good argument for
> > it at the time).
>
> Probably this is because Unicode is not perfect at all. For example,
> the concept of "encoding everything in two bytes" has obviously broken
> down, some charsets (for example JIS X 0213) are not supported at all,
> etc. etc...
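The ISO-8859-15 changes described above touch only eight byte positions, so Latin 9 can be expressed as "ISO-8859-1 plus eight exceptions". A minimal sketch in C, mapping a Latin-9 byte to its Unicode code point; the function name `latin9_to_ucs` is hypothetical and not part of PostgreSQL. It relies on ISO-8859-1 bytes mapping one-to-one onto U+0000..U+00FF.

```c
#include <stdint.h>

/* Hypothetical helper: map one ISO-8859-15 (Latin 9) byte to its Unicode
 * code point. Latin 9 equals ISO-8859-1 except at the eight positions
 * below; every other byte maps to the code point with the same value. */
uint32_t latin9_to_ucs(unsigned char b)
{
    switch (b) {
    case 0xA4: return 0x20AC; /* EURO SIGN            (Latin 1: CURRENCY SIGN) */
    case 0xA6: return 0x0160; /* S WITH CARON         (Latin 1: BROKEN BAR)    */
    case 0xA8: return 0x0161; /* s with caron         (Latin 1: DIAERESIS)     */
    case 0xB4: return 0x017D; /* Z WITH CARON         (Latin 1: ACUTE ACCENT)  */
    case 0xB8: return 0x017E; /* z with caron         (Latin 1: CEDILLA)       */
    case 0xBC: return 0x0152; /* LIGATURE OE          (Latin 1: 1/4)           */
    case 0xBD: return 0x0153; /* ligature oe          (Latin 1: 1/2)           */
    case 0xBE: return 0x0178; /* Y WITH DIAERESIS     (Latin 1: 3/4)           */
    default:   return b;      /* identical to ISO-8859-1 */
    }
}
```

For example, `latin9_to_ucs(0xA4)` yields `0x20AC` (the euro sign), whereas the same byte in ISO-8859-1 is U+00A4.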
Well, historically ISO-10646 was 32 bits from the beginning, and Unicode never said it was limited to 16 bits, though, to be fair, the Unicode Consortium said it didn't believe it would need more than 16 bits. BTW, there is now a statement that they won't go above 0x10FFFF, which gives a bit more than 1 million characters (1,114,112 code points)... I think it should be enough this time (but who knows!?).

Regarding the *main* issue with Unicode, which is the handling of Japanese kanji versus Chinese characters in CJK unification, I don't know the details, but the arguments on both sides seem valid. Personally I would say "add the Japanese versions of the characters", since lack of space is no longer the problem. But things like this will get solved with time, and it really seems that Unicode will achieve the much-needed charset unity it was designed for :)

> > Can someone point me to where I should look for that? Is it as
> > easy as ISO-8859-2/3/4 support, or do I need to do something like
> > ISO-8859-5?
>
> Docs for MULE internal code come with XEmacs. For example, see:
>
> ftp://ftp.xemacs.org/pub/xemacs/docs/letter/internals-letter.pdf.gz
>
> http://www.lns.cornell.edu/public/COMP/info/xemacs/internals/internals_15.html#SEC83

Unfortunately, these explain the principles behind MULE, not how to convert from/to another character set :/

Patrice

--
Patrice Hédé
email: patrice hede at islande org
www : http://www.islande.org/
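On the question of converting to and from MULE internal code: as far as we understand PostgreSQL's existing single-byte Latin conversions in conv.c, ASCII bytes pass through unchanged and each byte with the high bit set is written as two bytes, a one-byte charset identifier (the "leading character" defined in pg_wchar.h) followed by the original byte. Under that assumption, a Latin-9 pair of converters might be sketched as follows. The function names and the value of `LC_LATIN9_PLACEHOLDER` are hypothetical; the real leading byte would have to agree with the charset IDs in pg_wchar.h and Mule.

```c
#include <stddef.h>

/* HYPOTHETICAL leading byte for ISO-8859-15 in MULE internal code.
 * 0x8e is only a placeholder; the real value must match pg_wchar.h. */
#define LC_LATIN9_PLACEHOLDER 0x8e

/* Latin 9 -> MULE internal code (sketch).
 * ASCII is copied as-is; each high-bit byte becomes LC + byte.
 * dst must have room for 2 * len bytes. Returns bytes written. */
size_t latin9_to_mic(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c & 0x80)
            dst[out++] = LC_LATIN9_PLACEHOLDER; /* charset identifier first */
        dst[out++] = c;                         /* then the original byte */
    }
    return out;
}

/* MULE internal code -> Latin 9 (sketch).
 * dst must have room for len bytes. Returns bytes written, or
 * (size_t) -1 if the input holds a character from another charset
 * or a truncated/malformed LC sequence. */
size_t mic_to_latin9(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (!(c & 0x80)) {
            dst[out++] = c;                 /* plain ASCII */
        } else if (c == LC_LATIN9_PLACEHOLDER && i + 1 < len
                   && (src[i + 1] & 0x80)) {
            dst[out++] = src[++i];          /* drop the LC, keep the byte */
        } else {
            return (size_t) -1;             /* not representable in Latin 9 */
        }
    }
    return out;
}
```

The reverse direction rejects any other leading byte because those characters have no ISO-8859-15 equivalent; a real converter would report that through PostgreSQL's own error handling rather than a sentinel return value.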