Thread: Mule internal code ?
Hi,

As said in another mail, I have tried to add iso-8859-15 (Latin 9) &
iso-8859-16 (Latin 10) to PostgreSQL, I think I have done mostly all
that's necessary. But I miss two things :

- latin92mic/mic2latin9/latin102mic/mic2latin10 in conv.c
- the leading character value in pg_wchar.h

I don't know anything about MULE except that it's some Emacs standard
(why they didn't go for Unicode is beyond my understanding, is
off-topic on this list, and had probably some good argument at the
time).

Can someone point me to where I should look for that ? is it as easy
as iso-8859-2/3/4 support, or do I need to do something as iso-8859-5 ?

Thank you :)

Patrice.

--
Patrice HÉDÉ ------------------------------- patrice à islande org -----
 -- Isn't it weird how scientists can imagine all the matter of the
universe exploding out of a dot smaller than the head of a pin, but they
can't come up with a more evocative name for it than "The Big Bang" ?
 -- What would _you_ call the creation of the universe?
 -- "The HORRENDOUS SPACE KABLOOIE !" - Calvin and Hobbes
------------------------------------------ http://www.islande.org/ -----
> As said in another mail, I have tried to add iso-8859-15 (Latin 9) &
> iso-8859-16 (Latin 10) to PostgreSQL, I think I have done mostly all
> that's necessary. But I miss two things :

ISO-8859-15 and 16! I don't know anything beyond ISO-8859-10. Can you
give me any pointer (URL) explaining what they are?

> - latin92mic/mic2latin9/latin102mic/mic2latin10 in conv.c
> - the leading character value in pg_wchar.h
>
> I don't know anything about MULE except that it's some Emacs standard
> (why they didn't go for Unicode is beyond my understanding, is
> off-topic on this list, and had probably some good argument at the
> time).

Probably this is because Unicode is not perfect at all. For example,
the concept "encode everything in two-bytes" is obviously broken
down, some charsets (for example JIS X 0213) are not supported at all,
etc. etc...

> Can someone point me to where I should look for that ? is it as easy
> as iso-8859-2/3/4 support, or do I need to do something as iso-8859-5 ?

Docs for MULE internal code come with XEmacs. For example, see:

ftp://ftp.xemacs.org/pub/xemacs/docs/letter/internals-letter.pdf.gz

http://www.lns.cornell.edu/public/COMP/info/xemacs/internals/internals_15.html#SEC83

etc.
--
Tatsuo Ishii
* Tatsuo Ishii <t-ishii@sra.co.jp> [011010 18:20]:
> > As said in another mail, I have tried to add iso-8859-15 (Latin 9) &
> > iso-8859-16 (Latin 10) to PostgreSQL, I think I have done mostly all
> > that's necessary. But I miss two things :
>
> ISO-8859-15 and 16! I don't know anything beyond ISO-8859-10. Can you
> give me any pointer (URL) explaining what they are?

http://www.evertype.com/sc2wg3.html

It links to files describing iso-8859-14 to 16.

14 is Gaelic support, which I've never seen used (of course, I don't
speak Irish, so that's probably why :) ), and it has nothing to do with
the euro.

15 is a "modernised" version of iso-8859-1. It removes some
not-so-widely used characters (currency place-holder, fraction
characters), to replace them with the euro sign, the French oe, OE, and
Y diaeresis, and the Finnish/Estonian s/S caron and z/Z caron. That's
the official 8-bit charset for western Europe now (btw, the other name
is latin9, or latin0, as it's supposed to replace iso8859-1, which is
now what should be called a legacy encoding).

16 is quite new. It's supposed to do the same as iso-8859-15, but for
central European countries. It has support for the euro sign and the
Romanian language (t comma below, s comma below), but I've read
somewhere that it has lost support for two or three other central
European countries... go figure...

> > - latin92mic/mic2latin9/latin102mic/mic2latin10 in conv.c
> > - the leading character value in pg_wchar.h
> >
> > I don't know anything about MULE except that it's some Emacs standard
> > (why they didn't go for Unicode is beyond my understanding, is
> > off-topic on this list, and had probably some good argument at the
> > time).
>
> Probably this is because Unicode is not perfect at all. For example,
> the concept "encode everything in two-bytes" is obviously broken
> down, some charsets (for example JIS X 0213) are not supported at all,
> etc. etc...
Well, for the history iso-10646 was 32 bits from the beginning, and
Unicode didn't say that it was only 16 bits, though, to be fair, the
Unicode consortium said it didn't believe it would need more than 16
bits.

BTW, now, there is a statement that they wouldn't go above 0x10ffff,
which gives a bit more than 1 million characters... I think it should
be enough this time (but who knows !?).

Regarding the *main* issue with Unicode, which is support of Japanese
kanji vs Chinese (in the CJK unification), I must admit I don't know
the details, but arguments of both sides seem to be valid. I must
admit I would say "add the Japanese version of the characters", since
it's not lack of space which is the problem now. But things like this
will get solved with time, and it really seems like Unicode will
achieve the so much needed charset unity it's been made for :)

> > Can someone point me to where I should look for that ? is it as
> > easy as iso-8859-2/3/4 support, or do I need to do something as
> > iso-8859-5 ?
>
> Docs for MULE internal code come with XEmacs. For example, see:
>
> ftp://ftp.xemacs.org/pub/xemacs/docs/letter/internals-letter.pdf.gz
>
> http://www.lns.cornell.edu/public/COMP/info/xemacs/internals/internals_15.html#SEC83

Unfortunately, these explain the principles behind mule, not the way
to encode them from/to another character set :/

Patrice

--
Patrice Hédé
email: patrice hede à islande org
www : http://www.islande.org/
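
The iso-8859-15 changes described in this message (euro sign, oe/OE, Y
diaeresis, s/S caron, z/Z caron replacing the currency place-holder, the
fractions and a few other symbols) can be sketched as a small mapping
from a Latin 9 byte to a Unicode code point. This is only an
illustration: the function name latin9_to_ucs is hypothetical and is not
taken from PostgreSQL's conv.c, and the sketch assumes that every other
byte keeps its iso-8859-1 meaning.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from PostgreSQL): map one iso-8859-15 byte to
 * its Unicode code point. Only eight positions differ from iso-8859-1;
 * every other byte value equals its own code point.
 */
uint32_t latin9_to_ucs(unsigned char c)
{
    switch (c) {
    case 0xA4: return 0x20AC; /* euro sign (was the currency place-holder) */
    case 0xA6: return 0x0160; /* S caron   (was broken bar) */
    case 0xA8: return 0x0161; /* s caron   (was diaeresis) */
    case 0xB4: return 0x017D; /* Z caron   (was acute accent) */
    case 0xB8: return 0x017E; /* z caron   (was cedilla) */
    case 0xBC: return 0x0152; /* OE        (was one quarter) */
    case 0xBD: return 0x0153; /* oe        (was one half) */
    case 0xBE: return 0x0178; /* Y diaeresis (was three quarters) */
    default:   return c;      /* same as iso-8859-1 */
    }
}
```

A caller would pass each input byte through the function, so that for
instance byte 0xA4 comes back as the euro sign while an accented letter
such as 0xE9 is returned unchanged.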
> > ISO-8859-15 and 16! I don't know anything beyond ISO-8859-10. Can you
> > give me any pointer (URL) explaining what they are?
>
> http://www.evertype.com/sc2wg3.html
>
> It links to files describing iso-8859-14 to 16.

[snip]

Thanks for the info.

> Well, for the history iso-10646 was 32 bits from the beginning, and
> Unicode didn't say that it was only 16 bits, though, to be fair, the
> Unicode consortium said it didn't believe it would need more than 16
> bits.
>
> BTW, now, there is a statement that they wouldn't go above 0x10ffff,
> which gives a bit more than 1 million characters... I think it should
> be enough this time (but who knows !?).
>
> Regarding the *main* issue with Unicode, which is support of japanese
> kanji vs chinese (in the CJK unification), I must admit I don't know
> the details, but arguments of both sides seem to be valid. I must
> admit I would say "add the japanese version of the characters", since
> it's not lack of space which is the problem now. But things like this
> will get solved with time, and it really seems like Unicode will
> achieve the so much needed charset unity it's been made for :)

IMHO we should not rely on particular encodings/charsets, including
Unicode (or ISO 10646), MULE internal code or whatever. My plan for
supporting CREATE CHARACTER SET etc. would be truly *neutral* to any
encodings/charsets.

> > > Can someone point me to where I should look for that ? is it as
> > > easy as iso-8859-2/3/4 support, or do I need to do something as
> > > iso-8859-5 ?
> >
> > Docs for MULE internal code come with XEmacs. For example, see:
> >
> > ftp://ftp.xemacs.org/pub/xemacs/docs/letter/internals-letter.pdf.gz
> >
> > http://www.lns.cornell.edu/public/COMP/info/xemacs/internals/internals_15.html#SEC83
>
> Unfortunately, these explain the principles behind mule, not the way
> to encode them from/to another character set :/

Please take a look at "15.3.1 Internal String Encoding."
--
Tatsuo Ishii
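
As a rough illustration of the latinN2mic/mic2latinN pair asked about
in this thread, here is a minimal sketch. It assumes the simple one-byte
scheme, in which each high-bit byte of the 8-bit charset is preceded by
a single leading character in the internal string, and ASCII bytes pass
through unchanged. The names latin_to_mic, mic_to_latin and
LC_EXAMPLE_LATIN1 are hypothetical and are not the functions in conv.c;
0x81 is assumed here as the Latin-1 leading character, and the correct
values for Latin 9 and Latin 10 are exactly what the thread leaves open,
so none is filled in.

```c
#include <stddef.h>

/* Assumed leading character for iso-8859-1; illustrative name only. */
#define LC_EXAMPLE_LATIN1 0x81

/*
 * 8-bit charset -> internal code. Every byte with the high bit set is
 * prefixed with the leading character lc. dst must hold 2 * len bytes.
 * Returns the number of bytes written.
 */
size_t latin_to_mic(const unsigned char *src, size_t len,
                    unsigned char lc, unsigned char *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] & 0x80)
            dst[n++] = lc;      /* mark which charset the next byte is from */
        dst[n++] = src[i];
    }
    return n;
}

/*
 * Internal code -> 8-bit charset. Accepts ASCII and characters whose
 * leading character equals lc. dst must hold len bytes. Returns the
 * number of bytes written, or (size_t)-1 if the input contains a
 * character from another charset or is truncated.
 */
size_t mic_to_latin(const unsigned char *src, size_t len,
                    unsigned char lc, unsigned char *dst)
{
    size_t n = 0, i = 0;
    while (i < len) {
        if (src[i] & 0x80) {
            if (src[i] != lc || i + 1 >= len)
                return (size_t)-1;  /* not representable in this charset */
            dst[n++] = src[i + 1];
            i += 2;
        } else {
            dst[n++] = src[i++];    /* plain ASCII */
        }
    }
    return n;
}
```

Under these assumptions, supporting another charset of the same kind
would only mean passing a different leading character. Whether
iso-8859-15 and iso-8859-16 fit that simple case is not settled in the
thread.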