Re: Mule internal code ? - Mailing list pgsql-hackers

From Patrice Hédé
Subject Re: Mule internal code ?
Date
Msg-id 20011010194614.K14587@idf.net
In response to Re: Mule internal code ?  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-hackers
* Tatsuo Ishii <t-ishii@sra.co.jp> [011010 18:20]:
> > As said in another mail, I have tried to add iso-8859-15 (Latin 9) &
> > iso-8859-16 (Latin 10) to PostgreSQL, I think I have done mostly all
> > that's necessary. But I miss two things :
> 
> ISO-8859-15 and 16! I don't know anything beyond ISO-8859-10. Can you
> give me any pointer (URL) explaining what they are?

http://www.evertype.com/sc2wg3.html

It links to files describing iso-8859-14 to 16.

14 (Latin 8) covers the Celtic languages such as Gaelic and Welsh. I've
never seen it used (of course, I don't speak Irish, so that's probably
why :) ), and it has nothing to do with the euro.

15 is a "modernised" version of iso-8859-1. It drops eight rarely used
characters (the currency placeholder, the three fractions, and a few
spacing accents) and replaces them with the euro sign, the French oe,
OE and Y diaeresis, and the Finnish/Estonian s/S caron and z/Z caron.
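
To see exactly which positions changed, here is a small sketch that
compares the two charsets byte by byte. It assumes Python's bundled
`iso8859_1` and `iso8859_15` codecs follow the published tables; the
function name `latin9_differences` is made up for illustration.

```python
import unicodedata


def latin9_differences():
    """Return (byte, latin1_char, latin9_char) for every byte that differs."""
    diffs = []
    for byte in range(0xA0, 0x100):
        raw = bytes([byte])
        old = raw.decode("iso8859_1")
        new = raw.decode("iso8859_15")
        if old != new:
            diffs.append((byte, old, new))
    return diffs


if __name__ == "__main__":
    # Print character names rather than the characters themselves so the
    # output stays ASCII-only on any terminal.
    for byte, old, new in latin9_differences():
        print(f"0x{byte:02X}: {unicodedata.name(old)} -> {unicodedata.name(new)}")
```

Everything outside those few positions is identical in both charsets,
which is why converting between them is mostly a no-op.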

That's meant to be the 8-bit charset for Western Europe now (btw, it is
also known as latin9, or latin0), as it's supposed to replace
iso8859-1, which should now be called a legacy encoding.

16 (Latin 10) is quite new. It's supposed to do the same as
iso-8859-15, but for central European countries. It has support for the
euro sign and the Romanian language (t comma below, s comma below), but
I've read somewhere that it has lost support for the languages of two or
three other central European countries... go figure...
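
A quick way to check which charset carries the euro sign and the
Romanian comma-below letters is to try encoding them. This is only a
sketch: it relies on Python's bundled `iso8859_2`, `iso8859_15` and
`iso8859_16` codecs, and the helper name `encodable` is made up here.

```python
EURO = "\u20ac"     # EURO SIGN
S_COMMA = "\u0219"  # LATIN SMALL LETTER S WITH COMMA BELOW
T_COMMA = "\u021b"  # LATIN SMALL LETTER T WITH COMMA BELOW


def encodable(text, codec):
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for codec in ("iso8859_2", "iso8859_15", "iso8859_16"):
        print(
            codec,
            "euro:", encodable(EURO, codec),
            "s/t comma below:", encodable(S_COMMA + T_COMMA, codec),
        )
```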

> > - latin92mic/mic2latin9/latin102mic/mic2latin10 in conv.c
> > - the leading character value in pg_wchar.h
> >
> > I don't know anything about MULE except that it's some Emacs standard
> > (why they didn't go for Unicode is beyond my understanding, is
> > off-topic on this list, and had probably some good argument at the
> > time).
> 
> Probably this is because Unicode is not perfect at all. For example,
> the concept "encode everything in two-bytes" is obviously broken
> down, some charsets (for example JIS X 0213) are not supported at all,
> etc. etc...

Well, for the record, iso-10646 was 32 bits from the beginning, and
Unicode didn't say that it was limited to 16 bits, though, to be fair,
the Unicode consortium said it didn't believe it would need more than
16 bits.

BTW, there is now a statement that they won't go above 0x10ffff, which
gives a bit more than 1 million code points... I think it should be
enough this time (but who knows!?).
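
The arithmetic behind "a bit more than 1 million" is easy to check; the
constant names below are just for illustration.

```python
MAX_CODE_POINT = 0x10FFFF

# Code points run from 0 to MAX_CODE_POINT inclusive.
TOTAL_CODE_POINTS = MAX_CODE_POINT + 1

# The range splits into planes of 0x10000 (65,536) code points each.
PLANES = TOTAL_CODE_POINTS // 0x10000

if __name__ == "__main__":
    print(f"{TOTAL_CODE_POINTS:,} code points in {PLANES} planes")
```

So the limit is 17 planes of 65,536 positions each, the first of which
is the original 16-bit range.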

Regarding the *main* issue with Unicode, which is the handling of
Japanese kanji vs Chinese characters (the CJK unification), I must admit
I don't know the details, but the arguments of both sides seem to be
valid. I would say "add the Japanese version of the characters", since
lack of space is no longer the problem. But things like this will get
solved with time, and it really seems like Unicode will achieve the
much-needed charset unity it was made for :)

> > Can someone point me to where I should look for that ? is it as
> > easy as iso-8859-2/3/4 support, or do I need to do something as
> > iso-8859-5 ?
> 
> Docs for MULE internal code come with XEmacs. For example, see:
> 
> ftp://ftp.xemacs.org/pub/xemacs/docs/letter/internals-letter.pdf.gz
> 
> http://www.lns.cornell.edu/public/COMP/info/xemacs/internals/internals_15.html#SEC83

Unfortunately, these explain the principles behind MULE, not how to
convert text to/from another character set :/
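
For reference, the principle for a single-byte charset seems to be:
ASCII bytes pass through unchanged, and every byte with the high bit set
is prefixed by the charset's leading byte. Below is a rough sketch of
that idea only, not the actual conv.c code. The function names are made
up, and the only leading byte used is 0x81, which is the value
pg_wchar.h defines for Latin 1 (LC_ISO8859_1); whatever value Latin 9 or
Latin 10 should get is still the open question.

```python
LC_ISO8859_1 = 0x81  # leading byte for Latin 1, as defined in pg_wchar.h


def latin_to_mic(data: bytes, lc: int) -> bytes:
    """Prefix each high-bit byte with the leading byte lc; keep ASCII as-is."""
    out = bytearray()
    for b in data:
        if b > 0x7F:
            out.append(lc)
        out.append(b)
    return bytes(out)


def mic_to_latin(data: bytes, lc: int) -> bytes:
    """Strip the leading byte lc; reject bytes from any other charset."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b > 0x7F:
            # A high-bit byte here must be our leading byte followed by
            # the character byte.
            if b != lc or i + 1 >= len(data):
                raise ValueError(f"unexpected byte 0x{b:02X} at offset {i}")
            out.append(data[i + 1])
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)


if __name__ == "__main__":
    latin1 = "caf\u00e9".encode("iso8859_1")
    mic = latin_to_mic(latin1, LC_ISO8859_1)
    print(mic.hex(), mic_to_latin(mic, LC_ISO8859_1) == latin1)
```

If that is right, a new single-byte charset would only need its own
leading byte passed as `lc`, with no per-character table.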

Patrice

-- 
Patrice Hédé
email: patrice hede à islande org
www  : http://www.islande.org/