Re: Java's Unicode Notation - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Java's Unicode Notation
Date
Msg-id 20011111190422Y.t-ishii@sra.co.jp
In response to Re: Beta going well  ("Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at>)
List pgsql-hackers
From: Jean-Michel POURE <jm.poure@freesurf.fr>
Subject: Java's Unicode Notation 
Date: Thu, 08 Nov 2001 14:12:04 +0100
Message-ID: <4.2.0.58.20011108141018.00a59dc0@pop.freesurf.fr>

> Dear Tatsuo,
> 
> Would it be possible to use the Java Unicode Notation to define UTF-8 
> strings in PostgreSQL 7.2?

No. It's too late. We are in the beta freeze stage.

> Information can be found on http://czyborra.com/utf/
> 
> Do you think it is hard to implement?
> 
> Best regards,
> Jean-Michel POURE
> 
> ************************************************
> Java's Unicode Notation
> There are some less compact but more readable ASCII transformations, the 
> most important of which is the Java Unicode Notation, as allowed in Java 
> source code and processed by Java's native2ascii converter:
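> #include <stdio.h>
> 
> /* Print code point c in Java's \uXXXX notation. */
> void putwchar(unsigned long c)
> {
>     if (c >= 0x10000) {
>         /* Beyond U+FFFF: emit a UTF-16 surrogate pair (0xD7C0 = 0xD800 - 0x40). */
>         printf("\\u%04lx\\u%04lx", 0xD7C0 + (c >> 10), 0xDC00 | (c & 0x3FF));
>     }
>     else if (c >= 0x100) printf("\\u%04lx", c);
>     else putchar((int)c);
> }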
> The advantage of the \u20ac notation is that it is very easy to type it in 
> on any old ASCII keyboard and easy to look up the intended character if you 
> happen to have a copy of the Unicode book or the 
> {unidata2,names2,unihan}.txt files from the Unicode FTP site or CD-ROM or 
> know that U+20AC is the €.
> What's not so nice about the \u20ac notation is that the small letters are 
> quite unusual for Unicode characters, the backslashes have to be quoted for 
> many Unix tools, the four hexdigits without a terminator may appear merged 
> with the following word as in \u00a333 for £33, it is unclear when and how 
> you have to escape the backslash character itself, 6 bytes for one 
> character may be considered wasteful, and there is no way to clearly 
> present the characters beyond \uffff without \ud800\udc00 surrogates, and 
> last but not least the plain hexnumbers may not be very helpful.
> JAVA is one of the target and source encodings of yudit and its uniconv 
> converter.
> 
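
For reference, the reverse direction (turning \uXXXX escapes into UTF-8
bytes) could look roughly like the sketch below. This is only an
illustration, not anything that exists in PostgreSQL: the names hex4,
put_utf8 and java_to_utf8 are made up for this example. It assumes the
input is otherwise plain ASCII, that the output buffer is at least as large
as the input (including the terminator), and it does not deal with escaped
backslashes. An unpaired surrogate is passed through as a three-byte
sequence rather than rejected, which a real implementation would probably
not want.

```c
#include <stddef.h>

/* Hypothetical helper: parse exactly four hex digits at s, or return -1. */
static long hex4(const char *s)
{
    long v = 0;
    for (int i = 0; i < 4; i++) {
        char ch = s[i];
        int d;
        if (ch >= '0' && ch <= '9') d = ch - '0';
        else if (ch >= 'a' && ch <= 'f') d = ch - 'a' + 10;
        else if (ch >= 'A' && ch <= 'F') d = ch - 'A' + 10;
        else return -1;            /* also stops at an early '\0' */
        v = (v << 4) | d;
    }
    return v;
}

/* Hypothetical helper: write code point c as UTF-8; return bytes written. */
static size_t put_utf8(unsigned long c, char *out)
{
    if (c < 0x80) {
        out[0] = (char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (char)(0xC0 | (c >> 6));
        out[1] = (char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        out[0] = (char)(0xE0 | (c >> 12));
        out[1] = (char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (char)(0x80 | (c & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (c >> 18));
    out[1] = (char)(0x80 | ((c >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((c >> 6) & 0x3F));
    out[3] = (char)(0x80 | (c & 0x3F));
    return 4;
}

/*
 * Decode \uXXXX escapes in `in` to UTF-8 in `out` and return the number of
 * bytes written (excluding the terminating '\0'). Each 6-byte escape yields
 * at most 3 bytes, and a 12-byte surrogate pair yields 4, so the output
 * never outgrows the input.
 */
size_t java_to_utf8(const char *in, char *out)
{
    size_t n = 0;
    while (*in) {
        long hi;
        if (in[0] == '\\' && in[1] == 'u' && (hi = hex4(in + 2)) >= 0) {
            unsigned long c = (unsigned long)hi;
            in += 6;
            /* A high surrogate followed by a low surrogate forms one code point. */
            if (c >= 0xD800 && c <= 0xDBFF && in[0] == '\\' && in[1] == 'u') {
                long lo = hex4(in + 2);
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    c = 0x10000 + ((c - 0xD800) << 10) + ((unsigned long)lo - 0xDC00);
                    in += 6;
                }
            }
            n += put_utf8(c, out + n);
        } else {
            out[n++] = *in++;      /* anything else is copied unchanged */
        }
    }
    out[n] = '\0';
    return n;
}
```

Because exactly four hex digits are consumed after each \u, the input
\u00a333 decodes to the pound sign followed by the two literal characters
"33", which matches the reading given in the quoted text above.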

