Re: Unicode string literals versus the world - Mailing list pgsql-hackers

From Sam Mason
Subject Re: Unicode string literals versus the world
Date
Msg-id 20090416105113.GK12225@frubble.xen.chris-lamb.co.uk
In response to Re: Unicode string literals versus the world  (Marko Kreen <markokr@gmail.com>)
Responses Re: Unicode string literals versus the world  (Marko Kreen <markokr@gmail.com>)
List pgsql-hackers
On Wed, Apr 15, 2009 at 11:19:42PM +0300, Marko Kreen wrote:
> On 4/15/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Given Martijn's complaint about more-than-16-bit code points, I think
> >  the \u proposal is not mature enough to go into 8.4.  We can think
> >  about some version of that later, if there's enough interest.
> 
> I think it would be good idea. Basically we should pick one from
> couple of pre-existing sane schemes.  Here is quick summary
> of Python, Perl and Java:
> 
> Python [1]:
> 
>   \uXXXX         - 16-bit codepoint
>   \UXXXXXXXX     - 32-bit codepoint
>   \N{char-name}  - Character by name

Microsoft have also gone this way in C# [1], although named code points
are not supported there.
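
To make the three Python forms quoted above concrete, here is a small
sketch. It only shows that the three spellings produce the same
character; the variable names are mine, and note that the official
Unicode name really is spelled "LAMDA".

```python
import unicodedata

# Three spellings of the same code point, U+03BB, in a Python 3 string.
short_form = "\u03bb"                        # \uXXXX: exactly 4 hex digits
long_form = "\U000003bb"                     # \UXXXXXXXX: exactly 8 hex digits
named_form = "\N{GREEK SMALL LETTER LAMDA}"  # \N{...}: official Unicode name

print(short_form == long_form == named_form)
print(unicodedata.name(short_form))
```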

> Perl [2]:
> 
>   \x{XXXX..}     - {} contains hexadecimal codepoint
>   \N{char-name}  - Unicode char name

Looks OK, but the 'x' seems somewhat redundant.  Why not just:
 \{xxxx}

This would be following the BitC[2] project, especially if it was more
like:
 \{U+xxxx}

e.g.
 \{U+03BB}

would be the lowercase lambda character.  An added appeal is that this
(i.e. U+03BB) is how the Unicode consortium spells code points.
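
As a rough illustration of how little is needed to handle the braced
form, here is a sketch of a decoder in Python.  This is not a proposal
for the actual lexer; the function name, the 1-6 hex digit limit and
the rejection of surrogate values are all assumptions of mine.

```python
import re

# Hypothetical pattern for the \{U+xxxx} form: a backslash, an opening
# brace, "U+", 1 to 6 hex digits, and a closing brace.
_BRACED_ESCAPE = re.compile(r"\\\{U\+([0-9A-Fa-f]{1,6})\}")


def decode_braced_escapes(text):
    # Replace every braced escape in text with the character it names.
    def _replace(match):
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            raise ValueError("beyond the Unicode range: U+%X" % code_point)
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError("surrogate, not a character: U+%X" % code_point)
        return chr(code_point)

    return _BRACED_ESCAPE.sub(_replace, text)


print(decode_braced_escapes(r"\{U+03BB}x. x"))
```

Because the digits are delimited by braces, the same syntax covers
code points above U+FFFF without needing a second escape letter.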

> Java [3]:
> 
>   \uXXXX         - 16-bit codepoint

AFAIK, Java isn't the best reference to choose; it assumed from an early
point in its design that Unicode characters were at most 16 bits wide
and hence later had to switch its internal representation to UTF-16.  I
don't program enough Java these days to know how it has all worked out,
but it would be interesting to hear from people who regularly have to
deal with characters outside the BMP (i.e. code points greater than
65535).
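
To show what a 16-bit-only escape implies for such characters, here is
a sketch of the standard UTF-16 surrogate calculation in Python.  The
function name is hypothetical, and it assumes its argument is a valid
code point.  Anything above U+FFFF comes out as two 16-bit units, so a
\uXXXX-only syntax needs two escapes to spell one character.

```python
def to_utf16_units(code_point):
    # Return the list of 16-bit UTF-16 code units for one code point.
    if code_point < 0x10000:
        # Inside the BMP: a single unit holding the code point itself.
        return [code_point]
    # Outside the BMP: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # low (trail) surrogate
    return [high, low]


# U+03BB is in the BMP; U+1D11E (MUSICAL SYMBOL G CLEF) is not.
for cp in (0x03BB, 0x1D11E):
    units = to_utf16_units(cp)
    print("U+%04X -> %s" % (cp, " ".join("\\u%04X" % u for u in units)))
```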

--
  Sam  http://samason.me.uk/

[1] http://msdn.microsoft.com/en-us/library/aa664669(VS.71).aspx
[2] http://www.bitc-lang.org/docs/bitc/spec.html#stringlit

