Re: Unicode string literals versus the world - Mailing list pgsql-hackers

From	Sam Mason
Subject	Re: Unicode string literals versus the world
Date	April 16, 2009 12:24:47
Msg-id	20090416152442.GN12225@frubble.xen.chris-lamb.co.uk
In response to	Re: Unicode string literals versus the world (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Unicode string literals versus the world
List	pgsql-hackers

On Thu, Apr 16, 2009 at 10:54:16AM -0400, Tom Lane wrote:
> Sam Mason <sam@samason.me.uk> writes:
> > I'd never heard of UTF-16 surrogate pairs before this discussion and
> > hence didn't realise that it's valid to have a surrogate pair in place
> > of a single code point.  The docs say that <D800 DF02> corresponds to
> > U+10302; Python would appear to follow my intuitions in that:
> 
> >   ord(u'\uD800\uDF02')
> 
> > results in an error instead of giving back 66306, as I'd expect.  Is
> > this a bug in Python, my understanding, or something else?
> 
> I might be wrong, but I think surrogate pairs are expressly forbidden in
> all representations other than UTF16/UCS2.  We definitely forbid them
> when validating UTF-8 strings --- that's per an RFC recommendation.
> It sounds like Python is doing the same.
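The surrogate arithmetic behind <D800 DF02> = U+10302 can be checked with a short Python 3 sketch. The helper name `combine_surrogates` is hypothetical, for illustration only. The sketch also shows Python 3 behaving as Tom describes: `ord()` rejects the two-code-point string with a TypeError, and the strict UTF-8 codec refuses the bytes ED A0 80, which would be the UTF-8 form of the lone surrogate U+D800.

```python
# Python 3 sketch: combine a UTF-16 surrogate pair into one code point,
# then show that surrogates are not accepted outside UTF-16.

def combine_surrogates(high: int, low: int) -> int:
    """Return the code point encoded by a UTF-16 high/low surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits of payload; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = combine_surrogates(0xD800, 0xDF02)
print(hex(cp), cp)  # 0x10302 66306

# A str holding both surrogates is still two code points, so ord() refuses it.
try:
    ord("\ud800\udf02")
except TypeError as exc:
    print("ord:", exc)

# ED A0 80 is what U+D800 would look like in UTF-8; a strict decoder rejects it.
try:
    b"\xed\xa0\x80".decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8:", exc.reason)
```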

OK, that's good; I thought I was missing something.  A minor point is
that in UCS-2 each 16-bit value is exactly one character and characters
outside the BMP can't be represented at all, hence the need for UTF-16
and its surrogate pairs.
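The same distinction in the other direction, as a minimal Python sketch (the helper `utf16_units` is hypothetical): a BMP code point is a single 16-bit unit, identical in UCS-2 and UTF-16, while U+10302 has no UCS-2 form and needs the pair D800 DF02 in UTF-16.

```python
from typing import List


def utf16_units(cp: int) -> List[int]:
    """Return the UTF-16 code units for a single Unicode scalar value."""
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0xFFFF:
        # BMP character: one 16-bit unit, the same as its UCS-2 encoding.
        return [cp]
    # Outside the BMP UCS-2 has no encoding; UTF-16 splits the 20-bit
    # offset from 0x10000 into a high and a low surrogate.
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


print([hex(u) for u in utf16_units(0x41)])     # ['0x41']
print([hex(u) for u in utf16_units(0x10302)])  # ['0xd800', '0xdf02']
```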

I've failed to keep up with the discussion, so I'm not sure where this
conversation has got to!  Is the consensus for 8.4 to enable SQL:2003
style U&lit escaped literals if and only if standard_conforming_strings
is set?  This seems easiest for client code, as it can then rely on that
one setting alone to know what to do with backslashes.
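A minimal sketch of what that rule would let a client do, under stated assumptions: `quote_literal` is a hypothetical helper, the U& branch uses the SQL:2003 escape forms \XXXX and \+XXXXXX with backslash as the default escape character, and a UTF-8 server encoding is assumed for the non-ASCII escapes. The client reads standard_conforming_strings once and derives from it both its backslash handling and whether U& literals are available.

```python
def quote_literal(s: str, standard_conforming_strings: bool) -> str:
    """Hypothetical client helper: quote s as an SQL string literal.

    Assumes the proposed rule: U&'...' is usable only when
    standard_conforming_strings is on, and a UTF-8 server encoding.
    """
    if not standard_conforming_strings:
        # Ordinary literals treat backslash as an escape here, so double it.
        return "'" + s.replace("\\", "\\\\").replace("'", "''") + "'"
    out = []
    for ch in s:
        cp = ord(ch)
        if ch == "'":
            out.append("''")              # single quotes are doubled
        elif ch == "\\":
            out.append("\\\\")            # the escape character is written twice
        elif 0x20 <= cp < 0x7F:
            out.append(ch)                # printable ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\%04X" % cp)     # \XXXX form for the BMP
        else:
            out.append("\\+%06X" % cp)    # \+XXXXXX form beyond the BMP
    return "U&'" + "".join(out) + "'"


print(quote_literal("caf\u00e9 \U00010302", True))  # U&'caf\00E9 \+010302'
print(quote_literal("a\\b 'x'", False))             # 'a\\b ''x'''
```

With standard_conforming_strings off, the sketch falls back to an ordinary literal with doubled backslashes; an E'...' literal would avoid the warning that escape_string_warning raises in that case.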

--  Sam  http://samason.me.uk/