Re: JSON for PG 9.2 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: JSON for PG 9.2
Date
Msg-id CA+TgmoZksnjJTN4ejqPXOvZE5hWDEfj5AqTH=yzZYz4PhczL9Q@mail.gmail.com
In response to Re: JSON for PG 9.2  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: JSON for PG 9.2  ("David E. Wheeler" <david@kineticode.com>)
Re: JSON for PG 9.2  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
On Fri, Jan 20, 2012 at 10:45 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> XML's &#nnnn; escape mechanism is more or less the equivalent of JSON's
> \unnnn. But XML documents can be encoded in a variety of encodings,
> including non-unicode encodings such as Latin-1. However, no matter what the
> document encoding, &#nnnn; designates the character with Unicode code point
> nnnn, whether or not that is part of the document encoding's charset.

OK.

> Given that precedent, I'm wondering if we do need to enforce anything other
> than that it is a valid unicode code point.
>
> Equivalence comparison is going to be difficult anyway if you're not
> resolving all \unnnn escapes. Possibly we need some sort of canonicalization
> function to apply for comparison purposes. But we're not providing any
> comparison ops today anyway, so I don't think we need to make that decision
> now. As you say, there doesn't seem to be any defined canonical form - the
> spec is a bit light on in this respect.

Well, we clearly have to resolve all \uXXXX to do either comparison or
canonicalization.  The current patch does neither, but presumably we
want to leave the door open to such things.  If we're using UTF-8 and
comparing two strings, and we get to a position where one of them has
a character and the other has \uXXXX, it's pretty simple to do the
comparison: we just turn XXXX into a wchar_t and test for equality.
That should be trivial, unless I'm misunderstanding.  If, however,
we're not using UTF-8, we have to first turn \uXXXX into a Unicode
code point, then convert that to a character in the database encoding,
and then test for equality with the other character after that.  I'm
not sure whether that's possible in general, how to do it, or how
efficient it is.  Can you or anyone shed any light on that topic?
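
To make the two cases above concrete, here is a rough sketch in Python (not PostgreSQL code). It assumes the escape names a single BMP code point, that the other string supplies the raw bytes of one character, and that Python's codec names stand in for the database encoding; the helper names escape_to_code_point and escape_matches_char are made up for illustration. A code point with no representation in the target encoding is simply treated as unequal here, which is one possible policy, not a settled one.

```python
import re

# Matches exactly one JSON-style \uXXXX escape (four hex digits).
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def escape_to_code_point(escape: str) -> int:
    """Turn the text of a \\uXXXX escape into a Unicode code point."""
    match = _ESCAPE.fullmatch(escape)
    if match is None:
        raise ValueError(f"not a \\uXXXX escape: {escape!r}")
    return int(match.group(1), 16)


def escape_matches_char(escape: str, raw_char: bytes, encoding: str) -> bool:
    """Compare a \\uXXXX escape with one character stored in `encoding`.

    The code point is converted to the target encoding first, then the
    byte sequences are compared. If the code point cannot be represented
    in that encoding, the two cannot be equal (assumed policy).
    """
    code_point = escape_to_code_point(escape)
    try:
        encoded = chr(code_point).encode(encoding)
    except UnicodeEncodeError:
        return False
    return encoded == raw_char


# UTF-8: the conversion is always defined for non-surrogate code points.
print(escape_matches_char("\\u00e9", b"\xc3\xa9", "utf-8"))
# Latin-1: works because U+00E9 exists in that character set.
print(escape_matches_char("\\u00e9", b"\xe9", "latin-1"))
# Latin-1 has no euro sign, so U+20AC cannot match any stored character.
print(escape_matches_char("\\u20ac", b"\x80", "latin-1"))
```

The sketch sidesteps the efficiency question entirely: it converts one character at a time through a general-purpose codec, which says nothing about what a per-encoding conversion would cost inside the server.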

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

