Re: JSON and unicode surrogate pairs - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: JSON and unicode surrogate pairs
Date	June 6, 2013 16:53:49
Msg-id	CA+TgmoapNgKpPiwVyR=wxCj=1m9RqL3311gA6fibbXijMv=rtg@mail.gmail.com
In response to	JSON and unicode surrogate pairs (Andrew Dunstan <andrew@dunslane.net>)
Responses	Re: JSON and unicode surrogate pairs
List	pgsql-hackers

On Wed, Jun 5, 2013 at 10:46 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> In 9.2, the JSON parser didn't check the validity of the use of unicode
> escapes other than that it required 4 hex digits to follow '\u'. In 9.3,
> that is still the case. However, the JSON accessor functions and operators
> also try to turn JSON strings into text in the server encoding, and this
> includes de-escaping \u sequences. This works fine except when there is a
> pair of sequences representing a UTF-16 type surrogate pair, something that
> is explicitly permitted in the JSON spec.
>
> The attached patch is an attempt to remedy that, and a surrogate pair is
> turned into the correct code point before converting it to whatever the
> server encoding is.
>
> Note that this would mean we can still put JSON with incorrect use of
> surrogates into the database, as now (9.2 and later), and they will cause
> almost all the accessor functions to raise an error, as now (9.3). All this
> does is allow JSON that uses surrogates correctly not to fail when applying
> the accessor functions and operators. That's a possible violation of POLA,
> and at least worthy of a note in the docs, but I'm not sure what else we can
> do now - adding this check to the input lexer would possibly cause restores
> to fail, which users might not thank us for.

I think the approach you've proposed here is a good one.
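
For readers following the thread, here is a minimal sketch of the de-escaping
step described in the quoted text: a \uXXXX high surrogate followed by a
\uXXXX low surrogate is combined into one code point, and incorrect use of
surrogates raises an error. This is not the attached patch (which is C code in
the server). The names combine_surrogates and deescape_u are hypothetical, and
the sketch handles only \u escapes, ignoring other backslash escapes and the
conversion to the server encoding.

```python
import re

# Matches one JSON unicode escape: backslash, 'u', exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate: %04X" % high)
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate: %04X" % low)
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def deescape_u(text: str) -> str:
    """De-escape \\uXXXX sequences (hypothetical helper, \\u escapes only)."""
    out = []
    i = 0
    while i < len(text):
        m = _U_ESCAPE.match(text, i)
        if m is None:
            # Ordinary character: copy it through unchanged.
            out.append(text[i])
            i += 1
            continue
        unit = int(m.group(1), 16)
        i = m.end()
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed immediately by a low one.
            m2 = _U_ESCAPE.match(text, i)
            if m2 is None:
                raise ValueError("high surrogate not followed by a \\u escape")
            unit = combine_surrogates(unit, int(m2.group(1), 16))
            i = m2.end()
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("low surrogate without a preceding high surrogate")
        out.append(chr(unit))
    return "".join(out)
```

With this sketch, deescape_u(r"\ud83d\ude00") yields the single character
U+1F600, while a lone r"\ud83d" or r"\ude00" raises ValueError, which mirrors
the behavior described above: correct pairs are accepted by the accessors and
incorrect ones still fail.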

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company