Re: pgsql: Handle Unicode surrogate pairs correctly when processing JSON. - Mailing list pgsql-committers

From: Andrew Dunstan
Subject: Re: pgsql: Handle Unicode surrogate pairs correctly when processing JSON.
Msg-id: 51C888B4.3070806@dunslane.net
In response to: Re: pgsql: Handle Unicode surrogate pairs correctly when processing JSON. (Bruce Momjian <bruce@momjian.us>)
List: pgsql-committers

On 06/24/2013 11:50 AM, Bruce Momjian wrote:
> On Sat, Jun  8, 2013 at 01:21:20PM +0000, Andrew Dunstan wrote:
>> Handle Unicode surrogate pairs correctly when processing JSON.
>>
>> In 9.2, Unicode escape sequences are not analysed at all other than
>> to make sure that they are in the form \uXXXX. But in 9.3 many of the
>> new operators and functions try to turn JSON text values into text in
>> the server encoding, and this includes de-escaping Unicode escape
>> sequences. This processing had not taken into account the possibility
>> that this might contain a surrogate pair to designate a character
>> outside the BMP. That is now handled correctly.
>>
>> This also enforces correct use of surrogate pairs, something that is not
>> done by the type's input routines. This fact is noted in the docs.
>>
>> Branch
>> ------
>> master
>>
>> Details
>> -------
>> http://git.postgresql.org/pg/commitdiff/94e3311b97448324d67ba9a527854271373329d9
>>
>> Modified Files
>> --------------
>> doc/src/sgml/func.sgml             |    9 +++++++
>> src/backend/utils/adt/json.c       |   52 ++++++++++++++++++++++++++++++++++++
>> src/test/regress/expected/json.out |   23 ++++++++++++++++
>> src/test/regress/sql/json.sql      |    8 ++++++
>> 4 files changed, 92 insertions(+)
> Does this affect any data already stored in PG 9.3 beta?  Is it
> something that should require a catalog bump?
>

No and no. All it means is that where we previously extracted data
encoded with surrogate pairs incorrectly, we now do it correctly. Only
the processing functions enforce this. For legacy reasons the input
routines don't enforce correct use of surrogate pairs, or indeed of any
Unicode escapes, as long as they are in the form \uXXXX.

cheers

andrew
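
To make the distinction above concrete, here is a minimal sketch in C of the two levels of checking: a purely syntactic test that an escape has the form \uXXXX (roughly what the input routines are described as doing), and a stricter decode step that combines a high/low surrogate pair into one code point and rejects unpaired surrogates (roughly what the processing functions are described as doing). This is not the code from json.c; the function names is_unicode_escape, parse_hex4 and decode_escape are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Syntactic check only: does s start with \u followed by 4 hex digits?
 * (Hypothetical helper, not a PostgreSQL function.) */
int is_unicode_escape(const char *s)
{
    if (s[0] != '\\' || s[1] != 'u')
        return 0;
    for (int i = 2; i < 6; i++)
        if (!isxdigit((unsigned char) s[i]))
            return 0;           /* also stops safely at a terminating NUL */
    return 1;
}

/* Convert exactly 4 hex digits to a number. Caller must have validated them. */
unsigned parse_hex4(const char *s)
{
    unsigned v = 0;
    for (int i = 0; i < 4; i++)
    {
        unsigned char c = (unsigned char) s[i];
        assert(isxdigit(c));
        v <<= 4;
        if (c >= '0' && c <= '9')
            v |= (unsigned) (c - '0');
        else if (c >= 'a' && c <= 'f')
            v |= (unsigned) (c - 'a' + 10);
        else
            v |= (unsigned) (c - 'A' + 10);
    }
    return v;
}

/* Stricter decode: returns the code point and stores the number of input
 * bytes used in *consumed, or returns -1 (leaving *consumed untouched) if
 * the escape is malformed or a surrogate is not correctly paired. */
long decode_escape(const char *s, size_t *consumed)
{
    if (!is_unicode_escape(s))
        return -1;

    unsigned hi = parse_hex4(s + 2);

    if (hi >= 0xDC00 && hi <= 0xDFFF)
        return -1;              /* low surrogate with no preceding high one */

    if (hi < 0xD800 || hi > 0xDBFF)
    {
        *consumed = 6;          /* ordinary BMP character */
        return (long) hi;
    }

    /* High surrogate: a low surrogate escape must follow immediately. */
    if (!is_unicode_escape(s + 6))
        return -1;

    unsigned lo = parse_hex4(s + 8);
    if (lo < 0xDC00 || lo > 0xDFFF)
        return -1;

    *consumed = 12;
    return 0x10000L + ((long) (hi & 0x3FF) << 10) + (long) (lo & 0x3FF);
}
```

With this sketch, the text \ud83d on its own passes is_unicode_escape but is rejected by decode_escape, while \ud83d\ude00 decodes to the single code point U+1F600. That mirrors the behaviour described in the message: a value can be accepted on input and still fail later when a processing function tries to de-escape it.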

