pgsql: Handle Unicode surrogate pairs correctly when processing JSON. - Mailing list pgsql-committers

From	Andrew Dunstan
Subject	pgsql: Handle Unicode surrogate pairs correctly when processing JSON.
Date	June 8, 2013 13:21:28
Msg-id	E1UlJ56-0004S5-G5@gemulon.postgresql.org
Responses	Re: pgsql: Handle Unicode surrogate pairs correctly when processing JSON.
List	pgsql-committers

Handle Unicode surrogate pairs correctly when processing JSON.

In 9.2, Unicode escape sequences are not analysed at all other than
to make sure that they are in the form \uXXXX. But in 9.3 many of the
new operators and functions try to turn JSON text values into text in
the server encoding, and this includes de-escaping Unicode escape
sequences. This processing had not taken into account the possibility
that two consecutive escapes might form a surrogate pair designating a
character outside the BMP. That is now handled correctly.

This also enforces correct use of surrogate pairs, something that is not
done by the type's input routines. This fact is noted in the docs.
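
As an illustration of the surrogate-pair handling described above, here is a
minimal sketch in C. It is not the code committed to json.c; the helper names
(is_high_surrogate, is_low_surrogate, combine_surrogates, decode_units) are
hypothetical, and the sketch assumes the \uXXXX escapes have already been
parsed into integer values. It shows the standard UTF-16 arithmetic for
combining a pair, and rejects a high surrogate that is not followed by a low
one as well as a low surrogate that appears on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not the functions in json.c. */

static bool is_high_surrogate(uint32_t u)
{
    return u >= 0xD800 && u <= 0xDBFF;
}

static bool is_low_surrogate(uint32_t u)
{
    return u >= 0xDC00 && u <= 0xDFFF;
}

/* Combine a high/low surrogate pair into one code point above U+FFFF. */
static uint32_t combine_surrogates(uint32_t hi, uint32_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/*
 * Turn n already-parsed \uXXXX values into code points.
 * The caller must provide room for at least n entries in out.
 * Returns the number of code points written, or -1 if a surrogate
 * is used incorrectly (unpaired high or stray low surrogate).
 */
static int decode_units(const uint32_t *units, int n, uint32_t *out)
{
    int written = 0;

    for (int i = 0; i < n; i++)
    {
        uint32_t u = units[i];

        if (is_high_surrogate(u))
        {
            /* A high surrogate must be followed by a low surrogate. */
            if (i + 1 >= n || !is_low_surrogate(units[i + 1]))
                return -1;
            out[written++] = combine_surrogates(u, units[i + 1]);
            i++;                /* the low surrogate has been consumed */
        }
        else if (is_low_surrogate(u))
        {
            /* A low surrogate with no preceding high surrogate. */
            return -1;
        }
        else
        {
            out[written++] = u; /* ordinary BMP character */
        }
    }
    return written;
}
```

With these definitions, the escape pair \ud83d\ude00 yields the single code
point U+1F600, while a lone \ud83d or a lone \ude00 is reported as an error.
Converting the resulting code point to the server encoding is a separate step
that this sketch leaves out.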

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/94e3311b97448324d67ba9a527854271373329d9

Modified Files
--------------
doc/src/sgml/func.sgml             |    9 +++++++
src/backend/utils/adt/json.c       |   52 ++++++++++++++++++++++++++++++++++++
src/test/regress/expected/json.out |   23 ++++++++++++++++
src/test/regress/sql/json.sql      |    8 ++++++
4 files changed, 92 insertions(+)