Robert Haas <robertmhaas@gmail.com> writes:
> I understand Andrew to be saying that if you take a 6-character string
> and convert it to a JSON string and then back to text, you will
> *usually* get back the same 6 characters you started with ... unless
> the first character was \, the second u, and the remainder hexadecimal
> digits. Then you'll get back a one-character string or an error
> instead. It's not hard to imagine that leading to surprising
> behavior, or even security vulnerabilities in applications that aren't
> expecting such a translation to happen under them.

That *was* the case, with the now-reverted patch that changed the escaping
rules.  It's not anymore:

regression=# select to_json('\u1234'::text);
  to_json
-----------
 "\\u1234"
(1 row)

When you convert that back to text, you'll get \u1234, no more and no
less. For example:

regression=# select array_to_json(array['\u1234'::text]);
 array_to_json
---------------
 ["\\u1234"]
(1 row)

regression=# select array_to_json(array['\u1234'::text])->0;
 ?column?
-----------
 "\\u1234"
(1 row)

regression=# select array_to_json(array['\u1234'::text])->>0;
 ?column?
----------
 \u1234
(1 row)
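
The same round trip can be sketched outside the database.  This is only an
illustration using Python's standard json module, not PostgreSQL's code
path, and the variable names are made up for the example:

```python
import json

# Six characters: a backslash, 'u', and four hexadecimal digits.
original = "\\u1234"

# Serializing to JSON doubles the backslash, so the sequence stays literal
# text rather than becoming a JSON \u escape.
encoded = json.dumps(original)

# Parsing the JSON string gives back the same six characters.
decoded = json.loads(encoded)

print(encoded)  # "\\u1234"
print(decoded)  # \u1234
```

Any conforming JSON serializer should behave the same way, since an
unescaped backslash is not allowed inside a JSON string.
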
Now, if you put in '"\u1234"'::jsonb and extract that string as text,
you get a single Unicode character (U+1234) rather than six characters.
But I'd say that a JSON user who is surprised by that doesn't understand
JSON, and has definitely not read more than about one paragraph of our
description of the JSON types.
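
For anyone who wants to see that second case without a server at hand, here
is a rough sketch, again using Python's json module as a stand-in for a
generic JSON parser (the names are illustrative only):

```python
import json

# JSON source text: a quote, a backslash, 'u', the digits 1234, and a quote.
json_source = '"\\u1234"'

# Inside JSON source, \u1234 is an escape for the code point U+1234,
# so the parsed value is a one-character string.
value = json.loads(json_source)

print(len(value))       # 1
print(hex(ord(value)))  # 0x1234
```

The input here is already JSON, so the escape is interpreted; in the earlier
examples the input was plain text, so it was quoted instead.
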
regards, tom lane