Re: ERROR: unsupported Unicode escape sequence - in JSON-type column - Mailing list pgsql-general

From Tom Lane
Subject Re: ERROR: unsupported Unicode escape sequence - in JSON-type column
Date
Msg-id 368156.1677514339@sss.pgh.pa.us
Whole thread Raw
In response to Re: ERROR: unsupported Unicode escape sequence - in JSON-type column  (Erik Wienhold <ewie@ewie.name>)
List pgsql-general
Erik Wienhold <ewie@ewie.name> writes:
>> On 27/02/2023 13:13 CET Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>> I'd be curious to know how the customer managed to do that.
>> Perhaps there is a loophole in PostgreSQL that needs to be fixed.

> Another reason to prefer jsonb over json to reject such inputs right away.
> The documentation states that json does not validate inputs in constrast to
> jsonb.

It's not that it doesn't validate, it's that the validation rules are
different.  Per the manual [1]:

    RFC 7159 permits JSON strings to contain Unicode escape sequences
    denoted by \uXXXX. In the input function for the json type, Unicode
    escapes are allowed regardless of the database encoding, and are
    checked only for syntactic correctness (that is, that four hex digits
    follow \u). However, the input function for jsonb is stricter: it
    disallows Unicode escapes for characters that cannot be represented in
    the database encoding. The jsonb type also rejects \u0000 (because
    that cannot be represented in PostgreSQL's text type), and it insists
    that any use of Unicode surrogate pairs to designate characters
    outside the Unicode Basic Multilingual Plane be correct.

You can certainly quibble with our decisions here, but I think they
are reasonably consistent.  json is for data that you'd like a syntax
check on (else you might as well store it as "text"), but no more than
a syntax check, because you're going to do the actual JSON processing
elsewhere and you don't want Postgres opining on what semi-standard
JSON constructs mean.  If you're actually going to process the data
inside the database, jsonb is a better choice.  The extra restrictions
in jsonb are to ensure that a string value represented in JSON can be
extracted into a valid string of our text datatype.

Storing data in json and then casting to jsonb on-the-fly seems like
about the worst possible combination of choices.

            regards, tom lane

[1] https://www.postgresql.org/docs/current/datatype-json.html



pgsql-general by date:

Previous
From: Dávid Suchan
Date:
Subject: pg_upgradecluster transfering only a portion of the data
Next
From: Adrian Klaver
Date:
Subject: Re: pg_upgradecluster transfering only a portion of the data