Re: json/jsonb inconsistence - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: json/jsonb inconsistence
Date
Msg-id 53872420.2000207@dunslane.net
Whole thread Raw
In response to json/jsonb inconsistence  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: json/jsonb inconsistence
List pgsql-hackers
On 05/29/2014 07:55 AM, Teodor Sigaev wrote:
> # select  '"\uaBcD"'::json;
>    json
> ----------
>  "\uaBcD"
>
> but
>
> # select  '"\uaBcD"'::jsonb;
> ERROR:  invalid input syntax for type json
> LINE 1: select  '"\uaBcD"'::jsonb;
>                 ^
> DETAIL:  Unicode escape values cannot be used for code point values 
> above 007F when the server encoding is not UTF8.
> CONTEXT:  JSON data, line 1: ...
>
> and
>
> # select  '"\uaBcD"'::json -> 0;
> ERROR:  invalid input syntax for type json
> DETAIL:  Unicode escape values cannot be used for code point values 
> above 007F when the server encoding is not UTF8.
> CONTEXT:  JSON data, line 1: ...
> Time: 0,696 ms
>
> More than, json looks strange:
>
> # select  '["\uaBcD"]'::json;
>     json
> ------------
>  ["\uaBcD"]
>
> but
>
> # select  '["\uaBcD"]'::json->0;
> ERROR:  invalid input syntax for type json
> DETAIL:  Unicode escape values cannot be used for code point values 
> above 007F when the server encoding is not UTF8.
> CONTEXT:  JSON data, line 1: [...
>
> Looks like random parse rules.
>


It is documented that for json we don't check the validity of unicode 
escapes until we try to use them. That's because the original json 
parser didn't check them, and if we started doing so now users who 
pg_upgraded would find themselves with invalid data in the database. The 
rules for jsonb are more strict, because we actually resolve the unicode 
escapes when constructing the jsonb object. There is nothing at all 
random about it, although I agree it's slightly inconsistent. It's been 
the subject of some discussion on -hackers previously, IIRC. I actually 
referred to this difference in my talk at pgCon last Friday.

Frankly, if you want to use json/jsonb, you are going to be best served 
by using a UTF8-encoded database, or not using non-ascii unicode escapes 
in json at all.

cheers

andrew



pgsql-hackers by date:

Previous
From: Rahila Syed
Date:
Subject: Re: Compression of full-page-writes
Next
From: Greg Stark
Date:
Subject: backup_label revisited