jsonb, unicode escapes and escaped backslashes - Mailing list pgsql-hackers

From Andrew Dunstan
Subject jsonb, unicode escapes and escaped backslashes
Date
Msg-id 54C03B86.80604@dunslane.net
Responses Re: jsonb, unicode escapes and escaped backslashes
List pgsql-hackers
The following case has just been brought to my attention (look at the 
differing number of backslashes):
   andrew=# select jsonb '"\\u0000"';
     jsonb
   ----------
    "\u0000"
   (1 row)

   andrew=# select jsonb '"\u0000"';
     jsonb
   ----------
    "\u0000"
   (1 row)

   andrew=# select json '"\u0000"';
      json
   ----------
    "\u0000"
   (1 row)

   andrew=# select json '"\\u0000"';
      json
   -----------
    "\\u0000"
   (1 row)

The problem is that jsonb uses the parsed, unescaped value of the 
string, while json does not. When the string parser sees the input with 
the 2 backslashes, it outputs a single backslash, and then it encounters 
the remaining characters and emits them as is, resulting in a token of 
'\u0000'. When it encounters the input with one backslash, it recognizes 
a unicode escape, and because it's for U+0000 emits '\u0000'. All other 
unicode escapes are resolved, so the only ambiguity on input concerns 
this case.
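
The collapse described above can be reproduced with a small Python model. This is only a sketch of the described behaviour, not the PostgreSQL lexer: the function name jsonb_descape is hypothetical, and only the escaped-backslash and \uXXXX escapes are modelled.

```python
import re

_HEX4 = re.compile(r"[0-9a-fA-F]{4}")


def jsonb_descape(body):
    # `body` is the raw text between the double quotes of a JSON string.
    # Hypothetical helper: a toy model, not PostgreSQL's actual parser.
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        if nxt == "\\":
            # Escaped backslash: emit one backslash and move on.
            out.append("\\")
            i += 2
        elif nxt == "u" and _HEX4.fullmatch(body[i + 2:i + 6]):
            code = int(body[i + 2:i + 6], 16)
            if code == 0:
                # U+0000 is not resolved; the six characters are kept as text.
                out.append("\\u0000")
            else:
                out.append(chr(code))
            i += 6
        else:
            raise ValueError("escape not modelled in this sketch")
    return "".join(out)


two_backslashes = jsonb_descape(r"\\u0000")
one_backslash = jsonb_descape(r"\u0000")
print(two_backslashes == one_backslash)  # True
```

Both inputs end up as the same six-character token, so nothing downstream can tell which one the user typed.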

Things get worse, though. On output, '\uabcd' for any four hex digits is 
recognized as a unicode escape, and thus the backslash is not escaped, 
so that we get:
   andrew=# select jsonb '"\\uabcd"';
     jsonb
   ----------
    "\uabcd"
   (1 row)
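
A toy Python model of the output side shows why this matters: the printed text no longer parses back to the stored value. The function lenient_escape is a hypothetical stand-in for the behaviour described above, not the real escape_json, and it ignores control characters.

```python
import json
import re

_UNICODE_ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")


def lenient_escape(value):
    # Hypothetical model: a backslash that starts something shaped like
    # \uXXXX is assumed to be a unicode escape and is left unescaped.
    out = ['"']
    for i, ch in enumerate(value):
        if ch == "\\":
            if _UNICODE_ESCAPE.match(value, i):
                out.append("\\")
            else:
                out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)


stored = "\\uabcd"  # six characters: backslash, u, a, b, c, d
printed = lenient_escape(stored)
reparsed = json.loads(printed)

print(printed)                     # "\uabcd"
print(len(stored), len(reparsed))  # 6 1
```

The stored value is six characters of text, but the printed form reads back as the single character U+ABCD.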


We could probably fix this fairly easily for non-U+0000 cases by having 
jsonb_to_cstring use a different escape_json routine.
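
As a rough illustration of what such a routine might do, here is a Python sketch that always escapes a backslash, whatever follows it. The name strict_escape is hypothetical and this is not the actual C code; it only demonstrates that unconditional escaping round-trips.

```python
import json


def strict_escape(value):
    # Hypothetical routine: never guess that a backslash begins a unicode
    # escape; every backslash in the stored value is doubled on output.
    out = ['"']
    for ch in value:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ord(ch) < 0x20:
            # Control characters must be escaped in JSON strings.
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)


for stored in ("\\uabcd", "a\\b", 'say "hi"', "tab\there"):
    # Each printed form parses back to exactly the stored text.
    assert json.loads(strict_escape(stored)) == stored

print(strict_escape("\\uabcd"))  # "\\uabcd"
```

This does nothing for U+0000: by the time the value reaches output, the two possible inputs are already the same stored token, so a routine like this would print both with a doubled backslash.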

But it's a mess, sadly, and I'm not sure what a good fix for the U+0000 
case would look like. Maybe we should detect such input and emit a 
warning of ambiguity? It's likely to be rare enough, but clearly not as 
rare as we'd like, since this is a report from the field.
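
One possible shape for such a check, sketched in Python purely as an illustration: scan the raw string body one escape at a time and flag both spellings that end up as the text '\u0000'. The function names, the warning wording, and the choice to flag both spellings are all assumptions, not a proposal for the actual C code.

```python
import re
import warnings

# Consume one escape sequence at a time, so that an escaped backslash is
# taken as a pair and the "u0000" after it is seen as plain characters.
_ESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|.)", re.DOTALL)


def ambiguous_offsets(body):
    # `body` is the raw text between the quotes of a JSON string literal.
    # Returns the offsets of sequences that would be stored as '\u0000' text.
    hits = []
    pos = 0
    while True:
        m = _ESCAPE.search(body, pos)
        if m is None:
            return hits
        if m.group(1) == "u0000":
            hits.append(m.start())  # the unicode escape for U+0000
        elif m.group(1) == "\\" and body.startswith("u0000", m.end()):
            hits.append(m.start())  # escaped backslash followed by u0000
        pos = m.end()


def warn_if_ambiguous(body):
    for offset in ambiguous_offsets(body):
        warnings.warn("ambiguous \\u0000 sequence at offset %d" % offset)


print(ambiguous_offsets(r"\\u0000"))  # [0]
print(ambiguous_offsets(r"\u0000"))   # [0]
print(ambiguous_offsets(r"\\uabcd"))  # []
```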

cheers

andrew


