Re: jsonb, unicode escapes and escaped backslashes - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: jsonb, unicode escapes and escaped backslashes
Date
Msg-id CAM3SWZR7uq+ogmPm1ofGTkCWFRHX1BREAskZdOmSRp1E6N04xA@mail.gmail.com
In response to Re: jsonb, unicode escapes and escaped backslashes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Jan 29, 2015 at 11:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The point of JSONB is that we take a position on certain aspects like
>> this. We're bridging a pointedly loosey goosey interchange format,
>> JSON, with native PostgreSQL types. For example, we take a firm
>> position on encoding. The JSON type is a bit more permissive, to about
>> the extent that that's possible. The whole point is that we're
>> interpreting JSON data in a way that's consistent with *Postgres*
>> conventions. You'd have to interpret the data according to *some*
>> convention in order to do something non-trivial with it in any case,
>> and users usually want that.
>
> I quite agree with you, actually, in terms of that perspective.

Sure, but I wasn't sure that that was evident to others.

To emphasize: I think it's appropriate that the JSON spec takes
something of a back-seat approach to things like encoding and the
precision of numbers. I also think it's appropriate that JSONB does
not, up to and including where JSONB forbids things that the JSON spec
supposes could be useful. We haven't failed users by (say) not
accepting NULs, even though the spec suggests that that might be
useful - we have provided them with a reasonable, concrete
interpretation of that JSON data, with lots of useful operators, that
they may take or leave. It really isn't a historical accident that we
have both a JSON and a JSONB type. For other examples of this, see
every "document database" in existence.

Depart from this perspective, as an interchange standard author, and
you end up with something like XML, which, while easy to reason about,
isn't all that useful, or BSON, the binary interchange format, which
is an oxymoron.

> But my point remains: "\u0000" is not invalid JSON syntax, and neither is
> "\u1234".  If we choose to throw an error because we can't interpret or
> process that according to our conventions, fine, but we should call it
> something other than "invalid syntax".
>
> ERRCODE_UNTRANSLATABLE_CHARACTER or ERRCODE_CHARACTER_NOT_IN_REPERTOIRE
> seem more apropos from here.

I see. I'd go with ERRCODE_UNTRANSLATABLE_CHARACTER, then.
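
To illustrate the distinction being drawn (syntax versus what we can
represent), here is a minimal sketch using Python's standard json
module. It is not PostgreSQL code and says nothing about how jsonb is
implemented; it only shows that a generic JSON parser accepts both
escapes as valid string literals. Whether the resulting characters can
then be stored is a separate question: a text datum cannot hold NUL,
and the other escape depends on what the database encoding can
represent.

```python
import json

# Both inputs are syntactically valid JSON string literals.
# The doubled backslash keeps the escape intact for the JSON parser.
nul = json.loads('"\\u0000"')
other = json.loads('"\\u1234"')

# Each parses to a one-character string; neither is a syntax error.
print(len(nul), hex(ord(nul)))
print(len(other), hex(ord(other)))
```

A rejection at this point is therefore about translating the character,
which is why an error code other than "invalid syntax" fits better.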
-- 
Peter Geoghegan


