
From: Tom Lane
Subject: Re: jsonb, unicode escapes and escaped backslashes
Msg-id: 3770.1422392182@sss.pgh.pa.us
In response to: Re: jsonb, unicode escapes and escaped backslashes (Andrew Dunstan <andrew@dunslane.net>)
List: pgsql-hackers

Andrew Dunstan <andrew@dunslane.net> writes:
> On 01/27/2015 02:28 PM, Tom Lane wrote:
>> Well, we can either fix it now or suffer with a broken representation
>> forever.  I'm not wedded to the exact solution I described, but I think
>> we'll regret it if we don't change the representation.
>> 
>> The only other plausible answer seems to be to flat out reject \u0000.
>> But I assume nobody likes that.

> I don't think we can be in the business of rejecting valid JSON.

Actually, after studying the code a bit, I wonder if we wouldn't be best
off to do exactly that, at least for 9.4.x.  At minimum we're talking
about an API change for JsonSemAction functions (which currently get the
already-de-escaped string as a C string; not gonna work for embedded
nulls).  I'm not sure if there are already third-party extensions using
that API, but it seems possible, in which case changing it in a minor
release wouldn't be nice.  Even ignoring that risk, making sure
we'd fixed everything seems like more than a day's work, which is as
much as I for one could spare before 9.4.1.
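
The root problem is visible even from SQL: our internal APIs pass text
around as NUL-terminated C strings, so an embedded NUL simply has no
representation.  A trivial illustration (chr() here is just a convenient
way to manufacture the code point):

    -- a text value cannot carry a NUL byte at all
    SELECT chr(0);
    -- ERROR:  null character not permitted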

Also, while the idea of throwing an error only when a \0 needs to be
converted to text seems logically clean, it looks like that might pretty
much cripple the usability of such values anyhow, because we convert to
text at the drop of a hat.  So some investigation and probably additional
work would be needed to ensure you could do at least basic things with
such values.  (A function for direct conversion to bytea might be useful
too.)
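
To put a point on "at the drop of a hat", consider how many everyday
accessors hand back the de-escaped value as text; every one of these
would have to fail (or grow a special case) for a value containing
\u0000:

    -- each of these produces text from a jsonb value
    SELECT '{"k": "v"}'::jsonb ->> 'k';                    -- field as text
    SELECT '["a","b"]'::jsonb ->> 0;                       -- element as text
    SELECT * FROM jsonb_each_text('{"k": "v"}');           -- all fields as text
    SELECT * FROM jsonb_array_elements_text('["a","b"]');  -- all elements as text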

I think the "it would mean rejecting valid JSON" argument is utter
hogwash.  We already reject, e.g., "\u00A0" if you're not using a UTF8
encoding.  And we reject "1e10000", not because that's invalid JSON
but because of an implementation restriction of our underlying numeric
type.  I don't see any moral superiority of that over rejecting "\u0000"
because of an implementation restriction of our underlying text type.
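
Both of those existing rejections are easy to demonstrate.  (For the
first, assume a non-UTF8 server encoding, say LATIN1, where we currently
refuse any escape above 007F; for the second I'm using an exponent
comfortably beyond numeric's range.)

    -- perfectly valid JSON, rejected under a non-UTF8 server encoding
    SELECT '"\u00a0"'::jsonb;
    -- ERROR:  unsupported Unicode escape sequence

    -- perfectly valid JSON, rejected by the numeric type
    SELECT '1e1000000'::jsonb;
    -- ERROR:  value overflows numeric format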

So at this point I propose that we reject \u0000 when de-escaping JSON.
Anybody who's seriously unhappy with that can propose a patch to fix it
properly in 9.5 or later.
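
Concretely, the failure would just move up to input time (for jsonb,
which de-escapes immediately), along the lines of:

    -- under this proposal, de-escaping rejects the NUL outright
    SELECT '"a\u0000b"'::jsonb;
    -- ERROR:  unsupported Unicode escape sequence (or words to that effect)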

We probably need to rethink the re-escaping behavior as well; I'm not
sure if your latest patch is the right answer for that.
        regards, tom lane


