Re: JSON for PG 9.2 - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: JSON for PG 9.2
Date
Msg-id 4F121A40.6070508@dunslane.net
In response to Re: JSON for PG 9.2  (Joey Adams <joeyadams3.14159@gmail.com>)
List pgsql-hackers

On 01/14/2012 06:11 PM, Joey Adams wrote:
> On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> Second, what should we do when the database encoding isn't UTF8? I'm
>> inclined to emit a \unnnn escape for any non-ASCII character (assuming it
>> has a unicode code point - are there any code points in the non-unicode
>> encodings that don't have unicode equivalents?). The alternative would be to
>> fail on non-ASCII characters, which might be ugly. Of course, anyone wanting
>> to deal with JSON should be using UTF8 anyway, but we still have to deal
>> with these things. What about SQL_ASCII? If there's a non-ASCII sequence
>> there we really have no way of telling what it should be. There at least I
>> think we should probably error out.
> I don't think there is a satisfying solution to this problem.  Things
> working against us:
>
>   * Some server encodings support characters that don't map to Unicode
> characters (e.g. unused slots in Windows-1252).  Thus, converting to
> UTF-8 and back is lossy in general.
>
>   * We want a normalized representation for comparison.  This will
> involve a mixture of server and Unicode characters, unless the
> encoding is UTF-8.
>
>   * We can't efficiently convert individual characters to and from
> Unicode with the current API.
>
>   * What do we do about \u0000?  TEXT datums cannot contain NUL characters.
>
> I'd say just ban Unicode escapes and non-ASCII characters unless the
> server encoding is UTF-8, and ban all \u0000 escapes.  It's easy, and
> whatever we support later will be a superset of this.
>
> Strategies for handling this situation have been discussed in prior
> emails.  This is where things got stuck last time.
>
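
Just to pin down what you're proposing, it would amount to something 
like the sketch below (purely illustrative standalone C - the function 
name and the server_is_utf8 flag are hypothetical stand-ins, not 
backend code):

    #include <stdbool.h>
    #include <string.h>

    /*
     * Sketch of the proposed ban: \u0000 is rejected unconditionally
     * (a TEXT datum can't hold a NUL), and any other \uXXXX escape is
     * allowed only when the server encoding is UTF8.  "hex4" points
     * at the four hex digits following "\u"; "server_is_utf8" stands
     * in for a real check such as GetDatabaseEncoding() == PG_UTF8.
     */
    static bool
    unicode_escape_allowed(const char *hex4, bool server_is_utf8)
    {
        if (strncmp(hex4, "0000", 4) == 0)
            return false;           /* NUL: banned in every encoding */
        return server_is_utf8;      /* other escapes: UTF8 only */
    }

A non-ASCII byte in the input would presumably be rejected by a similar 
encoding check before we ever look at escapes.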


Well, from where I'm coming from, NULs are not a problem. But 
escape_json() is currently totally encoding-unaware. It produces \unnnn 
escapes for low ASCII characters, and just passes through characters 
with the high bit set. That's possibly OK for EXPLAIN output - we really 
don't want EXPLAIN failing. But maybe we should ban JSON 
output for EXPLAIN if the encoding isn't UTF8.
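
For illustration, here's a cut-down standalone approximation of that 
behaviour (not the actual escape_json() source - names and details 
are simplified):

    #include <stdio.h>

    /*
     * Sketch of the behaviour described above: control characters
     * become \u00NN escapes, bytes with the high bit set are copied
     * through untouched, with no knowledge of the server encoding.
     */
    static void
    escape_json_sketch(const char *s, FILE *out)
    {
        fputc('"', out);
        for (; *s; s++)
        {
            unsigned char c = (unsigned char) *s;

            switch (c)
            {
                case '"':  fputs("\\\"", out); break;
                case '\\': fputs("\\\\", out); break;
                case '\b': fputs("\\b", out);  break;
                case '\f': fputs("\\f", out);  break;
                case '\n': fputs("\\n", out);  break;
                case '\r': fputs("\\r", out);  break;
                case '\t': fputs("\\t", out);  break;
                default:
                    if (c < 0x20)            /* low ASCII: \u00NN */
                        fprintf(out, "\\u%04x", (unsigned int) c);
                    else                     /* includes high-bit bytes */
                        fputc(c, out);
            }
        }
        fputc('"', out);
    }

    int main(void)
    {
        /* "caf\xe9" is LATIN1 for "cafe" with an acute e: the 0xE9
         * byte passes through verbatim into the JSON output. */
        escape_json_sketch("line1\nline2 caf\xe9", stdout);
        fputc('\n', stdout);
        return 0;
    }

Run against a LATIN1 string, the 0xE9 byte comes out verbatim inside 
output that consumers will assume is UTF8, which is exactly the problem.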

Another question in my mind is what to do when the client encoding isn't 
UTF8.

None of these is an insurmountable problem, ISTM - we just need to make 
some decisions.

cheers

andrew

