Re: JSON for PG 9.2 - Mailing list pgsql-hackers

From Joey Adams
Subject Re: JSON for PG 9.2
Date
Msg-id CAARyMpDS_4xcwWPH3XXcxBbOqEmGyc9YCkCXcH9q=pka1PQZYg@mail.gmail.com
In response to Re: JSON for PG 9.2  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: JSON for PG 9.2  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Second, what should we do when the database encoding isn't UTF8? I'm
> inclined to emit a \unnnn escape for any non-ASCII character (assuming it
> has a unicode code point - are there any code points in the non-unicode
> encodings that don't have unicode equivalents?). The alternative would be to
> fail on non-ASCII characters, which might be ugly. Of course, anyone wanting
> to deal with JSON should be using UTF8 anyway, but we still have to deal
> with these things. What about SQL_ASCII? If there's a non-ASCII sequence
> there we really have no way of telling what it should be. There at least I
> think we should probably error out.

I don't think there is a satisfying solution to this problem.  Things
working against us:
* Some server encodings support characters that don't map to Unicode
characters (e.g. unused slots in Windows-1252).  Thus, converting to
UTF-8 and back is lossy in general.
* We want a normalized representation for comparison.  This will
involve a mixture of server and Unicode characters, unless the
encoding is UTF-8.
* We can't efficiently convert individual characters to and from
Unicode with the current API.
* What do we do about \u0000?  TEXT datums cannot contain NUL characters.
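
To make the first point concrete, here is a small Python sketch that lists the Windows-1252 byte values with no Unicode mapping. It relies on Python's built-in cp1252 codec, which is an assumption standing in for the server's own conversion tables; PostgreSQL's WIN1252 conversion may treat these slots differently. The function name is made up for illustration.

```python
def undefined_cp1252_bytes() -> list[int]:
    """Return byte values that Python's cp1252 codec cannot map to Unicode."""
    undefined = []
    for value in range(256):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            # No code point is assigned to this slot, so a round trip
            # through UTF-8 cannot preserve it.
            undefined.append(value)
    return undefined


if __name__ == "__main__":
    print([hex(v) for v in undefined_cp1252_bytes()])
```

Any byte reported here has nothing to convert to, which is why a convert-to-UTF-8-and-back scheme cannot be lossless for every server encoding.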

I'd say just ban Unicode escapes and non-ASCII characters unless the
server encoding is UTF-8, and ban all \u0000 escapes.  It's easy, and
whatever we support later will be a superset of this.
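
A minimal sketch of that rule, in Python rather than the backend's C, just to pin down what would be accepted. The function name and the way the server encoding is passed in are hypothetical, and this is not a full JSON validator: it only tracks string boundaries and backslash escapes, and does not check that the four characters after \u are hex digits.

```python
def check_json_text(text: str, server_encoding: str) -> None:
    """Raise ValueError if text violates the restrictive rule sketched above.

    Rule: \\u0000 is always rejected; any other \\uXXXX escape and any
    non-ASCII character is rejected unless the server encoding is UTF-8.
    """
    is_utf8 = server_encoding.upper() in ("UTF8", "UTF-8")
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ord(ch) > 0x7F and not is_utf8:
            raise ValueError("non-ASCII character requires a UTF-8 server encoding")
        if not in_string:
            if ch == '"':
                in_string = True
            i += 1
        elif ch == '"':
            in_string = False
            i += 1
        elif ch == "\\":
            if text[i + 1:i + 2] == "u":
                if text[i + 2:i + 6] == "0000":
                    raise ValueError("\\u0000 is not allowed")
                if not is_utf8:
                    raise ValueError("Unicode escape requires a UTF-8 server encoding")
                i += 6  # skip the backslash, 'u', and four digits
            else:
                i += 2  # skip a two-character escape such as \\ or \"
        else:
            i += 1


if __name__ == "__main__":
    check_json_text('{"a": "caf\\u00e9"}', "UTF8")  # accepted
    try:
        check_json_text('{"a": "caf\\u00e9"}', "LATIN1")
    except ValueError as exc:
        print("rejected:", exc)
```

Note that an escaped backslash followed by u0000 (the seven characters \\u0000 inside a string) is not a Unicode escape, so the scanner skips the escaped backslash as a pair before looking for \u.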

Strategies for handling this situation have been discussed in prior
emails.  This is where things got stuck last time.

- Joey

