On Fri, Jan 20, 2012 at 09:12:13AM -0800, David E. Wheeler wrote:
> On Jan 19, 2012, at 9:07 PM, Tom Lane wrote:
>
> > If his client encoding is UTF8, the value will be letter-perfect JSON
> > when it gets to him; and if his client encoding is not UTF8, then he's
> > already pretty much decided that he doesn't give a fig about the
> > Unicode-centricity of the JSON spec, no?
>
> Don’t entirely agree with this. Some folks are stuck with other encodings and
> cannot change them for one reason or another. That said, they can convert
> JSON from their required encoding into UTF-8 on the client side, so there is
> a workaround.

Perhaps, in addition to trying to just 'do the right thing by default',
it makes sense to have two canonicalization functions?
Say: json_utf8() and json_ascii().
They could give the same output no matter what encoding was set.
json_utf8() would give nice output where characters were canonicalized to
native UTF-8 characters, and json_ascii() would output only non-control
ASCII characters literally and escape everything else, or something
like that?
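
To make the idea concrete, here is a rough sketch of the two behaviours,
written in Python purely for illustration. The bodies are an assumption
about what json_utf8() and json_ascii() might do, not a proposed backend
implementation, and the sketch does not sort keys or otherwise normalize
anything beyond character escaping.

```python
import json


def json_utf8(text: str) -> bytes:
    """Hypothetical json_utf8(): decode \\uXXXX escapes and emit UTF-8 bytes.

    Control characters, quotes and backslashes still have to be escaped,
    because JSON requires that regardless of encoding.
    """
    value = json.loads(text)
    compact = json.dumps(value, ensure_ascii=False, separators=(",", ":"))
    return compact.encode("utf-8")


def json_ascii(text: str) -> str:
    """Hypothetical json_ascii(): emit printable ASCII only.

    Everything outside the printable ASCII range is written as a \\uXXXX
    escape (a surrogate pair for characters beyond the BMP).
    """
    value = json.loads(text)
    return json.dumps(value, ensure_ascii=True, separators=(",", ":"))


if __name__ == "__main__":
    # The same value, once with a literal character and once with an escape.
    literal = '{"name": "caf\u00e9"}'
    escaped = '{"name": "caf\\u00e9"}'
    print(json_utf8(literal) == json_utf8(escaped))
    print(json_ascii(literal) == json_ascii(escaped))
```

Either spelling of the input yields the same output from a given function,
which is the property that matters here: the result no longer depends on
how the value happened to be written or on which encoding was in effect.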
Garick