Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON - Mailing list pgsql-hackers

From Joey Adams
Subject Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Date
Msg-id CAARyMpDb2ZQZ8xZ1uwZHa_rf+FP+cFKK-xiUs1ELsFoE4Wea2A@mail.gmail.com
In response to Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON  (Joey Adams <joeyadams3.14159@gmail.com>)
Responses Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
List pgsql-hackers
I think I've decided to only allow escapes of non-ASCII characters
when the database encoding is UTF8.  For example, $$"\u2013"$$::json
will fail if the database encoding is WIN1252, even though WIN1252 can
encode U+2013 (EN DASH).  This may be somewhat draconian, given that:
* SQL_ASCII can otherwise handle "any" language according to the documentation.
* The XML type doesn't have this restriction (it just stores the
input text verbatim, and converts it to UTF-8 before doing anything
complicated with it).

However, it's simple to implement and understand.  The JSON data type
will not perform any automatic conversion between character encodings.
Also, if we want to handle this any better in the future, we won't
have to support legacy data containing a mixture of encodings.
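
To make the proposed rule concrete, here is a rough sketch in Python (not the
actual C code in the module). The function name check_unicode_escapes and the
way the database encoding is passed in as a string are assumptions made for
illustration only. It rejects any \uXXXX escape above U+007F unless the
encoding is UTF8, and does nothing else (it is not a JSON validator).

```python
import re

# Match a backslash escape: either \uXXXX or a backslash plus one character.
# Consuming "\\" as a pair keeps an escaped backslash followed by "u2013"
# from being mistaken for a Unicode escape.
_ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4}|.)", re.DOTALL)


def check_unicode_escapes(json_text: str, db_encoding: str) -> None:
    """Raise ValueError for a non-ASCII \\u escape unless the encoding is UTF8.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if db_encoding.upper() == "UTF8":
        return
    for match in _ESCAPE.finditer(json_text):
        token = match.group(1)
        if len(token) == 5 and token[0] == "u" and int(token[1:], 16) > 0x7F:
            raise ValueError(
                f"escape \\{token} is not allowed when the database "
                f"encoding is {db_encoding}"
            )
```

With this sketch, "\u2013" is rejected under WIN1252 even though that encoding
can represent U+2013, which is exactly the draconian part mentioned above.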

In the future, we could create functions to compensate for the issues
people encounter; for example:
* json_escape_unicode(json [, replace bool]) returns text -- convert
non-ASCII characters to escapes.  Optionally, use \uFFFD for
unconvertible characters.
* json_unescape_unicode(text [, replace text]) returns json -- like
json_in, but convert Unicode escapes to characters when possible.
Optionally, replace unconvertible characters with a given string.
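
As a sketch of what json_escape_unicode might do, here is a Python
approximation. It is an assumption about the behavior, not a design: the
input is modeled as raw bytes plus an encoding name, and "unconvertible"
is taken to mean bytes that cannot be decoded from that encoding. Characters
outside the BMP are written as surrogate pairs, as JSON requires.

```python
def json_escape_unicode(raw: bytes, encoding: str, replace: bool = False) -> str:
    """Return JSON text with every non-ASCII character written as \\uXXXX.

    Illustrative sketch only. With replace=True, undecodable bytes become
    U+FFFD; otherwise decoding errors propagate as UnicodeDecodeError.
    """
    text = raw.decode(encoding, errors="replace" if replace else "strict")
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            # Split a supplementary-plane code point into a surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)
```

Escaping every non-ASCII character blindly is safe for valid JSON, since
such characters can only appear inside string literals.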

I've been going back and forth on how to handle encodings in the JSON
type for a while, but suggestions and objections are still welcome.
However, I plan to proceed in this direction so progress can be made.

On another matter, should the JSON type guard against duplicate member
keys?  The JSON RFC says "The names within an object SHOULD be
unique," meaning JSON with duplicate members can be considered valid.
JavaScript interpreters (the ones I tried), PHP, and Python all have
the same behavior: discard the first member in favor of the second.
That is, {"key":1,"key":2} becomes {"key":2}.  The XML type throws an
error if a duplicate attribute is present (e.g. '<a href="b"
href="c"/>'::xml).
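
For reference, the Python behavior and one possible stricter alternative can
be shown with the standard json module. The names reject_duplicate_keys and
loads_strict are made up for this sketch; object_pairs_hook is the real hook
that json.loads provides.

```python
import json


def reject_duplicate_keys(pairs):
    """Build a dict from (key, value) pairs, refusing repeated keys."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate member key: {key!r}")
        seen[key] = value
    return seen


def loads_strict(text):
    """Parse JSON, raising ValueError if any object repeats a key."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)


# Default behavior: the later member silently wins.
lenient = json.loads('{"key":1,"key":2}')
```

Here lenient ends up as {"key": 2}, while loads_strict on the same input
raises ValueError, which is closer to what the XML type does for attributes.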

Thanks for the input,
- Joey