Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON - Mailing list pgsql-hackers

From Joey Adams
Subject Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Date
Msg-id CAARyMpBe8OonqjcmNeEhJLZ6Kf-Ljy_mbEdHtw4K4b=qXtxZ9Q@mail.gmail.com
In response to Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
List pgsql-hackers
On Fri, Jul 22, 2011 at 7:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Hmm.  That's tricky.  I lean mildly toward throwing an error as being
> more consistent with the general PG philosophy.

I agree.  Besides, throwing an error on duplicate keys seems like the
most logical thing to do.  The most compelling reason not to, I think,
is that it would make the input function a little slower.
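To illustrate the "throw an error on duplicate keys" behavior outside of PostgreSQL, here is a minimal Python sketch. It is not the contrib module's code; `reject_duplicate_keys` and `parse_strict` are hypothetical names, and it relies only on the standard `json.loads` `object_pairs_hook` parameter.

```python
import json


def reject_duplicate_keys(pairs):
    """Build a dict from (key, value) pairs, raising on a repeated key.

    Hypothetical helper: json.loads calls it once per JSON object,
    including nested objects.
    """
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate object key: {key!r}")
        obj[key] = value
    return obj


def parse_strict(text):
    """Parse JSON text, rejecting objects that repeat a member name."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)
```

The extra cost is one membership check per object member, which is the kind of input-function slowdown mentioned above.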

On Fri, Jul 22, 2011 at 8:26 PM, Florian Pflug <fgp@phlo.org> wrote:
>> * The XML type doesn't have this restriction (it just stores the
>> input text verbatim, and converts it to UTF-8 before doing anything
>> complicated with it).
>
> Yeah. But the price the XML type pays for that is the lack of an
> equality operator.

Interesting.  This leads to a couple more questions:
* Should the JSON data type (eventually) have an equality operator?
* Should the JSON input function alphabetize object members by key?

If we canonicalize strings and numbers and alphabetize object members,
then our equality function is just texteq.  The only stumbling block
is canonicalizing numbers.  Fortunately, JSON's definition of a
"number" is its decimal syntax, so the algorithm is child's play:
* Figure out the digits and exponent.
* If the exponent is greater than 20 or less than 6 (arbitrary), use
exponential notation.
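The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: `canonical_number` is a hypothetical name, the input is assumed to have been validated as a JSON number already, and the cutoffs are parameters whose defaults (20 and -6) are a guess at the intent of the arbitrary thresholds, not something the message pins down.

```python
from decimal import Decimal


def canonical_number(text, max_exp=20, min_exp=-6):
    """Return one canonical spelling for a JSON number (a sketch).

    Assumes `text` is already a valid JSON number. max_exp and min_exp
    are assumed cutoffs for switching to exponential notation.
    """
    d = Decimal(text)  # exact conversion, no rounding
    if d == 0:
        return "0"
    sign, digits, exp = d.as_tuple()

    # Step 1: figure out the digits and exponent (drop trailing zeros).
    digits = list(digits)
    while len(digits) > 1 and digits[-1] == 0:
        digits.pop()
        exp += 1
    s = "".join(str(x) for x in digits)
    sci_exp = exp + len(s) - 1  # exponent of the leading digit
    prefix = "-" if sign else ""

    # Step 2: use exponential notation outside the chosen range.
    if sci_exp > max_exp or sci_exp < min_exp:
        mantissa = s[0] + ("." + s[1:] if len(s) > 1 else "")
        return f"{prefix}{mantissa}e{sci_exp}"
    if exp >= 0:                       # integer: pad with zeros
        return prefix + s + "0" * exp
    if sci_exp >= 0:                   # digits on both sides of the point
        return prefix + s[:sci_exp + 1] + "." + s[sci_exp + 1:]
    return prefix + "0." + "0" * (-sci_exp - 1) + s  # pure fraction
```

With this, `1e2`, `100` and `100.0` all come out as the same text, which is what lets a plain text comparison stand in for numeric equality.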

The problem is: 2.718282e-1000 won't equal 0 as may be expected.  I
doubt this matters much.
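A small Python illustration of that corner case, assuming the comparison is done on the canonical decimal text rather than on a binary floating-point value (the variable names are only for the example):

```python
from decimal import Decimal

tiny = "2.718282e-1000"

# As a binary double, the value underflows and compares equal to zero.
as_float = float(tiny)

# As exact decimal text (or an exact decimal value), it is not zero.
as_decimal = Decimal(tiny)
text_equal_to_zero = (tiny == "0")
```

Here `as_float == 0.0` holds while `as_decimal != 0` and the text comparison is false, so a text-based equality operator would treat the two values as different.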

It would be nice to canonicalize JSON on input, and that's the way I'd
like to go, but two caveats are:
* Input (and other operations) would require more CPU time.  Instead
of being able to pass the data through a quick condense function, it'd
have to construct an AST (to sort object members) and re-encode the
JSON back into a string.
* Users, for aesthetic reasons, might not want their JSON members rearranged.
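As a rough picture of what canonicalize-on-input involves, here is a Python sketch that parses the text into a tree, sorts object members by key, and re-encodes it compactly. `canonicalize` is a hypothetical name; the sketch uses Python's own number and string handling (not the number scheme discussed above), and with the default parser a duplicate key silently keeps the last value.

```python
import json


def canonicalize(text):
    """Parse JSON, sort object members by key, re-encode without whitespace.

    A sketch only: numbers go through Python's int/float, so their
    spelling follows Python's rules rather than a custom canonical form.
    """
    tree = json.loads(text)            # build the tree (the "AST")
    return json.dumps(
        tree,
        sort_keys=True,                # alphabetize object members
        separators=(",", ":"),         # no insignificant whitespace
        ensure_ascii=False,            # keep non-ASCII characters as-is
    )
```

Once both sides are canonicalized this way, equality reduces to comparing the resulting strings, at the price of the parse and re-encode on every input and of rearranging the user's members.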

If, in the future, we add the ability to manipulate large JSON trees
efficiently (e.g. by using an auxiliary table like TOAST does), we'll
probably want unique members, so enforcing them now may be prudent.

- Joey

