Re: additional json functionality - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: additional json functionality
Date
Msg-id CAHyXU0zYyYN8WWsVJJXOok8iJDURmz6c1tvJ5y63QR24t26pVg@mail.gmail.com
In response to Re: additional json functionality  (Hannu Krosing <hannu@2ndQuadrant.com>)
Responses Re: additional json functionality  ("David E. Wheeler" <david@justatheory.com>)
List pgsql-hackers
On Thu, Nov 14, 2013 at 1:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> On 11/14/2013 08:17 PM, Merlin Moncure wrote:
>> On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler
>> <david@justatheory.com> wrote:
>>> On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>>
>>>> This is exactly what needs to be done, full stop (how about: hstore).
>>>> It really comes down to this: changing the serialization behaviors
>>>> that have been in production for 2 releases (three if you count the
>>>> extension) is bad enough, but making impossible some legal json
>>>> constructions which are currently possible is an unacceptable
>>>> compatibility break.  It's going to break applications I've currently
>>>> put into production with no clear workaround.  This is quite frankly
not ok and I'm calling foul.  The RFC may claim that these
>>>> constructions are dubious but that's irrelevant.  It's up to the
>>>> parser to decide that and when serializing you are not in control of
>>>> the parser.
>>> The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed?
 
>> It doesn't, but the row_to_json function has a very clear mechanism of
>> action.  And, 'not being documented' is not the standard for latitude
>> to make arbitrary changes to existing function behaviors.
> the whole hash*() function family was changed based on a "not documented"
> premise, so we do have a precedent.
>>
>>> In my experience, no JSON parser guarantees key order or duplication.
>> I found one in about two seconds.  http://docs.python.org/2/library/json.html
>>
>> "object_pairs_hook, if specified will be called with the result of
>> every JSON object decoded with an ordered list of pairs. The return
>> value of object_pairs_hook will be used instead of the dict. This
>> feature can be used to implement custom decoders that rely on the
>> order that the key and value pairs are decoded (for example,
>> collections.OrderedDict() will remember the order of insertion). If
>> object_hook is also defined, the object_pairs_hook takes priority."
>>
>> That makes the rest of your argument moot.  Plus, I quite clearly am
>> dealing with parsers that do.
> I am sure you could also devise an json encoding scheme
> where white space is significant ;)
>
> The question is, how much of it should json *type* support.
>
> As discussed in other thread, most of your requirements
> would be met by having json/row/row set-to-text serializer
> functions which output json-formatted "text".
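(As an aside for the archives: the object_pairs_hook behavior quoted above is easy to demonstrate. A minimal Python sketch, with a made-up input document containing a duplicate key:

```python
import json

# A JSON document with a duplicate key ("a" appears twice) -- legal to
# many parsers, even if the RFC calls such constructions dubious.
doc = '{"b": 1, "a": 2, "a": 3}'

# object_pairs_hook receives every decoded object as an ordered list of
# (key, value) pairs, so both key order and duplicates are visible.
pairs = json.loads(doc, object_pairs_hook=lambda p: p)
print(pairs)  # [('b', 1), ('a', 2), ('a', 3)]

# The default dict-based decoding, by contrast, keeps only the last "a".
print(json.loads(doc))  # {'b': 1, 'a': 3}
```

So a parser that cares about order and duplicates can recover them; a hash-based representation on the producing side would destroy that information before the parser ever sees it.)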

No, that would not work, putting aside the fact that it would require
rewriting heaps of code.  What I do now inside the json wrapping
routines is create things like

{
  "x": [
    {dynamic object},
    {dynamic object},
    ...
  ],
  "y": ...,
  ...
}

The only way to do it is to build 'dynamic object' into json in
advance of the outer xxx_to_json call.  The 'dynamic object' is
created out of a json builder that takes a paired array -- basically a
variant of Andrew's 'json_build' upthread.  If the 'json serializer'
output text, the 'outer' to_json call would then re-escape the
object.  I can't use hstore for that purpose precisely because of the
transformations it does on the object.
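To make the re-escaping problem concrete, here is a minimal sketch in Python (using its json module, since it came up above) of what happens when an already-serialized inner object reaches the outer serializer as plain text instead of as json:

```python
import json

# Build the "dynamic object" and serialize it to text first.
inner_text = json.dumps({"name": "dynamic"})   # '{"name": "dynamic"}'

# If the outer serializer sees only text, it re-escapes the inner
# document as a plain string -- the nested structure is lost.
outer = json.dumps({"x": [inner_text]})
print(outer)  # {"x": ["{\"name\": \"dynamic\"}"]}

# What is actually wanted: the inner value treated as json, so the
# outer call nests it without re-escaping.
wanted = json.dumps({"x": [{"name": "dynamic"}]})
print(wanted)  # {"x": [{"name": "dynamic"}]}
```

The pg-side situation is analogous: a text-returning serializer forces the outer xxx_to_json call to quote the inner result, which is exactly the behavior that has to be avoided.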

Stepping back, I'm using json serialization as a kind of 'supercharged
crosstab'.  To any client that can parse json, json serialization
completely displaces crosstabbing -- it's superior in every way.  I
am, if you will, kind of leading research efforts in the area and I can
tell you with absolute certainty that breaking this behavior is a
mistake.

Forcing hstore-ish output mechanisms removes the ability to handle
certain important edge cases that work just fine today. If that
ability were taken away, it would be a very bitter pill for me to
swallow and would have certain ramifications for me professionally; I
went out on a pretty big limb and pushed pg/json aggressively (over
strenuous objection) in an analytics product which is now in the final
stages of beta testing.  I would hate to see the conclusion of the
case study be "Ultimately we had to migrate the code back to Hibernate
due to compatibility issues".

Here are the options on the table:
1) Convert the existing json type to the binary flavor (notwithstanding objections).
2) Maintain side-by-side types, one binary and one text.
Unfortunately, I think the text one must get the name 'json' due to an
unfortunate previous decision.
3) Merge the behaviors into a single type and get the best of both
worlds (as suggested upthread).

I think we need to take a *very* hard look at #3 before exploring #1
or #2.  I haven't thought it through yet, but it may be possible to handle
this in such a way that is mostly transparent to the end user and
may have other benefits, such as a faster path for serialization.

merlin


