Re: additional json functionality - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: additional json functionality
Date
Msg-id CAHyXU0xa=1jJ3+4LJBRpyoFfUsz=7mQxQg8mdteAacbnNxJahw@mail.gmail.com
Whole thread Raw
In response to Re: additional json functionality  ("David E. Wheeler" <david@justatheory.com>)
Responses Re: additional json functionality  (Andrew Dunstan <andrew@dunslane.net>)
Re: additional json functionality  (Hannu Krosing <hannu@2ndQuadrant.com>)
List pgsql-hackers
On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <david@justatheory.com> wrote:
> On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
>> Here are the options on the table:
>> 1) convert existing json type to binary flavor (notwithstanding objections)
>> 2) maintain side by side types, one representing binary, one text.
>> unfortunately, i think the text one must get the name 'json' due to
>> unfortunate previous decision.
>> 3) merge the behaviors into a single type and get the best of both
>> worlds (as suggested upthread).
>>
>> I think we need to take a *very* hard look at #3 before exploring #1
>> or #2: Haven't through it through yet but it may be possible to handle
>> this in such a way that will be mostly transparent to the end user and
>> may have other benefits such as a faster path for serialization.
>
> If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial
(seehttp://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/
fora couple of examples) --- without undue maintenance overhead, then great. 
>
> I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at
$work.

Kinda yes, kinda no.  Here's a rough sketch of what I'm thinking:

*) 'json' type internally has a binary as well a text representation.
The text representation is basically the current type behavior
(duduplicated unordered).  The binary representation is the hstore-ish
variant.  The text mode is discarded when it's deemed no longer
appropriate to be needed, and, once gone, can never be rebuilt as it
was.

*) only the binary internal representation ever gets stored to disk
(or anything else).

*) the text mode is preferred for output if it is there.  otherwise, a
deduplicated, reordered text representation is generated

*) When literal text is casted to json, the binary structure is built
up and kept alongside binary mode.   So, if you went: 'select '{"a":
1,     "a": 2}'::json', you'd get the same thing back.  (This is how
it works now.).  but, if you went: 'insert into foo select '{"a": 1, "a": 2}'::json returning *', you'd get {"a": 2}
backessentially 
(although technically that would be a kind of race).

*) When the json is stored to table, the text representation gets
immediately discarded on the basis that it's no longer the true
representation of the data.

*) Ditto when making any equality operation (not as sure on this point).

*) Ditto when doing any operation that mutates the structure in any
way. the text representation is immutable except during serialization
and if it gets invalidated it gets destroyed.

*) New API function: json_simplify(); or some such.  It reorders and
dedups from user's point of view (but really just kills off the text
representation)

*) once the text mode is gone, you get basically the proposed 'hstore' behavior.

*) serialization functions are generally used in contexts that do not
store anything but get output as query data.  They create *only* the
text mode.  However, if the resultant json is stored anywhere, the
text mode is destroyed and replaced with binary variant.  This is both
a concession to the current behavior and an optimization of
'serialization-in-query' for which I think the binary mode is pessimal
performance wise.  so, xxx_to_json serialization functions work
exactly as they do now which fixes my problem essentially.

*) if you are unhappy with duplicates in the above, just get use to
calling  json_simpify() on the serialized json (or deal with in on the
client side).

This is all pretty glossy, but maybe there is a way forward...

merlin



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Errors on missing pg_subtrans/ files with 9.3
Next
From: Andrew Dunstan
Date:
Subject: Re: additional json functionality