Re: additional json functionality - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: additional json functionality
Date
Msg-id 52868605.8020900@dunslane.net
Whole thread Raw
In response to Re: additional json functionality  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: additional json functionality  (Merlin Moncure <mmoncure@gmail.com>)
Re: additional json functionality  ("David E. Wheeler" <david@justatheory.com>)
List pgsql-hackers
On 11/15/2013 03:25 PM, Merlin Moncure wrote:
> On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <david@justatheory.com> wrote:
>> On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>
>>> Here are the options on the table:
>>> 1) convert existing json type to binary flavor (notwithstanding objections)
>>> 2) maintain side by side types, one representing binary, one text.
>>> unfortunately, i think the text one must get the name 'json' due to
>>> unfortunate previous decision.
>>> 3) merge the behaviors into a single type and get the best of both
>>> worlds (as suggested upthread).
>>>
>>> I think we need to take a *very* hard look at #3 before exploring #1
>>> or #2: Haven't through it through yet but it may be possible to handle
>>> this in such a way that will be mostly transparent to the end user and
>>> may have other benefits such as a faster path for serialization.
>> If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial
(seehttp://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/
fora couple of examples) --- without undue maintenance overhead, then great. 
>>
>> I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches
at$work. 
> Kinda yes, kinda no.  Here's a rough sketch of what I'm thinking:
>
> *) 'json' type internally has a binary as well a text representation.
> The text representation is basically the current type behavior
> (duduplicated unordered).  The binary representation is the hstore-ish
> variant.  The text mode is discarded when it's deemed no longer
> appropriate to be needed, and, once gone, can never be rebuilt as it
> was.
>
> *) only the binary internal representation ever gets stored to disk
> (or anything else).
>
> *) the text mode is preferred for output if it is there.  otherwise, a
> deduplicated, reordered text representation is generated
>
> *) When literal text is casted to json, the binary structure is built
> up and kept alongside binary mode.   So, if you went: 'select '{"a":
> 1,     "a": 2}'::json', you'd get the same thing back.  (This is how
> it works now.).  but, if you went: 'insert into foo select '{"a": 1,
>    "a": 2}'::json returning *', you'd get {"a": 2} back essentially
> (although technically that would be a kind of race).
>
> *) When the json is stored to table, the text representation gets
> immediately discarded on the basis that it's no longer the true
> representation of the data.
>
> *) Ditto when making any equality operation (not as sure on this point).
>
> *) Ditto when doing any operation that mutates the structure in any
> way. the text representation is immutable except during serialization
> and if it gets invalidated it gets destroyed.
>
> *) New API function: json_simplify(); or some such.  It reorders and
> dedups from user's point of view (but really just kills off the text
> representation)
>
> *) once the text mode is gone, you get basically the proposed 'hstore' behavior.
>
> *) serialization functions are generally used in contexts that do not
> store anything but get output as query data.  They create *only* the
> text mode.  However, if the resultant json is stored anywhere, the
> text mode is destroyed and replaced with binary variant.  This is both
> a concession to the current behavior and an optimization of
> 'serialization-in-query' for which I think the binary mode is pessimal
> performance wise.  so, xxx_to_json serialization functions work
> exactly as they do now which fixes my problem essentially.
>
> *) if you are unhappy with duplicates in the above, just get use to
> calling  json_simpify() on the serialized json (or deal with in on the
> client side).
>
> This is all pretty glossy, but maybe there is a way forward...
>


It's making my head hurt, to be honest, and it sounds like a recipe for
years and years of inconsistencies and bugs.

I don't want to have two types, but I think I'd probably rather have two
clean types than this. I can't imagine it being remotely acceptable to
have behaviour depend in whether or not something was ever stored, which
is what this looks like.

cheers

andrew




>
>




pgsql-hackers by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: additional json functionality
Next
From: Thom Brown
Date:
Subject: postgres_fdw, remote triggers and schemas