Re: jsonb and nested hstore - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: jsonb and nested hstore
Date
Msg-id CAHyXU0wRnG6GxGdz5o_n+bxmAPF0x3q2kqYfYWo-Tb4K+W5-xg@mail.gmail.com
In response to Re: jsonb and nested hstore  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: jsonb and nested hstore  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Mon, Feb 10, 2014 at 5:02 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-02-10 11:59:53 -0600, Merlin Moncure wrote:
>> On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote:
>> >> On 02/10/2014 05:05 AM, Andres Freund wrote:
>> >> >I'd suggest making the format discernible from possible different future
>> >> >formats, to allow introducing a proper binary at some later time. Maybe
>> >> >just send a int8 first, containing the format.
>> >> >
>> >>
>> >> Teodor privately suggested something similar.  I was thinking of just
>> >> sending a version byte, which for now would be '\x01'. An int8 seems like
>> >> more future-proofing provision than we really need.
>> >
>> > Hm. Isn't that just about the same? I was thinking of the c type int8,
>> > not the 64bit type. It seems cleaner to do a pg_sendint(..., 1, 1) than
>> > to do it manually inside the string.
>>
>> -1.   Currently no other wire format types send version and it's not
>> clear why this one is special.  We've changed the wire format versions
>> before and it's upon the client to deal with those changes.  The
>> server version *is* the version basically.  If a broader solution
>> exists I think it should be addressed broadly.  Versioning one type
>> only IMNSHO is a complete hack.
>
> I don't find that very convincing. The entire reason jsonb exists is
> because the parsing overhead of text json is significant, so it stands
> to reason that soon somebody will try to work on a better wire protocol,
> even if the current code cannot be made ready for 9.4. And I don't think
> past instability of binary type's formats is a good reason for
> *needlessly* breaking stuff like binary COPYs.
> And it's not like one prefixed byte has any real-world relevant cost.

The point is, why does this one type get a version id?  Imagine a
hypothetical program that sent/received the binary format for jsonb.
All you have to do is manage the version flag appropriately, right?

Wrong.  You still need to have code that checks the server version and
sees if it's supported (particularly for sending), and as there is *no*
protocol negotiation of the formats at present it's all going to boil
down to "if version = X do Y".   How does the server know which
'versions' are ok to send? It doesn't.  Follow along with me here:
Suppose we don't introduce a version flag today and change the format
to some more exotic structure for 9.5.  How would a version flag have
made things easier for the client?  It wouldn't have.  Either way the
client goes "if version = X do Y".
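
To make that concrete, here is a rough sketch of the two client-side
code paths.  Everything in it is hypothetical: the decoder names, the
payload layout, and the idea that the format changes in 9.5 are
assumptions for illustration only, not the real jsonb wire format.  The
only real thing is server_version_num, which is 90400 for 9.4.

```python
# Hypothetical decoders; neither reflects the real jsonb wire format.
def decode_v1(body: bytes) -> str:
    """Assumed 'version 1' format: the value is sent as UTF-8 text."""
    return body.decode("utf-8")


def recv_with_flag(payload: bytes) -> str:
    """With a per-datum version byte: dispatch on the first byte."""
    version, body = payload[0], payload[1:]
    if version == 1:
        return decode_v1(body)
    # An old client can only error out; it still cannot read the datum.
    raise ValueError(f"unsupported jsonb format version {version}")


def recv_by_server_version(server_version_num: int, payload: bytes) -> str:
    """Without a flag: dispatch on the server version seen at connect time."""
    if server_version_num < 90500:  # assumed: format changes in 9.5
        return decode_v1(payload)
    raise ValueError(f"unsupported server version {server_version_num}")


print(recv_with_flag(b"\x01{}"))
print(recv_by_server_version(90400, b"{}"))
```

Both functions are the same "if version = X do Y" branch; the flag only
moves the value being tested from the connection to the datum.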

I guess you could argue that having a version flag could, say, allow
libpq clients to gracefully error out if, say, an old non-exotic-format
speaking libpq happens to connect to a newer server -- assuming the
client actually bothered to check the flag.  That's zero help to the
client though -- regardless, compatibility isn't established, and
that's zero help to the other binary formats that we have changed, and
probably will continue to change.  What about them?  Are we now, at the
umpteenth hour of the final commit fest, suddenly deciding that binary
wire formats are going to be compatible across versions?

The kinda low effort way to deal with binary format compatibility is
to simply document the existing formats and document format changes in
some convenient place.  The 'real' long term path to doing it IMO is
to abstract out a shared/client server type library with some protocol
negotiation features.  Then, at connection time, the client/server
agree on what's the optimal way to send things -- perhaps the client
can signal things like 'want compression for long datums'.
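
A minimal sketch of what that connection-time agreement might look
like, assuming each side advertises the set of binary format versions
it understands per type.  No such negotiation exists in the protocol
today; the function name and the data shapes are made up for
illustration.

```python
def negotiate(client_formats, server_formats):
    """Pick, per type, the highest binary format version both sides know.

    Both arguments map a type name to a set of supported format versions.
    Types with no common version are left out of the result, meaning the
    two sides would fall back to the text format for them.
    """
    agreed = {}
    for type_name, client_versions in client_formats.items():
        common = client_versions & server_formats.get(type_name, set())
        if common:
            agreed[type_name] = max(common)
    return agreed


# Hypothetical capability sets for an older client and a newer server.
client = {"jsonb": {1}, "hstore": {1, 2}, "bytea": {1}}
server = {"jsonb": {1, 2}, "hstore": {2, 3}}
print(negotiate(client, server))
```

The point of doing it this way is that the agreement covers every type
at once, rather than bolting a version byte onto one of them.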

The only case for a version flag at the data point level is if the
server is sending version X at this tuple and version Y at that tuple.
I don't think that's a makable case.  Some might say, "what about a
compression bit based on compressibility/length?" and to that I'd
answer: why is that handling specific to the json type...are
text/bytea/arrays not worth that feature too?

merlin


