Re: jsonb and nested hstore - Mailing list pgsql-hackers
From | Merlin Moncure |
---|---|
Subject | Re: jsonb and nested hstore |
Date | |
Msg-id | CAHyXU0wRnG6GxGdz5o_n+bxmAPF0x3q2kqYfYWo-Tb4K+W5-xg@mail.gmail.com |
In response to | Re: jsonb and nested hstore (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: jsonb and nested hstore
|
List | pgsql-hackers |
On Mon, Feb 10, 2014 at 5:02 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-02-10 11:59:53 -0600, Merlin Moncure wrote:
>> On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote:
>> >> On 02/10/2014 05:05 AM, Andres Freund wrote:
>> >> >I'd suggest making the format discernible from possible different future
>> >> >formats, to allow introducing a proper binary at some later time. Maybe
>> >> >just send a int8 first, containing the format.
>> >> >
>> >>
>> >> Teodor privately suggested something similar. I was thinking of just
>> >> sending a version byte, which for now would be '\x01'. An int8 seems like
>> >> more future-proofing provision than we really need.
>> >
>> > Hm. Isn't that just about the same? I was thinking of the c type int8,
>> > not the 64bit type. It seems cleaner to do a pg_sendint(..., 1, 1) than
>> > to do it manually inside the string.
>>
>> -1. Currently no other wire format types send version and it's not
>> clear why this one is special. We've changed the wire format versions
>> before and it's upon the client to deal with those changes. The
>> server version *is* the version basically. If a broader solution
>> exists I think it should be addressed broadly. Versioning one type
>> only IMNSHO is a complete hack.
>
> I don't find that very convincing. The entire reason jsonb exists is
> because the parsing overhead of text json is significant, so it stands
> to reason that soon somebody will try to work on a better wire protocol,
> even if the current code cannot be made ready for 9.4. And I don't think
> past instability of binary type's formats is a good reason for
> *needlessly* breaking stuff like binary COPYs.
> And it's not like one prefixed byte has any real-world relevant cost.

The point is, why does this one type get a version id? Imagine a hypothetical program that sent/received the binary format for jsonb.
All you have to do is manage the version flag appropriately, right? Wrong. You still need to have code that checks the server version and sees if it's supported (particularly for sending), and as there is *no* protocol negotiation of the formats at present, it's all going to boil down to "if version = X do Y". How does the server know which 'versions' are ok to send? It doesn't.

Follow along with me here: Suppose we don't introduce a version flag today and change the format to some more exotic structure for 9.5. How has the version flag made things easier for the client? It hasn't. The client goes "if version = X do Y".

I guess you could argue that having a version flag could, say, allow libpq clients to gracefully error out if, say, an old non-exotic-format speaking libpq happens to connect to a newer server -- assuming the client actually bothered to check the flag. That's zero help to the client though -- regardless, the compatibility isn't established, and that's zero help to other binary formats that we have changed, and probably will continue to change. What about them? Are we now, at the umpteenth hour of the final commit fest, suddenly deciding that binary wire formats are going to be compatible across versions?

The kinda low effort way to deal with binary format compatibility is to simply document the existing formats and document format changes in some convenient place. The 'real' long term path to doing it IMO is to abstract out a shared client/server type library with some protocol negotiation features. Then, at connection time, the client/server agree on what's the optimal way to send things -- perhaps the client can signal things like 'want compression for long datums'.

The only case for a version flag at the data point level is if the server is sending version X at this tuple and version Y at that tuple. I don't think that's a makable case. Some might say, "what about a compression bit based on compressibility/length?"
and to that I'd answer: why is that handling specific to the json type... are text/bytea/arrays not worth that feature too?

merlin
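
To make the "if version = X do Y" argument concrete, here is a minimal sketch of the two client-side approaches. Everything in it is hypothetical: the decoder names, the two payload layouts, and the 9.5 cutover are assumptions for illustration only, not PostgreSQL's actual jsonb wire format or any libpq API. It only aims to show that a client ends up with the same dispatch table whether it keys on the server version or on a leading version byte.

```python
import struct

# Hypothetical layouts, invented for this sketch only.
def decode_v1(payload: bytes) -> str:
    # Assumed "format 1": the payload is plain UTF-8 JSON text.
    return payload.decode("utf-8")

def decode_v2(payload: bytes) -> str:
    # Assumed "format 2": 4-byte big-endian length, then UTF-8 text.
    (length,) = struct.unpack("!I", payload[:4])
    return payload[4:4 + length].decode("utf-8")

# The client must know every format it can decode, flag or no flag.
DECODERS = {1: decode_v1, 2: decode_v2}

def decode_by_server_version(server_version: tuple, payload: bytes) -> str:
    # No per-datum flag: pick the format from the server version
    # (the 9.5 cutover is an assumption taken from the hypothetical above).
    fmt = 1 if server_version < (9, 5) else 2
    return DECODERS[fmt](payload)

def decode_by_flag(datum: bytes) -> str:
    # Per-datum flag: the first byte names the format. The only gain is a
    # clean error for an unknown format; the dispatch itself is unchanged.
    fmt = datum[0]
    if fmt not in DECODERS:
        raise ValueError(f"unsupported format version {fmt}")
    return DECODERS[fmt](datum[1:])
```

In this sketch the flag buys a graceful error for an unknown format, which matches the libpq case conceded above, but the client still needs a decoder per format either way.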