Re: SQLJSON - Mailing list pgsql-jdbc

From Steven Schlansker
Subject Re: SQLJSON
Date
Msg-id 427C3729-5331-4C71-8951-CA19631B8115@gmail.com
Whole thread Raw
In response to Re: SQLJSON  (Dave Cramer <pg@fastcrypt.com>)
List pgsql-jdbc
I apologize in advance if I've missed something, the volume of emails in this thread is getting ... a bit overwhelming.
My understanding of the situation is:

* Without JDBC5, we cannot expect our solution to be portable
* Without PG backend / protocol improvements, we cannot have our solution be faster or more efficient
* Without bloating the driver (and locking ourselves in to not receiving upstream bug fixes without re-rolling the
driver)or requiring external dependencies, we cannot have a nicer API to return JSON (and even if we do, see #1) 

Particularly in the case of using SQLJSON as a cursor-like construct to partially retrieve JSON values (did I
understandthat correctly?), it feels like we are rushing to implement something that cannot possibly be implemented
efficientlytoday, and if it does eventually get server / protocol support, our API is just as likely to be a total
mismatchas it is to mesh well. 

I would also like to point out that adding new methods to PGConnection is great, but as soon as you introduce a
connectionpool or other type of facade (which is, to my understanding, by far the common case) using any non-interface
methodsbecomes extremely confusing.  At the very best you have to deal with "unwrapping" any potentially wrapped
objects(Connection, ResultSet, Statement, etc) and worse you may find your pool does not bother to implement the
Wrapperinterface since it did not even exist until Java 6 and these tools move extremely slowly.  And it all breaks
whenyou switch connection pools and they have a different way of doing the wrapping... 

I wonder if it would not be more prudent to wait until JDBC has an API to argue about, or Postgres protocol supports at
leastone of the proposed features, rather than rushing to bake our own at the that we know has very little improvement
overwhat you can do out of the box today.  We are trying to build a feature from the middle out, rather than letting it
bedriven either top-down or bottom-up. 

I hope I haven't misunderstood anything here, I'm just having a very hard time figuring out the actual use case we are
tryingto solve from the large mash of emails.  To be clear, I say this as someone who has used the Postgres JSON
supportrelatively extensively, and would love to see improvements.  Maybe one of the advocates here could provide some
"pseudo-code"use cases?  A "story" for the proposal, if you will?  Sell me, make me wish for this feature :) 

To me the litmus test would be how much you can improve over the naive "end user ties Jackson and JDBC together"
approach,namely things such as 
rs = ps.execute("SELECT json_field FROM table WHERE condition = 42");
domainObject = mapper.readValue(rs.getBytes("json_field"), DomainObject.class);

either in terms of
* Code cleanliness that is more fundamental than a syntactic sugar helper method
 - For example a getJsonValue(rs, "json_field", DomainObject.class) shorthand
* Performance
 - Processing time
 - Memory usage
* Versatility
 - Not sure what people are missing here?  Except the SQLJSON cursor which we cannot possibly implement efficiently
today

Anyway, I'm really not trying to be a wet blanket here, I just want to make sure we all have a clear idea of what
problemswe are and are not solving by this proposed feature.  Don't want to stretch our already limited pg-jdbc
developmentresource even more thinly! 

On Jun 29, 2015, at 1:58 PM, Dave Cramer <pg@fastcrypt.com> wrote:

> On 29 June 2015 at 16:22, Markus KARG <markus@headcrashing.eu> wrote:
> Dave,
>
>
>
> I understand that people expect that getObject does something reasonable - but the case why this does not happen is
becausePostgreSQL does not provide something reasonable AFAIK, or is there JSON support in the protocol (I might have
missedthat, actually, then please just tell me)? 
>
>
>
> Well things in PostgreSQL  move at rather a glacial pace as far as features go. The emphasis is on reliability. If
youthink this mailing list is contentious you might want to have a look at hackers. To answer your question though
thereis no special backend support for JSON, or any other streaming.  Yes this is a performance drawback, but it is
whatit is.   
>
>
> But I need to tell you that you have to deal with the risk that if JDBC 5 comes up with the definition "getObject has
toreturn FancyJsonType" you'll face a backwards compatibility problem when you decided for pretty anything to return
butnot "FancyJsonType", whether the type you've choosen looked obvious to the common user, to almost any user or to
nobodyat all. Anyways, you're the boss, and I trust in you to do the right choice. :-) 
>
>
> Doing nothing is surely the wrong choice. The question is what to do. I don't pretend to know the right answer
here.Howeverin the absence of the spec, I'm sure whatever we do will be a compromise. 
>
>
>
> Regarding your metaphor, please in turn understand thar some guys in this form, like Vladimir and me, are
professionalcombustion engine engineers in our main job. Our whole life is dedicated to the difference between a
vanillaBMW 318 into an M3. I hope there is place for such engineers in your team, too, and it is _not_really_ your
attitudethat you are happy once the weels don't fall away once you start the motor. We do not do it for thank or cheer,
wesimply do it because it is the attitude of our main jobs. We cannot forget our knowledge about compression and
consumptionjust because 99% of the users don't care for. And part of this knowledge is thinking of efficiency and
safetyand cost of production and unassembly for recycling, and usefulness and maintainability and testability of the
car,too, even if no user _ever_ would think about when sitting is his ride. If our demands on the professionality and
performanceof the end product is not what you like to achieve and you like to simply replace the pipe with a chromed
one,then please let us know. Otherwise our PRs will one day produce a stack overflow on GitHub, which nobody wants. ;-) 
>
>
> As I said in my earlier email I am grateful for your contributions and I would love them to continue, but we need to
keepthe majority of the users in mind. We also need to make sure in our quest for performance that we don't break
anything.
>
> Regards,
>
> Dave
>
>
>
> Have fun
>
> -Markus
>
>
>
>
>
> From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of Dave Cramer
> Sent: Montag, 29. Juni 2015 21:23
> To: Álvaro Hernández Tortosa
> Cc: List
> Subject: Re: [JDBC] SQLJSON
>
>
>
> Markus,
>
>
>
> I really value your recent input to the driver and would like it to continue but we need to keep things in
perspective.The mere fact that you are engaging in discussion on this list puts you in the 99.99999th (keep going with
9'sif you like) percentile of users. Most and I mean the rest of the people in that population just expect the driver
todo something reasonable. They don't provide JSON parsers or even know there is a difference. When they do getObject
andthat object is a json value they expect something magical to happen. We return an object they can use. 
>
>
>
> The other less than exciting fact about the driver is that it is not the central feature of PostgreSQL. Allow me a
metaphorif you will. I like cars so I'll chose to use a driveshaft. It connects the engine to the wheels. There is
nothingnew about driveshafts, although recently they have been made out of carbon fibre, but essentially they are tubes
whichconnect the crankshaft to the wheels. To continue the metaphor, when the driver applies the gas pedal they expect
thecar to move. If the driveshaft breaks then this doesn't happen. Our job is to make sure the driveshaft continues to
connectthe engine to the wheels. If we can make it out of carbon fibre so much the better, but it is a particularly
thanklessjob. 
>
>
>
>
>
> Kind Regards,
>
>
>
> Dave Cramer
>
> dave.cramer(at)credativ(dot)ca
> http://www.credativ.ca
>
>
>
> On 29 June 2015 at 15:01, Álvaro Hernández Tortosa <aht@8kdata.com> wrote:
>
>
>     Markus:
>
> On 29/06/15 18:26, Markus KARG wrote:
>
> Unfortunately, yes, it is weird to ask, as what you do actually is working already once you accept the technical
realitythat an application either has to use JSON SQL functions OR provide an implementation of the JSONP API to be
ableto process JsonObject. 
>
>
>     It was rhetorical. Precisely I'm saying let's include an implementation so that you could process a JsonValue. Of
course. 
>
>
>
> I actually cannot see what is so specific to PostgreSQL that pgjdbc has to support a use case that no other driver
supports.
>
>
>     Maybe that PostgreSQL has JSON, really good one, while others don't? Or let's ruin all the work done into JSON
justbecause you fear of an absolutely unrealistic class path? Really? Please.... 
>
>
>
>
>
> The argument against it is that it provides classpath clashes once pgjdbc is used in a Java EE environment, and that
oursolution might be incompatible with JDBC 5, and that our solution imposes work and complexity on the pgjdbc
contributorsjust for the sake to make you happy, without providing you any measurable benefit. 
>
>
>     Markus, when you say "to make you happy"....  please take back your words and stick to rational arguments. If you
cannot,please abandon this otherwise respectful and enriching debate. 
>
>     I'm speaking for all the users, all of them that want to use 9.4's best thing (with logical decoding permission,
ofcourse). 
>
>     Yet you only want to stop everything speaking of a class path that very likely no one will ever experience, or
somefears about JDBC5 which might be light years ahead. We will worry about JDBC5 then, but now we need to help our
users,not help them (with your help) to go to MongoDB. 
>
>
>
>
>     Álvaro
>
>
>
>
> --
> Álvaro Hernández Tortosa
>
>
> -----------
> 8Kdata
>
>
>
>
>
>
>
> -Markus
>
>
>
> From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of Álvaro Hernández Tortosa
> Sent: Sonntag, 28. Juni 2015 21:06
> To: pgsql-jdbc@postgresql.org
> Subject: Re: [JDBC] SQLJSON
>
>
>
>
>
> On 28/06/15 17:09, Markus KARG wrote:
>
> If your application does not want to deal with JSON data then don't request it from the database, but use
http://www.postgresql.org/docs/9.4/static/functions-json.htmlin your SELECT statement. Performing a JSON extraction
functionnatively on the server and simply use the existing data types in your application not only reliefs us from
dealingwith JSON in the driver, but also will work magnitudes faster particularly on congested networks. That does work
bydefault already in the existing driver. What you ask for does not provide any benefit to the user from my view, or I
stilldo not understand you exact scenario. 
>
>
>     Markus, here you are trying to tell users what to do. I prefer to give them freedom, works best.
>
>     All I'm saying is:
>
> - Give users an API to get a javax.json.JsonValue out of a PostgreSQL column which is of type json, jsonb or the
resultof an expression which evaluates to any of those. 
>
> - Embed a JSON parser in the driver (like the RI implementation, which adds only 64Kb) so that users don't need to
loadany other code, unless they want to override the default JSON parser. 
>
>     From there, I don't care whether the user uses JSON functions at the server or the JSON API. What I want is that
itworks by default and that you can query JSON. Is this that weird to ask? 
>
>     I'm sorry, I fail to understand your solution to the problem. Would you please mind elaborating on the drawbacks
ofmy proposal? I stated in previous emails the drawbacks I see on not doing it, but I still fail to see an argument
againstthis. 
>
>     Thanks,
>
>     Álvaro
>
>
>
> --
> Álvaro Hernández Tortosa
>
>
> -----------
> 8Kdata
>
>
>
>
>
> From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of Álvaro Hernández Tortosa
> Sent: Sonntag, 28. Juni 2015 11:57
> To: pgsql-jdbc@postgresql.org
> Subject: Re: [JDBC] SQLJSON
>
>
>
>
>
> On 28/06/15 11:49, Markus KARG wrote:
>
> You miss one essential point against bundling: An application author who wants to process JsonObject must have that
classon his classpath, even if he does not use JDBC at all. This essential is HIS problem as HE wants to process it. So
wheredoes he get that class from other than putting any JSR 253 implementation in his classpath? Again, it is simply
notpgjdbc's problem as in that scenario no JDBC is used at all. 
>
>     I don't agree, it's not a must. I may want to extract JSON data from the database and then manipulate it as
non-JSONdata. It all depends on your domain Objects. 
>
>     I already stated how bad for the user is not to have a driver that works by default. It may be as hard as adding
a64Kb to the driver. I don't understand how is this a problem, vs. the problem it creates for the user. 
>
>     Regards,
>
>     Álvaro
>
>
>
>
> --
> Álvaro Hernández Tortosa
>
>
> -----------
> 8Kdata
>
>
>
>
>
>
>
> From: Álvaro Hernández Tortosa [mailto:aht@8Kdata.com]
> Sent: Sonntag, 28. Juni 2015 11:41
> To: Christopher BROWN
> Cc: Sehrope Sarkuni; Markus KARG; Dave Cramer; List
> Subject: Re: [JDBC] SQLJSON
>
>
>
>
>
> On 28/06/15 09:34, Christopher BROWN wrote:
>
> Hello,
>
>
>
> Quick contribution, I'm not answering in-line because there are already too many in-line answers and it's getting
unreadable.
>
>     • In my own applications, I use Jackson, but it's many up of different ".jar" files and has therefore no
definitiveform (you can concoct lots of combinations).  It's also quite heavy in terms of footprint, and embedding it
makesno sense, because you'd have to keep updating the driver to keep up to date with Jackson. Finally, it doesn't
actuallyimplement JSR353 (although it would be possible to create a thin wrapper), out-of-the-box (via a compatibility
API)it can read JSR-353 but it basically rebuilds a Jackson representation out of a "standard" representation.  I might
chooseJackson, but I wouldn't want to impose it or require that it be bundled with the driver (indeed, that would cause
meclassloader issues as I often update to the latest version of Jackson). 
>     Although I mentioned Jackson as a good candidate for a default implementation, you are right it does not
implementJSR353 directly (although wrappers already exist, like https://github.com/pgelinas/jackson-javax-json). But it
doesnot need to be the default choice. I did a quick test and wrapping Jackson with jaackson-javax-json and the set of
dependenciesto make it work would add 1102Kb to the Jar file. Not much IMHO, although bigger than current driver size.
Iwould not be scared however to see a 2Mb jdbc jar file. 
>
>     However, the Reference Implementation (https://jsonp.java.net/) is probably good enough and only adds 64Kb to the
Jarfile. The JSR353 is just another 32Kb, so in total 96Kb would be added if using the RI rather than Jackson. I don't
considerthis offensive. 
>
>     In summary: why not bundle then the RI? Works out of the box and does not conflict with Jackson. Want to use
Jackson?Go for it. The rest of the world would have something running out-of-the-box. 
>
>
>
>     • You can compile the driver against the JSONP API without embedding either the interfaces or an implementation.
It'stherefore an optional feature for those that require it, and it's not rocket science to add the necessary APIs to
theclasspath. 
>     • I disagree that bundling interfaces + implementation is "making it easy".  For some users, perhaps, but for
others,you're going to cause headaches due to creating classloader conflicts (when it's already bundled in their
application).
>     Technically, it's possible. But from a user perspective, not bundling an implementation means:
>
> - Understanding why the code I wrote fails with a "Provider org.glassfish.json.JsonProviderImpl not found".
> - Google that. Find that you need to add the dependency and very likely create a SPI file
(META-INF/services/javax.json.spi.JsonProvider).
> - Blame PostgreSQL for not doing that by default and shipping a half-baked driver that is not finished (won't be my
opinion,but might be user's opinion). 
> - Google again to see what JSR353-compliant implementations are out there.
> - Blame PostgreSQL again for not making this choice for you.
> - Struggling to find a compatible implementation. Find Jackson but realize requires third-party driver. Question the
"quality"of that wrapper and consider whether that would be "supported" with PostgreSQL driver. 
> - Luckily enough a search points you to a StackOverflow link that suggests to use either this Jackson wrapper or the
ReferenceImplementation (there are not many JSR353 implementations, after all). 
> - Choose without knowing which JSON parser is best.
> - Bundle the dependency, check now it works. Clean your sweat.
> - Wonder why not to choose MongoDB next time, it works out of the box.
>
>     Not funny.
>
>     The alternative is: bundle a default (the RI) and let advanced users not happy with the default implementation to
makeanother choice (basically create the META-INF/services/javax.json.spi.JsonProvider and that will override the
bundledRI). You have the best of both worlds. 
>
>     Pareto (https://en.wikipedia.org/wiki/Pareto_principle): let's make it work for 80% and let that 20% to edit a
fileand "suffer" a 64Kb size increase in the driver, rather than ask everybody to go through the above process. 
>
>
>
>
>     • If as Dave Cramer says, the PG protocol doesn't currently support streaming, it still makes sense to add
streamingsupport that reads from already fully-read resources... because in that way, if the protocol improves in the
future,client code using the streaming API will benefit (with no changes to client code) in the future. 
>     JSR353 already has a Streaming API. I don't understand what do we need to do here, it's already done.
>
>     Best regards,
>
>     Álvaro
>
>
>
>
>
> --
> Álvaro Hernández Tortosa
>
>
> -----------
> 8Kdata
>
>
>
>
>
>
>
> --
>
> Christopher
>
>
>
>
>
> On 28 June 2015 at 01:53, Álvaro Hernández Tortosa <aht@8kdata.com> wrote:
>
>
>
> On 28/06/15 00:55, Sehrope Sarkuni wrote:
>
> On Sat, Jun 27, 2015 at 6:25 PM, Álvaro Hernández Tortosa <aht@8kdata.com> wrote:
>
>     Hi Sehrope!
>
>
>
> Hi Álvaro! :D
>
>
>
>     To me, this is the least important question. If based on JSR353's SPI, it's trivial to swap the default, included
one,for another one. Just picking a sensible default (Jackson, for instance) is probably good enough. 
>
>
>
> I think I've used Jackson almost every time I've had to deal with JSON in Java. The mapping API is pretty cool in
thatit lets you directly create an target object type. If we got the route of adding methods to PGResultSet then we
couldhave something like: <T> T getJsonAsType(String, Class<T> clazz) 
>
>
>     That might be a nice addition. But I believe that goes beyond driver's responsibility: I think it ends when it
returnsyou the JSON type you queried (JsonObject in my previous email, but I'm correcting now myself: JsonValue) 
>
>
>
>
>
>
> I'm not wedded to Jackson though. Honestly if JS353 is the standard then that's what we should be using. We'd still
needto figure out how to handle older JVMs or maybe just selectively disable the feature (JDK8+?). 
>
>
>     JSR353 is targeted for JavaSE 6 :)
>
>
>
>
>
>
>
> #2 is driven a lot by #1 as depending on the parser implementation there may be different object types returned. JSON
isa bit tricky as "valid JSON" can mean null, a scalar, an object, or an array. Most people thing of it as just an
objectbut "foobar" is valid JSON as well. This leads us to #3... 
>
>
>     The object type to return has to be IMHO JsonObject:
http://docs.oracle.com/javaee/7/api/javax/json/JsonObject.html
>
>
>
> Not always though. All these are valid JSON too:
>
>
>
> => SELECT '1'::json AS num, '"test"'::json AS string, '[1,2,3]'::json AS arr, '{"foo":"bar"}'::json AS obj;
>
>  num | string |   arr   |      obj
>
> -----+--------+---------+---------------
>
>  1   | "test" | [1,2,3] | {"foo":"bar"}
>
> (1 row)
>
>
>
> We'll need separate getters/setters for the scalar and array types as well. I agree that most people will just be
usingthe object type though (and maybe the array). 
>
>
>     You are right here. Please s/JsonObject/JsonValue/g JsonValue is a container for any of the above including
objectsand arrays. So it would be enough just with JsonValue getJsonValue(....) 
>
>
>
>
>
>
> #3 doesn't have a straight answer as there is no getJSON(...) methods in the JDBC spec. It'd probably have to be
returnedvia getObject(...). 
>
> An alternative is to provide PGResultSet and PGPreparedStatement classes similar to PGConnection that provides PG
extensions.They could have the get/set methods (ex: getJsonScalar(...) or setJsonObject(Map<String,Object> ...)) to
retrieveJSON values as specific object types (i.e. scalar, object, array). It'd be a bit more type safe as presumably
mostpeople using json/jsonb types know the top level type of what they're storing. 
>
>
>     Probably adding methods to PG classes would be better than getObject and force explicit casts. Regarding the
methods,if they simply return JsonObject, you already have a full API there to parse and extract and process. So
anythingthat returns a JsonObject from a column (identifier or #) would be enough for me. 
>
>
>
> For most cases I think it'd be fine. I think the custom mapping I mentioned above would cover the rest. Anything
beyondthat would be a full on transformation and would be very application specific. 
>
>
>     Yepp
>
>
>
>
>
>
> For #4 I see two possible wins. First off on the usability side, there's some convenience to natively interfacing
withjson/jsonb types. It'll only have value though if those types are the same ones that users are using in the rest of
theircode. If they're just using them as Map<String,Object> everywhere then it'd still be a pain for a user to convert
toour "native" PG JSON types to use via JDBC. Having a dedicated API that allows for interaction using native Java
typeswould make this more convenient. 
>
> The other win I can see for #4 is on performance. Right now JSON is converted to a String. That means everybody using
ithas to convert it twice. First raw bytes to String, then String to object. A dedicated API could cut one of those out
ofthe way. Given how the wire protocol is implemented in the driver, it wouldn't be a direct reading from the input
stream(it'll be buffered in a byte array), but at least it won't be copied twice. 
>
>
>     As far as I know, most users are using JsonObject, so returning that is a perfect match for pgjdbc. I don't
expecthowever big performance wins as JSON is sent as a String over the wire... 
>
>
>
> The performance gain isn't on the wire, it's from not having to convert bytes => String => JsonObject. It'd be bytes
=>JsonObject or bytes => CustomObject. Less work and less GC. The bigger the JSON string, the bigger the savings too. 
>
>
>     You are right in that JSR353 allows you to create a parser directly out of an InputStream, so you would avoid
convertingto String. That's a win. The rest of the conversions are inevitable (having the latter one you pointed out
layingin user's realm, beyond driver's responsibility). 
>
>     Regards,
>
>     Álvaro
>
>
>
>
> --
> Álvaro Hernández Tortosa
>
>
> -----------
> 8Kdata
>
>
>
>
>
>
>
>
>
>
>
>
>
>



pgsql-jdbc by date:

Previous
From: Dave Cramer
Date:
Subject: Re: SQLJSON
Next
From: Steven Schlansker
Date:
Subject: Re: SQLJSON