Re: A new JDBC driver... - Mailing list pgsql-jdbc
From: Kevin Wooten
Subject: Re: A new JDBC driver...
Date:
Msg-id: 47737654-1645-4B3F-8FDF-0156A289A2AF@me.com
In response to: Re: A new JDBC driver... (Dave Cramer <pg@fastcrypt.com>)
List: pgsql-jdbc
If you remember what I said, I referred to its age as both a positive and a negative. I understand that all of this special case code is what makes things work in the real world. That being said, here are some things I can remember off the top of my head...
Take the simple case of "getString" in the ResultSet. It's full of special-case code, complete with a nice comment about a "hack to be compatible with text protocol" that has something to do with dates; it even specifically references the "hstore" type by name. In addition, it seems there can be up to 5 hashmap lookups to resolve the type when going through getString, depending on the actual type of the column. Truly understanding what this function does is a fun task, as it's used in quite a few places.
The getSQLType function has comments about not being able to truly know which types are arrays and which are not. In comparison, the new code doesn't care, because the object is converted to an array by a decoder. I won't forget to mention that the current code is EXTREMELY slow when it comes to getDatabaseMetaData and getMetaData calls; I mention this because I have a pull request from months ago that fixes the performance issue and hasn't been integrated yet ;)
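To make the decoder idea concrete, here is a minimal sketch. The TypeDecoder interface and Int4ArrayDecoder class are hypothetical illustrations, not the actual pgjdbc-ng code; the wire layout they read is PostgreSQL's documented binary array format:

import java.nio.ByteBuffer;

// Hypothetical interface: one decoder per PostgreSQL type, so the driver
// never has to guess from an OID whether a column "is an array".
interface TypeDecoder {
  Object decode(ByteBuffer buf);
}

// Decodes the BINARY wire format of a one-dimensional int4[] into int[].
// The layout (ndims, has-nulls flag, element OID, then per-dimension size
// and lower bound, then length-prefixed elements) is PostgreSQL's binary
// array format; ByteBuffer is big-endian by default, matching the wire.
class Int4ArrayDecoder implements TypeDecoder {
  @Override
  public Object decode(ByteBuffer buf) {
    int ndims = buf.getInt();           // number of dimensions
    buf.getInt();                       // has-nulls flag (ignored here)
    buf.getInt();                       // element type OID (23 = int4)
    if (ndims == 0) return new int[0];  // empty arrays arrive with ndims 0
    int count = buf.getInt();           // size of dimension 1
    buf.getInt();                       // lower bound of dimension 1
    int[] result = new int[count];
    for (int i = 0; i < count; i++) {
      buf.getInt();                     // element length (-1 would mean NULL)
      result[i] = buf.getInt();         // the 4-byte int4 value itself
    }
    return result;
  }
}

A caller that asks "is this an array?" can simply check the class of the decoded object, which is why the new code has no need for getSQLType-style guessing.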
On Mar 12, 2013, at 2:51 AM, Dave Cramer <pg@fastcrypt.com> wrote:
Kevin,

I think this is a very interesting idea. I've often thought that writing a stripped-down version of the driver that did the basics *very* well would be an interesting project. I'm wondering how to integrate this into the official project to live beside the current driver, and possibly supersede it? You mention "it is showing its age"; can you be more specific?

Regards,

On Mon, Mar 11, 2013 at 9:19 PM, Kevin Wooten <kdubb@me.com> wrote:

So… I did a crazy thing… I created an entirely new JDBC driver.
(For those who care there's an actual question at the bottom… this isn't just bragging)
About 10 days ago I set out to fix the current driver's support for composite and array objects. Currently they are only ever retrieved as encoded strings, due to the use of the TEXT format for those types; parsing these encoded strings seemed just short of a nightmare. Ambition got the better of me and before I knew it I had a "working" implementation of the entire FE/BE protocol (save for Copy In/Out).
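For a concrete sense of the problem, a tiny illustration; the values are made up, but the text-format encoding shown is PostgreSQL's documented composite output syntax:

public class TextFormatExample {
  public static void main(String[] args) {
    // What getString hands back for SELECT ROW(1, 'a,b', NULL) when the
    // composite travels in TEXT format: one opaque string.
    String raw = "(1,\"a,b\",)";
    // Fields containing commas or quotes get quoted, NULL becomes an empty
    // field, and nested composites or arrays add another layer of escaping.
    // The client must parse all of this by hand.
    System.out.println(raw);
  }
}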
I began by hacking the driver to force it into binary mode and pulling the raw data with getBytes. Then I started building a framework for recognizing, encoding and decoding all the possible types PostgreSQL knows about. Once that was working well, my mission was to rework this into the current driver. This proved almost impossible due to 1) my limited familiarity with the code and 2) the assumptions it makes about the formats of things like parameters. In the end it seemed too time consuming for ME to do this. So, partly for fun, I decided to just implement the FE/BE protocol and see where it got me. Next thing I knew I was running queries and retrieving data. Basically, it's a side-project retrofit that went wrong and has spiraled out of control ;)
Currently my "driver" (I use the term loosely; you can imagine its state of compliance after about 6 days of real work) looks like this:
* Basic support for JDBC interfaces (Driver, Connection, PreparedStatement, ResultSet)
* Can be discovered and loaded automatically through JDBC4 loader
* Supports ONLY JDBC4 interfaces (no attempt to compile to only JDBC2/3 is made)
* Lots of stuff missing, non-conformant or just plain broken
* Speaks "BINARY" format almost exclusively
* Extremely clean and simple interface for supporting new types; VERY DRY (a new type is only ever added in one place)
* Unsupported formats can fall back to "TEXT"
* Almost all of the current standard types are supported (even the hard ones like numerics, timestamps, dates)
* Can decode any recognized type to a Java object (this includes any imaginable composite or array type)
* Connection.setTypeMap and ResultSet.get(int idx, Map) are both fully supported
* Requests for composite objects that have no custom mapping are returned as HashMap (see the usage sketch after this list)
* Arrays can be decoded as a List, Map or native array (e.g. Object[], int[])
* As an extension it can decode whole rows into POJOs as well (acts a tiny bit like MyBatis)
* Asynchronous I/O engine provided by Netty
* All connections share a single group of worker threads
* LISTEN/NOTIFY and notifications can come through asynchronously
* Netty has a great system for managing buffers and reading/writing messages that shows increased speed
* Performance wasn't a goal of this project but it's a nice side effect
* Maven project
* Written against Java 7
* Obviously to enter any sort of real use the source version will have to be dialed down
* Shows some serious gains in performance
* Querying and retrieving a million rows with timestamps from the ResultSet shows about a 4-5x increase in speed
* Obviously, for some types there is no speed difference
* Again I say, performance was NOT a goal of this project; it's just a nice side effect
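To give a feel for the items above, here is a hedged usage sketch. The JDBC URL, credentials, query and exact returned classes are assumptions for illustration; only the behaviors named in the list (JDBC4 discovery, composites as HashMap, arrays as Java objects) come from it:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Map;

public class TypedDecodingSketch {
  public static void main(String[] args) throws Exception {
    // JDBC4 discovery: the driver registers itself via META-INF/services,
    // so no Class.forName call is needed. URL and credentials are made up.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:postgresql://localhost/test", "user", "pass");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT ROW(1, 'abc'), ARRAY[1, 2, 3]")) {

      while (rs.next()) {
        // A composite with no custom type mapping comes back as a Map of
        // attribute name to decoded value (per the feature list above).
        Map<?, ?> composite = (Map<?, ?>) rs.getObject(1);

        // An array column decodes straight to a Java object; per the list
        // it can be a List or a native array such as int[].
        Object numbers = rs.getObject(2);

        System.out.println(composite + " / " + numbers);
      }
    }
  }
}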
BINARY SUPPORT
As outlined above, I started the project to support decoding of Composite and Array types. To accomplish this I download "pg_type", "pg_attribute" and "pg_proc" to the client upon connection. I then create a type registry that holds all the required details of all the types. Procedures, for both TEXT and BINARY protocols, are looked up and matched by name (e.g. "bool_send", "money_recv", etc.) through a list of "Procedure Providers". When a DataRow message is received, the driver looks up each type in the registry and calls the appropriate TEXT or BINARY decoder to decode the row. When sending parameter data, the type is located and its encoder is called to encode the data. Reading through the driver mailing lists, it seems using binary only has some ramifications as far as type coercion and such are concerned; currently all user-initiated queries use the Extended Protocol and Statement Describe to ensure parameter types/values are correct.
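A condensed sketch of how such a registry could hang together; every name here (PgType, TypeRegistry, decodeColumn) is a hypothetical stand-in for illustration, not the actual pgjdbc-ng source:

import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Same hypothetical decoder interface as in the earlier sketch.
interface TypeDecoder {
  Object decode(ByteBuffer buf);
}

// One entry per row of pg_type, keyed by OID. The BINARY and TEXT codecs
// are located by a "Procedure Provider" that matches the procedure names
// found in pg_proc (e.g. "bool_send", "money_recv").
class PgType {
  final int oid;
  final String name;
  final TypeDecoder binaryDecoder;
  final TypeDecoder textDecoder;

  PgType(int oid, String name, TypeDecoder binary, TypeDecoder text) {
    this.oid = oid;
    this.name = name;
    this.binaryDecoder = binary;
    this.textDecoder = text;
  }
}

class TypeRegistry {
  private final Map<Integer, PgType> typesByOid = new HashMap<>();

  // Populated once at connect time from the downloaded pg_type rows.
  void register(PgType type) {
    typesByOid.put(type.oid, type);
  }

  // Called per column as a DataRow message is read: look the type up by
  // OID and dispatch to the decoder for the format the server used.
  Object decodeColumn(int typeOid, boolean binaryFormat, ByteBuffer value) {
    PgType type = typesByOid.get(typeOid);
    TypeDecoder decoder = binaryFormat ? type.binaryDecoder : type.textDecoder;
    return decoder.decode(value);
  }
}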
Where to go from here…
The major question I would like to ask is…
Should I continue on this path of a new driver and see if people join, or should I take what I have learned and try to refit it into the current code?
I am no dummy. I understand the years of experience the current driver has to ensure it works well in an extremely large number of cases. At the same time, anybody who has peeked around in there (and I have done quite a bit of it) knows it's showing its age. My driver is 100% new code… not a stitch of the old was used. Given this, it seems like transplanting my new "core" into the current project would be like giving it a brain transplant just after a fresh head transplant; in other words… a rewrite.
I'd love it if some folks in the know could take a look at my code and see if it stirs up any ideas on integration or just makes you want to jump off a bridge.
If you read this far you get a cookie…
Here is the GitHub project… https://github.com/kdubb/pgjdbc-ng
Kevin