Thread: A new JDBC driver...
So… I did a crazy thing… I created an entirely new JDBC driver.
(For those who care there's an actual question at the bottom… this isn't just bragging)
About 10 days ago I set out to fix the current driver's support for composite and array objects. Currently they are only ever retrieved as encoded strings, due to the use of the TEXT format for those types; parsing these encoded strings seemed just short of a nightmare. Ambition got the better of me and before I knew it I had a "working" implementation of the entire FE/BE protocol (save for Copy In/Out).
I began by hacking the driver to force it into binary mode and pulling the raw data with getBytes. Then I started building a framework for recognizing, encoding and decoding all the possible types PostgreSQL knows about. Once that was working well, my mission was to rework this into the current driver. This proved almost impossible due to 1) my limited familiarity with the code and 2) the assumptions it makes about the formats of things like parameters. In the end it seemed too time consuming for ME to do this. So, partly for fun, I decided to just implement the FE/BE protocol and see where it got me. Next thing I knew I was running queries and retrieving data. Basically it's a retro-fit side project that went wrong and has spiraled out of control ;)
Currently my "driver" (I use the term loosely; as you can imagine, its state of compliance reflects about 6 days of real work) looks like this:
* Basic support for JDBC interfaces (Driver, Connection, PreparedStatement, ResultSet)
* Can be discovered and loaded automatically through JDBC4 loader
* Supports ONLY JDBC4 interfaces (no attempt to compile to only JDBC2/3 is made)
* Lots of stuff missing, non-conformant or just plain broken
* Speaks "BINARY" format almost exclusively
* Extremely clean and simple interface for supporting new types; VERY DRY (a type is only ever added in 1 place)
* Unsupported formats can fall back to "TEXT"
* Almost all of the current standard types are supported (even the hard ones like numerics, timestamps, dates)
* Can decode any recognized type to a Java object (this includes any imaginable composite or array type)
* Connection.setTypeMap and ResultSet.get(int idx, Map) are both fully supported
* Requests for composite objects that have no custom mapping are returned as HashMap
* Arrays can be decoded as a List, Map or native array (e.g. Object[], int[])
* As an extension it can decode whole rows into POJOs as well (acts a tiny bit like MyBatis); a usage sketch follows this list
* Asynchronous I/O engine provided by Netty
* All connections share a single group of worker threads
* LISTEN/NOTIFY and notifications can come through asynchronously
* Netty has a great system for managing buffers and reading/writing messages that shows increased speed
* Performance wasn't a goal of this project but it's a nice side effect
* Maven project
* Written against Java 7
* Obviously to enter any sort of real use the source version will have to be dialed down
* Shows some serious gains in performance
* Query and retrieve a million rows with timestamps from the ResultSet and it's about a 4-5x increase in speed
* Obviously for some types there is no speed difference
* Again I say, performance was NOT a goal of this project; it's just a nice side effect
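To make the type-decoding items above concrete, here is a rough usage sketch (illustrative only; the table, columns and exact method surface are made up for the example):

    import java.sql.Array;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Map;

    public class DecodeSketch {
      public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/test", "user", "pass");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT address, tags FROM people")) {
          while (rs.next()) {
            // Composite column with no custom mapping: decoded to a Map of
            // attribute name -> decoded attribute value
            @SuppressWarnings("unchecked")
            Map<String, Object> address = (Map<String, Object>) rs.getObject(1);

            // Array column: unwrapped straight to a native Java array
            Array tags = rs.getArray(2);
            String[] tagNames = (String[]) tags.getArray();
          }
        }
      }
    }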
BINARY SUPPORT
As outlined above, I started the project to support decoding of Composite and Array types. To accomplish this I download "pg_type", "pg_attribute" and "pg_proc" to the client upon connection. I then create a type registry that holds all the required details of all the types. Procedures, for both TEXT and BINARY protocols, are looked up and matched by name (e.g. "bool_send", "money_recv", etc.) by a list of "Procedure Providers". When a DataRow message is received, the driver looks up the type in the registry and calls the appropriate TEXT or BINARY decoder to decode the row. When sending parameter data, the type is located and its encoder is called to encode the data. Reading through the driver mailing lists, it seems using binary only has some ramifications as far as type coercion and such are concerned; currently all user-initiated queries use the Extended Protocol & Statement Describe to ensure parameter types/values are correct.
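In rough sketch form, the registry lookup works something like this (illustrative only; these class and method names are invented for the example, not the actual ones in the project):

    import java.nio.charset.Charset;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical names throughout; this only mirrors the flow described above.
    interface BinaryDecoder { Object decode(byte[] wireData); }
    interface TextDecoder   { Object decode(String wireText); }

    class PgType {
      final int oid;
      final BinaryDecoder binary; // matched from pg_proc, e.g. "bool_recv"
      final TextDecoder text;     // fallback path for unsupported binary formats
      PgType(int oid, BinaryDecoder binary, TextDecoder text) {
        this.oid = oid; this.binary = binary; this.text = text;
      }
    }

    class TypeRegistry {
      private static final Charset UTF8 = Charset.forName("UTF-8");
      private final Map<Integer, PgType> byOid = new HashMap<Integer, PgType>();

      void register(PgType t) { byOid.put(t.oid, t); }

      // Called once per column of an incoming DataRow message.
      Object decodeColumn(int typeOid, boolean binaryFormat, byte[] data) {
        PgType t = byOid.get(typeOid);
        if (t == null) throw new IllegalStateException("unknown type oid " + typeOid);
        return binaryFormat ? t.binary.decode(data)
                            : t.text.decode(new String(data, UTF8));
      }
    }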
Where to go from here…
The major question I would like to ask is…
Should I continue on this path of a new driver and see if people join or should I take what I have learned and try to refit it into the current code?
I am no dummy. I understand the years of experience the current driver has to ensure it works well in an extremely large number of cases. At the same time, anybody who has peeked around in there (and I have done quite a bit of it) knows it's showing its age. My driver is 100% new code… not a stitch of the old was used. Given this, it seems like transplanting my new "core" into the current project would be like giving it a brain transplant just after a fresh head transplant; in other words… a rewrite.
I'd love it if some folks in the know could take a look at my code and see if it stirs up any ideas on integration or just makes you want to jump off a bridge.
If you read this far you get a cookie…
Here is the GitHub project… https://github.com/kdubb/pgjdbc-ng
Kevin
Kevin,
I think this is a very interesting idea. I've often thought that writing a stripped down version of the driver that did the basics *very* well would be an interesting project.
I'm wondering how to integrate this into the official project to live beside the current driver, and possibly supersede it?
You mention it is "showing its age"; can you be more specific?
Regards,
On Mon, Mar 11, 2013 at 9:19 PM, Kevin Wooten <kdubb@me.com> wrote:
> So… I did a crazy thing… I created an entirely new JDBC driver.
> [...]
> Here is the GitHub project… https://github.com/kdubb/pgjdbc-ng
>
> Kevin
Well done. I started a similar project a few years ago (but time constraints got the better of me), for the same reason: better support of arrays. I ended up modifying my own version of the driver; I don't think my patches were ever "approved".
Looking forward to giving it a go.
John
On Mon, Mar 11, 2013 at 9:19 PM, Kevin Wooten <kdubb@me.com> wrote:
> So… I did a crazy thing… I created an entirely new JDBC driver.
> [...]
> Here is the GitHub project… https://github.com/kdubb/pgjdbc-ng
Hi Kevin,

I'm excited to try it out - I've been thinking of doing the same quite a few times while hacking yet another hotspot out of the driver, and you're right: it's showing its age, but also its experience. My main motivation was speed so far and I'm glad to see Netty used - nice approach. I'm looking forward to trying it out and will lend a hand if I have the time.

Thanks for sharing this,

Kevin Wooten schrieb am 12.03.2013 02:19:
> So… I did a crazy thing… I created an entirely new JDBC driver.
> [...]
> If you read this far you get a cookie…
>
> Here is the GitHub project… https://github.com/kdubb/pgjdbc-ng
>
> Kevin

cu, Patric
Excellent progress Kevin!
I agree that adding new low-level functionality (such as a new transport layer) and other optimizations to the current JDBC code can be too difficult, making a rewrite the only feasible way forward.
I like your usage of Netty as the IO layer. It really allows us to experiment much more in the future with the API we provide to applications. I also like the usage of the binary protocol whenever possible.
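As a sketch of what a shared worker group means in Netty terms (illustrative only, written against Netty 4's API with a stand-in handler; it makes no claim to match the project's actual code):

    import io.netty.bootstrap.Bootstrap;
    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioSocketChannel;
    import io.netty.handler.logging.LoggingHandler;

    public class SharedGroupSketch {
      // One event-loop group shared by every connection the driver opens.
      private static final NioEventLoopGroup WORKERS = new NioEventLoopGroup();

      public static void connect(String host, int port) {
        Bootstrap b = new Bootstrap()
            .group(WORKERS)                 // reuse, never one group per connection
            .channel(NioSocketChannel.class)
            .handler(new ChannelInitializer<SocketChannel>() {
              @Override protected void initChannel(SocketChannel ch) {
                // A real driver would install its FE/BE message codec here;
                // LoggingHandler is just a placeholder.
                ch.pipeline().addLast(new LoggingHandler());
              }
            });
        b.connect(host, port);
      }
    }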
Some ideas I have had (in addition to the ones you have already implemented):
- create an asynchronous API for JDBC extensions, with futures or asynchronous result row processing (a hypothetical shape is sketched after this list)
- add pipelining to connections, thus allowing better use of connections (works best with simple auto-commit queries)
- integrate the async API with vert.x
- integrate the async API with Akka
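To make the first idea concrete, here is a purely hypothetical shape (none of these interfaces exist in any driver yet):

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.concurrent.Future;

    // Invented interfaces; a sketch of an async extension, not an existing API.
    interface AsyncQueries {
      // Future-style: the whole result set materializes off-thread.
      Future<ResultSet> executeAsync(String sql, Object... params);

      // Streaming-style: rows are delivered one at a time as they arrive.
      void executeStreaming(String sql, RowHandler handler);
    }

    interface RowHandler {
      void onRow(ResultSet currentRow) throws SQLException;
      void onComplete();
      void onError(Throwable error);
    }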
I will definitely have a more detailed look at the code.
-Mikko
On Tue, Mar 12, 2013 at 7:32 AM, Patric Bechtel <bechtel@ipcon.de> wrote:
> Hi Kevin,
>
> I'm excited to try it out - I've been thinking of doing the same quite a few times while hacking
> yet another hotspot out of the driver, and you're right: it's showing its age, but also its
> experience.

Can you be more specific about the issues with "experience"?
Kevin Wooten wrote on 12.03.2013 02:19:
> So… I did a crazy thing… I created an entirely new JDBC driver.
> [...]
> I'd love it if some folks in the know could take a look at my code
> and see if it stirs up any ideas on integration or just makes you
> want to jump off a bridge.

It does sound very nice, thanks very much for the effort!

I didn't look at the code, but one thing I noticed when trying the driver is that it requires a bunch of other libraries, which is highly unusual in the JDBC "world". Usually the JDBC driver jar is self-contained.

When I tried it with my SQL tool (SQL Workbench/J) I got the following exception when connecting:

    java.lang.NullPointerException
      at com.impossibl.postgres.protocol.v30.ProtocolImpl.sendStartup(ProtocolImpl.java:168)
      at com.impossibl.postgres.protocol.v30.StartupCommandImpl.execute(StartupCommandImpl.java:107)
      at com.impossibl.postgres.protocol.v30.ProtocolImpl.execute(ProtocolImpl.java:139)
      at com.impossibl.postgres.system.BasicContext.start(BasicContext.java:151)
      at com.impossibl.postgres.system.BasicContext.init(BasicContext.java:110)
      at com.impossibl.postgres.jdbc.PSQLDriver.connect(PSQLDriver.java:56)
      at com.impossibl.postgres.jdbc.PSQLDriver.connect(PSQLDriver.java:22)

Regards
Thomas
Yeah, it occurred to me as well that if you are using Netty it requires another jar.
A bit unorthodox...
On Tue, Mar 12, 2013 at 1:39 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> I didn't look at the code, but one thing I noticed when trying the driver is that it requires a bunch of other libraries, which is highly unusual in the JDBC "world". Usually the JDBC driver jar is self-contained.
> [...]
On Mar 12, 2013, at 11:04 AM, Dave Cramer <pg@fastcrypt.com> wrote:
> Yeah, it occurred to me as well that if you are using Netty it requires another jar.
> A bit unorthodox...

To my understanding, with a Maven build process it is relatively easy to produce both an "unmodified" jar, which is appropriate for use in Maven projects, and a "shaded" jar, which could be appropriate for use in tools / non-Maven-aware projects:

http://maven.apache.org/plugins/maven-shade-plugin/

This way you can have the best of both worlds.
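For reference, a minimal shade-plugin configuration of the kind meant here (the relocation target package is made up, and whether to relocate Netty at all is a design choice):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <!-- Hypothetical target package; relocating keeps the embedded
                   Netty from clashing with an application's own Netty copy. -->
              <relocation>
                <pattern>io.netty</pattern>
                <shadedPattern>com.impossibl.shaded.io.netty</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>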
On 03/12/2013 08:04 PM, Dave Cramer wrote:
> Yeah, it occurred to me as well that if you are using Netty it requires another jar.
> A bit unorthodox...

I think using 3rd-party dependencies is the proper way to go forward instead of reimplementing all required functionality in every JDBC driver. That is not to say that the required dependencies shouldn't be kept to a minimum.

Normal Java applications should use pgjdbc-ng through Maven dependencies (or similar). For cases where that is not possible, such as SQL editors, we should also provide a fat JDBC jar that has all the required dependencies embedded.

-Mikko
If you remember what I said, I referred to its age as both a positive and a negative. I understand that all of this special case code is what makes things work in the real world. That being said, here are some things I can remember off the top of my head...
Take the simple case of "getString" in the ResultSet. It's full of special-case code, complete with a nice comment about a "hack to be compatible with text protocol" that has something to do with Dates; it even specifically references the "hstore" type by name. In addition, it seems there can be up to 5 hashmap lookups to get the type when going through getString, depending on the actual type of the column. Truly understanding what this function does is a fun task, as it's used in quite a few places.
The getSQLType function has comments about not being able to truly know which types are arrays and which are not. In comparison, the new code doesn't care, because the object is converted to an array by a decoder. I won't forget to mention that the current code is EXTREMELY slow when it comes to the getDatabaseMetaData and getMetaData calls; I mention this because I have a months-old pull request that fixes the performance issue and hasn't been integrated yet ;)
On Mar 12, 2013, at 2:51 AM, Dave Cramer <pg@fastcrypt.com> wrote:
> I think this is a very interesting idea. I've often thought that writing a stripped down version of the driver that did the basics *very* well would be an interesting project.
> [...]
> You mention it is "showing its age"; can you be more specific?
I don't expect it to work with SQL Workbench, but after looking at this, it cannot find the property for "username", "password" or "database".

"database" is properly gleaned from the URL. "username" and "password" can be provided as extra properties or as URL query parameters.

What URL are you using to connect, and what properties are being provided?

As an aside, it currently only supports "plain" and "md5" authentication methods.

On Mar 12, 2013, at 10:39 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> When I tried it with my SQL tool (SQL Workbench/J) I got the following exception when connecting:
> [...]
Also, I fully intend for the final packaging to be a single self-contained jar. I have not been too concerned with it at this point, but Maven makes it pretty easy. Netty is easily embedded into the jar and, while also easy to embed, the Apache BeanUtils dependency may not be needed in the end.

On Mar 12, 2013, at 10:39 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> I didn't look at the code, but one thing I noticed when trying the driver is that it requires a bunch of other libraries, which is highly unusual in the JDBC "world". Usually the JDBC driver jar is self-contained.
> [...]
Kevin Wooten wrote on 12.03.2013 02:19:
> So… I did a crazy thing… I created an entirely new JDBC driver.
> [...]

Thomas Kellerer wrote:
> It does sound very nice, thanks very much for the effort!
>
> I didn't look at the code, but one thing I noticed when trying the driver is that it requires a bunch of other libraries, which is highly unusual in the JDBC "world". Usually the JDBC driver jar is self-contained.
> [...]

Recently I integrated an additional database, Apache Derby, into my database access tool. Derby required three different jars to be used for JDBC support. I really don't like having to pile lots of libraries into my application in order to get it to work.

The trial that Thomas performed, and my own look at the code, seem to indicate the code is incomplete and non-functional. Perhaps it could become a replacement for the current JDBC driver, but that may not be for some time. PostgreSQL is becoming mainstream and I would like to see additional support to keep pgJDBC up to date, but branching seems unproductive to keeping a stable driver in place. Integration seems more appropriate.

danap.
On Tue, Mar 12, 2013 at 08:11:55PM +0200, Mikko Tiihonen wrote:
> I think using 3rd-party dependencies is the proper way to go forward
> instead of reimplementing all required functionality in every JDBC driver.
> That is not to say that the required dependencies shouldn't be kept to a minimum.

Two cents: it always troubles me when I hear this reasoning, because the external dependencies still increase the maintenance costs, no matter how well managed. For instance, for JDBC/DB stuff, many people care (or are mandated to care) about security and the associated maintenance.

- Jukka.
>> I think using 3rd-party dependencies is the proper way to go forward
>> instead of reimplementing all required functionality in every JDBC driver.
>> That is not to say that the required dependencies shouldn't be kept to a minimum.
>
> Two cents: it always troubles me when I hear this reasoning, because the
> external dependencies still increase the maintenance costs, no matter how
> well managed.

I could not agree more.
Just to put this to bed, with regards to the project under discussion: the final delivery will be a single jar file with no external dependencies. I just haven't gotten that far yet.

On Mar 12, 2013, at 11:52 AM, Bryan Varner <bvarner@polarislabs.com> wrote:
> I could not agree more.
Yes, but it relies on extra jars whose pedigree we don't control, which is part of the security issue.
As I said earlier, I like this idea as a high-performance, minimally functional driver. By minimally functional, I mean I'd probably not implement all of the metadata API(s).
On Tue, Mar 12, 2013 at 2:55 PM, Kevin Wooten <kdubb@me.com> wrote:
> Just to put this to bed, with regards to the project under discussion: the final delivery will be a single jar file with no external dependencies. I just haven't gotten that far yet.
> [...]
Kevin Wooten wrote on 12.03.2013 19:29:
> I don't expect it to work with SQL Workbench, but after looking at
> this, it cannot find the property for "username", "password" or
> "database".
>
> "database" is properly gleaned from the URL. "username" and
> "password" can be provided as extra properties or as URL query
> parameters.
>
> What URL are you using to connect, and what properties are being
> provided?

Exactly the same credentials as with the "real" Postgres driver. The URL is:

jdbc:postgresql://localhost/wbtest

And I am using Driver.connect(String, Properties). The Properties instance contains essentially the keys "user" and "password" (as per the JDBC specs).

Regards
Thomas
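Concretely, the spec-style call described above looks like this (the credential values here are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class ConnectSketch {
      public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "thomas");      // the JDBC-spec key is "user",
        props.setProperty("password", "secret");  // not "username"
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/wbtest", props);
        conn.close();
      }
    }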
Hello,
From: Kevin Wooten <kdubb@me.com>
Subject: A new JDBC driver...
Date: 12. mars 2013 02:19:11 CET

> About 10 days ago I set out to fix the current driver's support for composite and array objects. Currently they are only ever retrieved as encoded strings, due to the use of the TEXT format for those types; parsing these encoded strings seemed just short of a nightmare. [...]
Hm, nightmare might be a stretch, though I see your point. I recently wrote a parser for text representations of arrays and composites of the basic types for my own personal little pet project, which is a kind of MyBatis-like thing but highly Postgres-specific. It wasn't much work, but if you're going to support "everything" and do it in the JDBC layer, I guess going the binary route would be the better bet.
The part of my implementation that just deals with parsing the text format isn't much to gawk at, but just for the record I've got it here:
And yes, I'm sure it chokes on lots of things.
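To give a flavor of the simplest part of that job, here is a minimal single-dimension sketch of that kind of text-format parsing (a standalone illustration, not my actual parser; it deliberately ignores NULLs, nested arrays and custom delimiters):

    import java.util.ArrayList;
    import java.util.List;

    public class ArrayLiteralSketch {
      // Parses e.g. {a,b,"c d"} into its elements. Single dimension only.
      public static List<String> parse(String s) {
        if (!s.startsWith("{") || !s.endsWith("}"))
          throw new IllegalArgumentException("not an array literal: " + s);
        List<String> out = new ArrayList<String>();
        String body = s.substring(1, s.length() - 1);
        if (body.isEmpty())
          return out;                                   // empty array: {}
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < body.length(); i++) {
          char c = body.charAt(i);
          if (c == '"') quoted = !quoted;               // enter/leave quoting
          else if (c == '\\' && quoted) cur.append(body.charAt(++i)); // escape
          else if (c == ',' && !quoted) { out.add(cur.toString()); cur.setLength(0); }
          else cur.append(c);
        }
        out.add(cur.toString());
        return out;
      }
    }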
> * Can decode any recognized type to a Java object (this includes any imaginable composite or array type)
> * Connection.setTypeMap and ResultSet.get(int idx, Map) are both fully supported
> * Requests for composite objects that have no custom mapping are returned as HashMap
> * Arrays can be decoded as a List, Map or native array (e.g. Object[], int[])
> * As an extension it can decode whole rows into POJOs as well (acts a tiny bit like MyBatis)
Right, that's the sort of thing I've been working on too, but since there wasn't any such support in the PgJDBC driver I went with getting the text representation with getString and parsing it.
> * Asynchronous I/O engine provided by Netty
> * All connections share a single group of worker threads
> * LISTEN/NOTIFY and notifications can come through asynchronously
> * Netty has a great system for managing buffers and reading/writing messages that shows increased speed
> * Performance wasn't a goal of this project but it's a nice side effect
That sounds great; when I poked around the driver trying to figure out how to retrieve arrays of complex types in a reasonably efficient manner, I did wonder about performance if they got very large.
> BINARY SUPPORT
> As outlined above, I started the project to support decoding of Composite and Array types. To accomplish this I download "pg_type", "pg_attribute" and "pg_proc" to the client upon connection. I then create a type registry that holds all the required details of all the types. [...]
Right, you do all that work upon initialization at runtime, which would be the way to go to have it all done generically in the driver and accessible via the JDBC methods. Before settling down and writing a parser for the text format, I quickly looked inside PgJDBC to see what it would take to have it do something like this, but I realized it would be a lot of work long before I saw a solution like the one you've implemented. In the end, for my own project, I went with doing all of that work outside of the JDBC driver by generating a static factory method for each type that the application in question actually uses, which fits the "zero reflection and full compile-time metaprogramming" approach my project has.
> Should I continue on this path of a new driver and see if people join or should I take what I have learned and try to refit it into the current code?
While I can't comment on this directly, I would like to say that I for one would appreciate either of those things; just as long as this doesn't fizzle out I'd be quite content. At the moment it's a non-starter for me to try out on my pet project, because that relies heavily on the JDBC metadata functions (which I'm not expecting will come along any time soon either, don't worry), but it's very good to hear that someone is putting effort into the areas of Postgres and JDBC I've been looking at, even if that effort may take a while to bear fruit. I'm not sure how to help either, seeing as I don't have much to test it against, and I seem to only have vague ideas in areas where you already have solutions.
Thanks for the effort either way,
Thor Michael
Kevin,
While I haven't looked at your driver in detail, it occurs to me that since the FE/BE protocol is broken out in the current driver, would it be possible to use your protocol implementation in the current driver?
On Tue, Mar 12, 2013 at 6:41 PM, Thor Michael Støre <thormichael@gmail.com> wrote:
> Hello,
> [...]
> Thanks for the effort either way,
> Thor Michael
It could probably be done if it weren't for my implementation delivering Java objects. The current driver expects nothing but byte arrays for each column of every row of the result set. So for a Postgres bool it has a byte[1], whereas my implementation stores a boolean.
The "DataRow" message handler could, fairly easily, be made to stop parsing it to the extent that it does and just return a byte[][] as the current driver does. To support my goals though, all of the ad-hoc parsing that's done in AbstractJdbc2ResultSet, and friends, would still need to be replaced with the binary parsing I am doing with my type framework.
On Tuesday 12 March 2013 at 02:19:11, Kevin Wooten <kdubb@me.com> wrote:

Coolio!! Can't wait to see what happens next. It's often fresh, ambitious work like this which brings the world forward.

--
Andreas Joseph Krogh <andreak@officenet.no>      mob: +47 909 56 963
Senior Software Developer / CTO - OfficeNet AS - http://www.officenet.no
Public key: http://home.officenet.no/~andreak/public_key.asc
On 03/12/2013 09:19 AM, Kevin Wooten wrote:
> * Asynchronous I/O engine provided by Netty
> * All connections share a single group of worker threads

That's going to cause serious issues on Java EE, especially when unloading applications. Since you're on JDBC4 you can have init and cleanup functions to manage the thread pool, but these are vital. I asked about this re PgJDBC quite some time ago and got some good information that's summarized here:

http://stackoverflow.com/q/8514725/398670
Failure to properly shut the thread pool down when a servlet or application is unloaded will cause classloader leaks, tending to lead to PermGenSpace exhaustion errors and other problems. The driver will probably need application-server-specific integration hooks too.
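As a minimal sketch of what I mean (my illustration, not code from either driver; a plain ExecutorService stands in for the shared Netty worker group):

    import java.sql.Driver;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    final class WorkerPoolLifecycle {
        private static final ExecutorService WORKERS = Executors.newCachedThreadPool();

        // Called when the driver is unloaded (e.g. on webapp undeploy):
        // deregister first, then interrupt and release every I/O thread so
        // none of them can pin the webapp's classloader.
        static void shutdown(Driver registered) throws SQLException {
            DriverManager.deregisterDriver(registered);
            WORKERS.shutdownNow();
        }
    }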
As for the "multiple JARs" converns, it's trivial to bundle dependencies inside the JDBC driver jar. However, this can cause issues if there's an incompatible version of the same library elsewhere on the classpath. It's OK if you're on a modern application server like JBoss AS 7 that isolates classloader chains, but it can be a real problem on older platforms and standalone applications. For this reason both a rollup version of the jar and a version without bundled libraries would probably be needed, but this is trivial to produce from Maven.
Overall I think this is an intriguing idea, whether it proves to be an ideas testbed or something that becomes appealing to adopt more seriously. Congratulations on tackling it.
--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I agree on the importance of avoiding conflicts with the application server's classpath. I think jarjar can easily solve such a problem by embedding the required third-party libraries in the final jar file in such a way that conflicts with other versions of the same library do not arise: https://code.google.com/p/jarjar/ . The Spring framework uses jarjar to package internal versions of ASM and CGLIB under the org.springframework package hierarchy.
On Thu, Mar 14, 2013 at 5:56 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

maven-shade-plugin was already mentioned to solve the class conflict issues (similar to jarjar).

Florent

--
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87
Craig, thanks a lot for the information. I read your SO question and related info, and then did a bit of researching on my own. I have come up with a few things…
Starting your own threads is not suggested because of the problems with not having a definitive startup/shutdown (until more recent JavaEE versions), and because any thread you start cannot use any of the services of the container. Everything else I have read just says that IF you don't shut the threads down properly you could wreak havoc, eat up resources, etc, etc.
The first issue isn't an issue at all. The threads handle the I/O only, and all data is delivered back to an application-level thread for processing. Basically, the fact that threads are used is completely transparent to the application code; it treats the calls synchronously. Theoretically, because you can in C, you should be able to do async I/O without threads, but there seems to be some belief that, in Java, threads are the way to go; all the popular libraries I've seen are using thread pools.
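Roughly, the shape of it is this (an illustrative toy, not my actual code; a bare ExecutorService stands in for the Netty machinery):

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    final class SyncOverAsync {
        private final ExecutorService ioWorkers = Executors.newFixedThreadPool(2);

        // The worker thread does the I/O; the application thread blocks on the
        // future, so the call still *looks* synchronous to the caller.
        byte[] roundTrip(final byte[] message) throws Exception {
            Future<byte[]> reply = ioWorkers.submit(new Callable<byte[]>() {
                @Override public byte[] call() {
                    return message; // stand-in for a real socket round trip
                }
            });
            return reply.get();
        }
    }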
Secondly, I have actually paid a bit of attention to the issue of threads shutting down, because any abandoned connection causes the threads to remain active and the program, at the very least, to not be able to shut down. I used a reference counting system to share the thread pool between connections. This guarantees that if the connections are properly closed, the threads will be killed. I did a bit of experimentation last night with using weak references everywhere and trying to handle the case where a person forgets to close a connection. After a bit of messing around I finally settled on the much maligned "finalizer" to kill the connection if it's abandoned. Connections are pretty heavyweight things and I don't believe finalizers are going to cause any issue with performance. Most applications will not be creating & destroying thousands of connections such that it creates a performance problem for the GC because of finalizers. Any good app, and especially JavaEE containers, will be pooling them and closing them properly anyway, which will in turn shut the threads down properly.
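The reference counting itself is simple; boiled down to an illustration (not the real code), it's something like:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    final class SharedWorkers {
        private static ExecutorService pool;
        private static int refs;

        // First connection creates the shared pool...
        static synchronized ExecutorService acquire() {
            if (refs++ == 0) {
                pool = Executors.newCachedThreadPool();
            }
            return pool;
        }

        // ...and the last close releases it, killing the I/O threads.
        static synchronized void release() {
            if (--refs == 0) {
                pool.shutdownNow();
                pool = null;
            }
        }
    }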
I did manage to hack into my driver an implementation of setQueryTimeout. The disconnected nature made this quite simple, as my current indefinite wait for query completion just became a wait with a timeout. The real issue I struggled with, so I put it on hold until later, was what to do once the timeout happens. I believe you would need to issue a "Cancel" to the server and then send a "Sync" message. PostgreSQL's cancel system is pretty interesting. You have to open a new socket and send a cancel packet. This works great unless the reason for the timeout was a network issue... then you need to handle the connection timeout, send timeout, etc. of the cancel procedure, and finally you still have to wait for the server's Error response from your cancel! So it gets pretty darn interesting, but I should be able to make it happen properly in the new driver.
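In outline it looks like this (a sketch with invented names; the cancel path is left abstract):

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    final class QueryTimeoutSketch {
        private final CountDownLatch queryComplete = new CountDownLatch(1);

        // Formerly an indefinite queryComplete.await(); now a timed wait.
        boolean awaitQuery(long timeoutSeconds) throws InterruptedException {
            if (queryComplete.await(timeoutSeconds, TimeUnit.SECONDS)) {
                return true; // query finished in time
            }
            // Timed out: cancellation means opening a *new* socket to the
            // server, sending a CancelRequest, then still waiting for the
            // error response on the original connection.
            requestCancel();
            return false;
        }

        private void requestCancel() { /* open cancel socket, send CancelRequest... */ }
    }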
Finally, with regard to your SO question, since there seems to be no answer, you could try, as I touched on earlier, to implement the query timeout by using non-blocking sockets, selectors and the like. I think you'll quickly grow to appreciate why others are using threads; Java has made something that was easy in C very hard. Also, in my journeys last night I discovered the "statement_timeout" connection parameter. If you didn't know about it already, the server will cancel any statement that takes longer than this value. It may be an easy solution to your problem.
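For example, to try it from JDBC you can just set it per connection (the value is in milliseconds):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    final class StatementTimeoutDemo {
        // After this, the server aborts any statement on this connection
        // that runs longer than the given number of milliseconds.
        static void capStatementTime(Connection conn, int millis) throws SQLException {
            try (Statement st = conn.createStatement()) {
                st.execute("SET statement_timeout = " + millis);
            }
        }
    }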
Kevin
On 03/15/2013 01:15 PM, Kevin Wooten wrote:
> Craig, thanks a lot for the information. I read your SO question and related info, and then did a bit of researching on my own. I have come up with a few things…

Just to clarify: it seems you thought I was asking for help with that question. I wasn't; I just wanted to direct you to the informative answer given and raise your awareness of the concerns around threading in Java EE application servers, in case you weren't aware of the problems there.
If you're unaware of classloader leaks and the related problems, start here:
http://frankkieviet.blogspot.com.au/2006/10/classloader-leaks-dreaded-permgen-space.html
> Starting your own threads is not suggested because of the problems with not having a definitive startup/shutdown (until more recent JavaEE versions), and because any thread you start cannot use any of the services of the container. Everything else I have read just says that IF you don't shut the threads down properly you could wreak havoc, eat up resources, etc, etc.

Yes... and you need to implement ServiceLoader hooks in order for threads created by the JDBC driver to be properly terminated before the driver is unloaded.
Remember that in Java EE the JDBC driver may be unloaded without the app server being terminated. This mainly occurs in apps that bundle JDBC drivers in their .war instead of using the server connection pool, but is also seen when JDBC drivers are uninstalled or unloaded from a running server as is common in upgrades.
If the driver has its own threads and it doesn't terminate every thread in the pool the unload will fail and you'll get a classloader leak.
> The first issue isn't an issue at all. The threads handle the I/O only, and all data is delivered back to an application-level thread for processing. Basically, the fact that threads are used is completely transparent to the application code; it treats the calls synchronously.

To the application code, yes; to the app server, not necessarily, because the threads hold references to classes that hold references to the .war classloader in the case of in-app-bundled JDBC drivers. Unlike every other kind of Java object, a thread is not GC'd when its refcount falls to zero, because a thread is a GC root. So if a leaked connection is GC'd, the associated threads may not be. You cannot rely on the finalize method to reliably or correctly clean up the threads, as it may not run under far too many circumstances.
For that reason, it's important for the driver to keep track of its threads and ensure that it explicitly shuts down its thread pool when it is unloaded by the service loader.
> Secondly, I have actually paid a bit of attention to the issue of threads shutting down, because any abandoned connection causes the threads to remain active and the program, at the very least, to not be able to shut down.

That's not the case if you're creating daemon threads (which you should be) - but it can cause other problems with object leaks. Even if you hold only weak references in your thread, the thread itself can still cause a classloader leak, since it holds a strong reference to its class object (or, in the case of an unsubclassed Thread, to the runnable and via it to the runnable's class).
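For reference, making the pool's threads daemons is just a ThreadFactory away (an illustrative sketch; the thread name is invented):

    import java.util.concurrent.ThreadFactory;

    final class DaemonThreads implements ThreadFactory {
        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "pg-ng-io"); // name is invented
            t.setDaemon(true); // won't block JVM exit; can still pin a classloader
            return t;
        }
    }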
> I used a reference counting system to share the thread pool between connections. This guarantees that if the connections are properly closed, the threads will be killed. I did a bit of experimentation last night with using weak references everywhere and trying to handle the case where a person forgets to close a connection.

Good idea. The driver *must* continue to function when connections aren't closed, as this is unfortunately extremely common. Relying on finalizers won't cut it in my opinion; they're just too unreliable to use for anything except logging warnings to say you forgot to close something.
The main issue with finalizers isn't performance; it's the number of circumstances under which they just don't run, run multiple times, or otherwise have undesirable characteristics.
> PostgreSQL's cancel system is pretty interesting. You have to open a new socket and send a cancel packet.

Yeah, I find this to be particularly awful myself. I vaguely recall seeing some discussion suggesting that in-band cancel was possible, but I'll have to delve through my archives to find out where.
> Finally, with regard to your SO question, since there seems to be no answer, you could try, as I touched on earlier, to implement the query timeout by using non-blocking sockets, selectors and the like. I think you'll quickly grow to appreciate why others are using threads; Java has made something that was easy in C very hard. Also, in my journeys last night I discovered the "statement_timeout" connection parameter. If you didn't know about it already, the server will cancel any statement that takes longer than this value. It may be an easy solution to your problem.

Heh, I'm well aware of statement_timeout. In that question I was asking for guidance about sane ways to use threading within the JDBC driver, as I was hoping to have time to implement threaded I/O within PgJDBC to finally solve the problem of the driver deadlocking when it's blocked on a read and the server won't reply until it gets another write. As you've alluded to, async I/O wasn't really an option, especially since it plays very poorly with SSL sockets. The answer referring to the service loader was very helpful, and that's what I was pointing you towards.
--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 03/15/2013 01:15 PM, Kevin Wooten wrote:
> After a bit of messing around I finally settled on the much maligned "finalizer" to kill the connection if it's abandoned.

Oh, and re finalizers:
https://www.securecoding.cert.org/confluence/display/java/MET12-J.+Do+not+use+finalizers
In particular, these points:
- There is no fixed time at which finalizers must be executed because this depends on the JVM. The only guarantee is that any finalizer method that executes will do so sometime after the associated object has become unreachable (detected during the first cycle of garbage collection) and sometime before the garbage collector reclaims the associated object's storage (during the garbage collector's second cycle). Execution of an object's finalizer may be delayed for an arbitrarily long time after the object becomes unreachable. Consequently, invoking time-critical functionality such as closing file handles in an object's finalize() method is problematic.

- The JVM may terminate without invoking the finalizer on some or all unreachable objects. Consequently, attempts to update critical persistent state from finalizer methods can fail without warning. Similarly, Java lacks any guarantee that finalizers will execute on process termination. Methods such as System.gc(), System.runFinalization(), System.runFinalizersOnExit(), and Runtime.runFinalizersOnExit() either lack such guarantees or have been deprecated because of lack of safety and potential for deadlock.

In particular I would be concerned that a thread refcounting system would be subject to ordering problems or deadlocks introduced by attempts to introduce fixes for ordering problems.

- According to the Java Language Specification, §12.6.2, "Finalizer Invocations are Not Ordered" [JLS 2005]: The Java programming language imposes no ordering on finalize() method calls. Finalizers [of different objects] may be called in any order, or even concurrently. One consequence is that slow-running finalizers can delay execution of other finalizers in the queue. Further, the lack of guaranteed ordering can lead to substantial difficulty in maintaining desired program invariants.

- Use of finalizers can introduce synchronization issues even when the remainder of the program is single-threaded. The finalize() methods are invoked by the garbage collector from one or more threads of its choice; these threads are typically distinct from the main() thread, although this property is not guaranteed. When a finalizer is necessary, any required cleanup data structures must be protected from concurrent access. See the JavaOne presentation by Hans J. Boehm [Boehm 2005] for additional information.
I really need to look into how other JDBC drivers handle these issues; it's not like this is a new problem.
--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I understood you weren't looking for an answer. I figured I'd share the knowledge I had... just in case ;)
I really appreciate the information you are providing, as you seem to be well versed in the issues. I do have one concern, though: I feel like we are discussing the case where connections are not closed correctly. Are you saying that even if the connections are closed properly the threads can still leak resources? I can't see how that would happen, as properly closing connections will ensure the threads are shut down manually, and shutting the threads down manually will ensure their resources are released. If we are talking about the error condition where connections are not closed, I am not so concerned.
Yes, I am familiar with these issues. Although, as I said previously, this is only for the case where connections, statements and result sets are not being closed. Finalizers, with all their issues, are a lot easier and serve the same purpose as tracking things with some combination of weak, soft or phantom references and trying to basically do a finalize yourself.
In its final state I imagine the driver will be able to withstand leaked ResultSets and Statements, but not Connections; you'll have to close your Connections properly. To accomplish this I am using weak refs between the statements and connections, and employing a finalize to catch the case where the statements and/or result sets were leaked. If you, or anybody, has a better idea, I am all ears.
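Boiled down, the arrangement looks something like this (illustrative class names, not the real ones):

    import java.lang.ref.WeakReference;

    abstract class NgConnection {
    }

    abstract class NgStatement {
        // Weak ref back to the owner, so a leaked statement can't keep
        // its connection alive.
        private final WeakReference<NgConnection> owner;
        private boolean closed;

        NgStatement(NgConnection conn) {
            this.owner = new WeakReference<NgConnection>(conn);
        }

        NgConnection connection() {
            return owner.get(); // may be null if the connection was collected
        }

        void close() {
            closed = true; // release server-side resources here
        }

        @Override
        protected void finalize() throws Throwable {
            try {
                if (!closed) {
                    close(); // leaked: best-effort cleanup only, no guarantees
                }
            } finally {
                super.finalize();
            }
        }
    }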