Thread: A new JDBC driver...

A new JDBC driver...

From
Kevin Wooten
Date:
So… I did a crazy thing… I created an entirely new JDBC driver.

(For those who care there's an actual question at the bottom… this isn't just bragging)

About 10 days ago I set out to fix the current driver's support for composite and array objects.  Currently they are only ever retrieved as encoded strings, due to the use of the TEXT format for those types, and parsing those encoded strings seemed just short of a nightmare.  Ambition got the better of me and before I knew it I had a "working" implementation of the entire FE/BE protocol (save for Copy In/Out).

I began by hacking the driver to force it into binary mode and pulling the raw data with getBytes.  Then I started building a framework for recognizing, encoding and decoding all the possible types PostgreSQL knows about.  Once that was working well, my mission was to rework this into the current driver.  This proved almost impossible due to 1) my limited familiarity with the code and 2) the assumptions it makes about the formats of things like parameters.  In the end it seemed too time consuming for ME to do this.  So, partly for fun, I decided to just implement the FE/BE protocol and see where it got me.  Next thing I knew I was running queries and retrieving data.  Basically it's a retrofit side project that went wrong and has spiraled out of control ;)

Currently my "driver" (I use the term loosely; as you can imagine, its state of compliance reflects about 6 days of real work) looks like this:

* Basic support for JDBC interfaces (Driver, Connection, PreparedStatement, ResultSet)
* Can be discovered and loaded automatically through JDBC4 loader
* Supports ONLY JDBC4 interfaces (no attempt to compile to only JDBC2/3 is made)
* Lots of stuff missing, non-conformant or just plain broken
* Speaks "BINARY" format almost exclusively
* Extremely clean and simple interface for supporting new types; VERY DRY (each type is only ever added in 1 place)
* Unsupported formats can fall back to "TEXT"
* Almost all of the current standard types are supported (even the hard ones like numerics, timestamps, dates)
* Can decode any recognized type to a Java object (this includes any imaginable composite or array type)
* Connection.setTypeMap and ResultSet.getObject(int idx, Map) are both fully supported
* Requests for composite objects that have no custom mapping are returned as a HashMap
* Arrays can be decoded as a List, Map or native array (e.g. Object[], int[]); see the usage sketch after this list
* As an extension it can decode whole rows into POJOs as well (acts a tiny bit like MyBatis)
* Asynchronous I/O engine provided by Netty
* All connections share a single group of worker threads
* LISTEN/NOTIFY and notifications can come through asynchronously
* Netty has a great system for managing buffers and reading/writing messages, which contributes to the increased speed
* Performance wasn't a goal of this project but it's a nice side effect
* Maven project
* Written against Java 7
* Obviously to enter any sort of real use the source version will have to be dialed down
* Shows some serious gains in performance
* Querying and retrieving a million rows with timestamps through the ResultSet shows about a 4-5x increase in speed
* Obviously, for some types there is no speed difference
* Again I say, performance was NOT a goal of this project it's just a nice side effect
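
To make the composite/array bullets concrete, here's a hypothetical usage sketch against plain JDBC (the URL, table and column names are made up; only the HashMap and native-array decoding behavior comes from the list above):

    import java.sql.*;
    import java.util.Arrays;
    import java.util.Map;

    public class CompositeDemo {
        public static void main(String[] args) throws SQLException {
            // Driver is discovered via the JDBC4 ServiceLoader mechanism,
            // so no Class.forName(...) call is needed.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/test", "user", "pass");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT address, scores FROM person")) {
                while (rs.next()) {
                    // A composite column with no custom type mapping comes
                    // back as a HashMap keyed by attribute name
                    Map<?, ?> address = (Map<?, ?>) rs.getObject(1);
                    // An int4[] column pulled through java.sql.Array as a
                    // native Integer[]
                    Integer[] scores = (Integer[]) rs.getArray(2).getArray();
                    System.out.println(address + " " + Arrays.toString(scores));
                }
            }
        }
    }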

BINARY SUPPORT
As outlined above, I started the project to support decoding of Composite and Array types.  To accomplish this I download "pg_type", "pg_attribute" and "pg_proc" to the client upon connection.  I then create a type registry that holds all the required details of all the types.  Procedures, for both TEXT and BINARY protocols, are looked up and matched by name (e.g. "bool_send", "money_recv", etc.) by a list of "Procedure Providers".  When a DataRow message is received, the driver looks up the type in the registry and calls the appropriate TEXT or BINARY decoder to decode the row.  When sending parameter data, the type is located and its encoder is called to encode the data.  Reading through the driver mailing lists, it seems using binary only has some ramifications as far as type coercion and such are concerned; currently all user-initiated queries use the Extended Protocol & Statement Describe to ensure parameter types/values are correct.
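
In sketch form, the registry idea is roughly this (a minimal illustration with invented names, not the real code):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of a client-side type registry populated from pg_type at
    // connect time. Binary decoders are registered once, keyed by the
    // type's "recv" procedure name, so adding a new type touches one place.
    class TypeRegistry {

        interface BinaryDecoder {
            Object decode(byte[] wireData);
        }

        static class PgType {
            final int oid;
            final String recvProcName; // e.g. "bool_recv", "money_recv"
            PgType(int oid, String recvProcName) {
                this.oid = oid;
                this.recvProcName = recvProcName;
            }
        }

        private final Map<String, BinaryDecoder> decodersByProc =
            new HashMap<String, BinaryDecoder>();
        private final Map<Integer, PgType> typesByOid =
            new HashMap<Integer, PgType>();

        TypeRegistry() {
            // A "procedure provider" registering one decoder (Java 7 style)
            decodersByProc.put("bool_recv", new BinaryDecoder() {
                public Object decode(byte[] wireData) {
                    return wireData[0] != 0;
                }
            });
            // ...one registration per supported type...
        }

        // Called while loading the downloaded pg_type rows
        void addType(PgType type) {
            typesByOid.put(type.oid, type);
        }

        // Called for each column of an incoming DataRow message
        Object decodeColumn(int typeOid, byte[] wireData) {
            PgType type = typesByOid.get(typeOid);
            BinaryDecoder dec =
                (type == null) ? null : decodersByProc.get(type.recvProcName);
            if (dec == null) {
                // No binary decoder known: fall back to treating the
                // value as TEXT
                return new String(wireData);
            }
            return dec.decode(wireData);
        }
    }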

Where to go from here…

The major question I would like to ask is… 

Should I continue on this path of a new driver and see if people join or should I take what I have learned and try to refit it into the current code?

I am no dummy.  I understand the years of experience the current driver has to ensure it works well in an extremely large number of cases.  At the same time, anybody who has peeked around in there (and I have done quite a bit of it) knows it's showing its age.  My driver is 100% new code… not a stitch of the old was used.  Given this, it seems like transplanting my new "core" into the current project would be like giving it a brain transplant just after a fresh head transplant; in other words… a rewrite.

I'd love it if some folks in the know could take a look at my code and see if it stirs up any ideas on integration or just makes you want to jump off a bridge.

If you read this far you get a cookie…

Here is the GitHub project…  https://github.com/kdubb/pgjdbc-ng

Kevin

Re: A new JDBC driver...

From
Dave Cramer
Date:
Kevin,

I think this is a very interesting idea. I've often thought that writing a stripped down version of the driver that did the basics *very* well would be an interesting project.

I'm wondering how to integrate this into the official project to live beside the current driver, and possibly supersede it?

You mention "it's showing its age"; can you be more specific?

Regards,

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Mon, Mar 11, 2013 at 9:19 PM, Kevin Wooten <kdubb@me.com> wrote:
So… I did a crazy thing… I created an entirely new JDBC driver.
[...]

Re: A new JDBC driver...

From
John Lister
Date:
Well done! I started a similar project a few years ago (but time constraints got the better of me) for the same reason: better support for arrays.  I ended up modifying my own version of the driver; I don't think my patches were ever "approved".

Looking forward to giving it a go.

John

On Mon, Mar 11, 2013 at 9:19 PM, Kevin Wooten <kdubb@me.com> wrote:
So… I did a crazy thing… I created an entirely new JDBC driver.
[...]

Re: A new JDBC driver...

From
Patric Bechtel
Date:

Hi Kevin,

I'm excited to try it out - I've been thinking of doing the same quite a few times while hacking yet another hotspot out of the driver, and you're right: it's showing its age, but also its experience.
My main motivation so far has been speed, and I'm glad to see Netty used - nice approach.

I'm looking forward to trying it out and lending a hand if I have the time.

Thanks for sharing this,

Kevin Wooten wrote on 12.03.2013 02:19:
> So… I did a crazy thing… I created an entirely new JDBC driver.
[...]
> If you read this far you get a cookie…
>
> Here is the GitHub project…  https://github.com/kdubb/pgjdbc-ng
>
> Kevin

--
cu, Patric


Re: A new JDBC driver...

From
Mikko Tiihonen
Date:
Excellent progress Kevin!

I agree that adding new low-level functionality (such as a new transport layer) and other optimizations to the current JDBC code can be too difficult, making a rewrite the only feasible way forward.

I like your usage of Netty as the IO layer. It really allows us to experiment in the future much more with the API we provide to applications. I also like the usage of the binary protocol whenever possible.

Some ideas I have had (in addition to the ones you have already implemented):
- create an asynchronous API for JDBC extensions, with futures or asynchronous result row processing (see the sketch after this list)
- add pipelining to connections thus allowing better use of connections (works best with simple auto-commit queries)
- integrate async api to vert.x
- integrate async api to Akka
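
The first of these might look roughly like so (purely illustrative; none of these interfaces exist in JDBC or, as far as I know, in the new driver):

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.concurrent.Future;

    // Illustrative shape of an asynchronous extension API layered on a
    // Netty-backed driver: either futures or push-style row delivery.
    interface AsyncConnection {

        // The query is written to the socket immediately; the caller only
        // blocks if/when it asks the Future for the materialized result.
        Future<ResultSet> executeQueryAsync(String sql);

        // Push style: rows are handed over as DataRow messages arrive,
        // without buffering the whole result set in memory.
        void executeQuery(String sql, RowHandler handler);

        interface RowHandler {
            void onRow(ResultSet row) throws SQLException;
            void onComplete();
            void onError(Throwable error);
        }
    }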

I will definitely have a more detailed look at the code.

-Mikko

Re: A new JDBC driver...

From
Dave Cramer
Date:


On Tue, Mar 12, 2013 at 7:32 AM, Patric Bechtel <bechtel@ipcon.de> wrote:

Hi Kevin,

I'm excited to try it out - I've been thinking of doing the same quite a few times while hacking yet another hotspot out of the driver, and you're right: it's showing its age, but also its experience.


Can you be more specific about the issues with "experience"?



Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


Re: A new JDBC driver...

From
Thomas Kellerer
Date:
Kevin Wooten wrote on 12.03.2013 02:19:
> So… I did a crazy thing… I created an entirely new JDBC driver.
>
> (For those who care there's an actual question at the bottom… this
> isn't just bragging)
>
> I'd love it if some folks in the know could take a look at my code
> and see if it stirs up any ideas on integration or just makes you
> want to jump off a bridge.
>

It does sound very nice, thanks very much for the effort!

I didn't look at the code, but one thing I noticed when trying the driver is that it requires a bunch of other libraries, which is highly unusual in the JDBC "world".  Usually the JDBC driver jar is self-contained.

When I tried it with my SQL tool (SQL Workbench/J) I got the following exception when connecting:


java.lang.NullPointerException
   at com.impossibl.postgres.protocol.v30.ProtocolImpl.sendStartup(ProtocolImpl.java:168)
   at com.impossibl.postgres.protocol.v30.StartupCommandImpl.execute(StartupCommandImpl.java:107)
   at com.impossibl.postgres.protocol.v30.ProtocolImpl.execute(ProtocolImpl.java:139)
   at com.impossibl.postgres.system.BasicContext.start(BasicContext.java:151)
   at com.impossibl.postgres.system.BasicContext.init(BasicContext.java:110)
   at com.impossibl.postgres.jdbc.PSQLDriver.connect(PSQLDriver.java:56)
   at com.impossibl.postgres.jdbc.PSQLDriver.connect(PSQLDriver.java:22)

Regards
Thomas

Re: A new JDBC driver...

From
Dave Cramer
Date:
Ya, it occurred to me as well that using Netty requires another jar.

A bit unorthodox...

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Tue, Mar 12, 2013 at 1:39 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
[...]

Re: A new JDBC driver...

From
Steven Schlansker
Date:
On Mar 12, 2013, at 11:04 AM, Dave Cramer <pg@fastcrypt.com> wrote:

> Ya, it occurred to me as well that using Netty requires another jar.
>
> A bit unorthodox...
>
> Dave Cramer
>
> dave.cramer(at)credativ(dot)ca
> http://www.credativ.ca
>

To my understanding, with a Maven build process it is relatively easy to produce both an "unmodified" jar, appropriate for use in Maven projects, and a "shaded" jar, which could be appropriate for use in tools / non-Maven-aware projects:

http://maven.apache.org/plugins/maven-shade-plugin/

This way you can have the best of both worlds.

> [...]



Re: A new JDBC driver...

From
Mikko Tiihonen
Date:
On 03/12/2013 08:04 PM, Dave Cramer wrote:
> Ya, it occurred to me as well that using Netty requires another jar.
>
> A bit unorthodox...

I think using 3rd party dependencies is the proper way to go forward, instead of reimplementing all required functionality in every JDBC driver.
That is not to say that the required dependencies shouldn't be kept to a minimum.

Normal Java applications should use pgjdbc-ng through Maven dependencies (or similar).
For cases where that is not possible, such as SQL editors, we should also provide a fat JDBC jar that has all the required dependencies embedded.

-Mikko

> [...]


Re: A new JDBC driver...

From
Kevin Wooten
Date:
If you remember what I said, I referred to its age as both a positive and a negative.  I understand that all of this special-case code is what makes things work in the real world.  That being said, here are some things I can remember off the top of my head...

Take the simple case of "getString" in the ResultSet.  It's full of special-case code, complete with a nice comment about a "hack to be compatible with text protocol" that has something to do with Dates; it even specifically references the "hstore" type by name.  In addition, there can be up to 5 hashmap lookups to get the type when going through getString, depending on the actual type of the column.  Truly understanding what this function does is a fun task, as it's used in quite a few places.

The getSQLType function has comments about not being able to truly know which types are arrays and which are not.  In comparison, the new code doesn't care, because the object is converted to an array by a decoder.  I won't forget to mention that the current code is EXTREMELY slow when it comes to the getDatabaseMetaData and getMetaData calls; I mention this because I have a pull request from months ago that fixes the performance issue and hasn't been integrated yet ;)


On Mar 12, 2013, at 2:51 AM, Dave Cramer <pg@fastcrypt.com> wrote:

Kevin,

I think this is a very interesting idea. I've often thought that writing a stripped down version of the driver that did the basics *very* well would be an interesting project.

I'm wondering how to integrate this into the official project to live beside the current driver, and possibly supersede it?

You mention "it's showing its age"; can you be more specific?

[...]


Re: A new JDBC driver...

From
Kevin Wooten
Date:
I don't expect it to work with SQL Workbench, but after looking at this, it cannot find the property for "username", "password" or "database".

"database" is properly gleaned from the URL.
"username" and "password" can be provided as extra properties or as URL query parameters.

What URL are you using to connect and what properties are being provided?

As an aside, it currently only supports "plain" and "md5" authentication methods.

On Mar 12, 2013, at 10:39 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:

> Kevin Wooten wrote on 12.03.2013 02:19:
> [...]


Re: A new JDBC driver...

From
Kevin Wooten
Date:
Also, I fully intended for the final packaging to be a single self-contained jar.

I have not been concerned with it too much at this point, but Maven makes it pretty easy.  Netty is easily embedded into the jar and, while also easy to embed, Apache BeanUtils may not be needed in the end.

On Mar 12, 2013, at 10:39 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:

> Kevin Wooten wrote on 12.03.2013 02:19:
> [...]


Re: A new JDBC driver...

From
dmp
Date:
Kevin Wooten wrote on 12.03.2013 02:19:
>> [...]
Thomas Kellerer wrote:
> [...]

Recently I integrated an additional database, Apache Derby, into my database
access tool.  Derby required three different jars for JDBC support.
I really don't like having to pile lots of libraries into my application in
order to get it to work.

The trial that Thomas performed, and my own look at the code, seem to
indicate the code is incomplete and non-functional.  Perhaps it could become
a replacement for the current JDBC driver, but that may not be for some time.
PostgreSQL is becoming mainstream and I would like to see additional
support to keep pgJDBC up to date, but branching seems unproductive to
keeping a stable driver in place.  Integration seems more appropriate.

danap.


Re: A new JDBC driver...

From
Jukka Ruohonen
Date:
On Tue, Mar 12, 2013 at 08:11:55PM +0200, Mikko Tiihonen wrote:
> I think using 3rd party dependencies is the proper way to go forward,
> instead of reimplementing all required functionality in every JDBC driver.
> That is not to say that the required dependencies shouldn't be kept to a minimum.

Two cents: it always troubles me when I hear this reasoning, because
external dependencies still increase maintenance costs, no matter how
well managed.  For instance, for JDBC/DB stuff, many people care (or are
mandated to care) about security and the associated maintenance.

- Jukka.


Re: A new JDBC driver...

From
Bryan Varner
Date:
>> I think using 3rd party dependencies is the proper way to go forward,
>> instead of reimplementing all required functionality in every JDBC driver.
>> That is not to say that the required dependencies shouldn't be kept to a minimum.
>
> Two cents: it always troubles me when I hear this reasoning, because
> external dependencies still increase maintenance costs, no matter how
> well managed. For instance, for JDBC/DB stuff, many people care (or are
> mandated to care) about security and the associated maintenance.

I could not agree more.


Re: A new JDBC driver...

From
Kevin Wooten
Date:
Just to put this to bed: with regard to the project under discussion, the final delivery will be a single jar file with no external dependencies.  I just haven't gotten that far yet.

On Mar 12, 2013, at 11:52 AM, Bryan Varner <bvarner@polarislabs.com> wrote:

>>> [...]
>
> I could not agree more.



Re: A new JDBC driver...

From
Dave Cramer
Date:
Yes, but it relies on extra jars whose pedigree we don't control, which is part of the security issue.

As I said earlier, I like this idea as a high-performance, minimally functioning driver.  By minimally functioning, I mean I'd probably not implement all of the metadata API(s).





Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Tue, Mar 12, 2013 at 2:55 PM, Kevin Wooten <kdubb@me.com> wrote:
[...]

Re: A new JDBC driver...

From
Thomas Kellerer
Date:
Kevin Wooten wrote on 12.03.2013 19:29:
> I don't expect it to work with SQL workbench but after looking at
> this, it cannot find the property for "username", "password" or
> "database".
>
> "database" is properly gleaned from the URL. "username" and
> "password" can be provided as extra properties or as URL query
> parameters.
>
> What URL are you using to connect and what properties are being
> provided?
>

Exactly the same credentials as with the "real" Postgres driver.

The URL is: jdbc:postgresql://localhost/wbtest

And I am using Driver.connect(String, Properties)

The Properties instance contains essentially the keys "user" and "password" (as per JDBC specs).

Regards
Thomas

Re: A new JDBC driver...

From
Thor Michael Støre
Date:
Hello,

From: Kevin Wooten <kdubb@me.com>
Subject: A new JDBC driver...
Date: 12 March 2013 02:19:11 CET


About 10 days ago I set out to fix the current driver's support for composite and array objects.  Currently they are only ever retrieved as encoded strings, due to the use of the TEXT format for those types, and parsing those encoded strings seemed just short of a nightmare.  Ambition got the better of me and before I knew it I had a "working" implementation of the entire FE/BE protocol (save for Copy In/Out).

Hm, nightmare might be a stretch, though I see your point.  I recently wrote a parser for text representations of arrays and composites of the basic types for my own personal little pet project, which is a kind of MyBatis-like thing but highly Postgres-specific.  It wasn't much work, but if you're going to support "everything" and do it in the JDBC layer, I guess going the binary route would be the better bet.

The part of my implementation that just deals with parsing the text format isn't much to gawk at, but just for the record I've got it here:

And yes, I'm sure it chokes on lots of things.
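
A toy version of the easy case looks something like this (not the actual linked code; a sketch handling one-dimensional integer arrays only):

    import java.util.ArrayList;
    import java.util.List;

    // Toy parser for PostgreSQL's text representation of a one-dimensional
    // int4 array, e.g. "{1,2,NULL,3}". Quoted elements, nested arrays and
    // per-type delimiters are exactly where the real pain begins.
    class TextArrayParser {
        static List<Integer> parseIntArray(String text) {
            List<Integer> out = new ArrayList<Integer>();
            String body = text.substring(1, text.length() - 1); // strip "{" and "}"
            if (body.isEmpty()) {
                return out;
            }
            for (String element : body.split(",")) {
                out.add("NULL".equals(element) ? null : Integer.valueOf(element));
            }
            return out;
        }
    }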

* Can decode any recognized type to a Java object (this includes any imaginable composite or array type)
* Connection.setTypeMap and ResultSet.getObject(int idx, Map) are both fully supported
* Requests for composite objects that have no custom mapping are returned as a HashMap
* Arrays can be decoded as a List, Map or native array (e.g. Object[], int[])
* As an extension it can decode whole rows into POJOs as well (acts a tiny bit like MyBatis)

Right, that's the sort of thing I've been working at too, but since there wasn't any such support in the PgJDBC driver I went with getting the text representation with getString and parsing it.

* Asynchronous I/O engine provided by Netty
* All connections share a single group of worker threads
* LISTEN/NOTIFY and notifications can come through asynchronously
* Netty has a great system for managing buffers and reading/writing messages that shows increased speed
* Performance wasn't a goal of this project but it's a nice side effect

That sounds great; when I poked around the driver trying to figure out how to retrieve arrays of complex types in a reasonably efficient manner, I did wonder about performance if they got very large.

BINARY SUPPORT
As outlined above, I started the project to support decoding of Composite and Array types.  To accomplish this I download "pg_type", "pg_attribute" and "pg_proc" to the client upon connection.  I then create a type registry that holds all the required details of all the types.  Procedures, for both TEXT and BINARY protocols, are looked up and matched by name (e.g. "bool_send", "money_recv", etc.) by a list of "Procedure Providers".  When a DataRow message is received, the driver looks up the type in the registry and calls the appropriate TEXT or BINARY decoder to decode the row.  When sending parameter data, the type is located and its encoder is called to encode the data.

Right, you do all that work upon initialization at runtime, which would be the way to go to have it all done generically in the driver and accessible via the JDBC methods.  Before settling down and writing a parser for the text format, I quickly looked inside PgJDBC to see what it would take to have it do something like this, but I realized it would be a lot of work long before I saw a solution like the one you've implemented.  In the end, for my own project, I went with doing all that part of the work outside of the JDBC driver, by generating a static factory method for each type that the application in question actually uses; this fits the "zero reflection and full compile-time metaprogramming" approach my project has.
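
As a rough illustration of the kind of factory that gets generated (all names are invented for the example, and the parsing is naive):

    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Invented example of a build-time-generated, reflection-free factory
    // for an "address" composite type with (street, city) attributes.
    final class AddressFactory {

        static class Address {
            final String street;
            final String city;
            Address(String street, String city) {
                this.street = street;
                this.city = city;
            }
        }

        // Reads the column as text, e.g. "(Main St,Springfield)", and maps
        // the fields positionally; no reflection anywhere.
        static Address fromColumn(ResultSet rs, int column) throws SQLException {
            String text = rs.getString(column);
            String[] fields = text.substring(1, text.length() - 1).split(",");
            return new Address(fields[0], fields[1]);
        }

        private AddressFactory() {}
    }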

Should I continue on this path of a new driver and see if people join or should I take what I have learned and try to refit it into the current code?

While I can't comment on this directly, I would like to say that I for one would appreciate either of those things; just as long as this doesn't fizzle out, I'd be quite content.  At the moment it's a non-starter for me to try on my pet project, because the project heavily relies on the JDBC metadata functions (which I'm not expecting will come along any time soon either, don't worry).  But it's very good to hear that someone is putting effort into areas of Postgres and JDBC that would improve the areas I've been looking at, even if that effort may take a while to bear fruit.  I'm not sure how to help either, seeing as I don't have much to test it against, and seem to only have vague ideas in areas where you seem to have solutions.

Thanks for the effort either way,
Thor Michael

Re: A new JDBC driver...

From
Dave Cramer
Date:
Kevin,

While I haven't looked at your driver in detail, it occurs to me that since the FE/BE protocol is broken out in the current driver, would it be possible to use your protocol implementation in the current driver?

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Tue, Mar 12, 2013 at 6:41 PM, Thor Michael Støre <thormichael@gmail.com> wrote:
[...]

Re: A new JDBC driver...

From
Kevin Wooten
Date:
It could probably be done, if it weren't for my implementation delivering Java objects.  The current driver expects nothing but byte arrays for each column of every row of the result set.  So for a Postgres bool it has a byte[1], whereas my implementation stores a boolean.

The "DataRow" message handler could, fairly easily, be made to stop parsing it to the extent that it does and just return a byte[][] as the current driver does.  To support my goals though, all of the ad-hoc parsing that's done in AbstractJdbc2ResultSet, and friends, would still need to be replaced with the binary parsing I am doing with my type framework.

On Mar 12, 2013, at 3:49 PM, Dave Cramer <pg@fastcrypt.com> wrote:

Kevin,

While I haven't looked at your driver in detail; it occurs to me that since the FE/BE protocol is broken out in the current driver, would it be possible to use your protocol implementation in the current driver ?

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Tue, Mar 12, 2013 at 6:41 PM, Thor Michael Støre <thormichael@gmail.com> wrote:
Hello,

From: Kevin Wooten <kdubb@me.com>
Subject: A new JDBC driver...
Date: 12. mars 2013 02:19:11 CET


About 10 days ago I set out to fix the current driver's support for composite and array objects.  Currently they are only ever retrieved as string encoding due to the use of the TEXT format for those types; parsing these encoded strings seemed just short of a nightmare.  Ambition got the better of me and before I knew it I had a "working" implementation of the entire FE/BE protocol (save for Copy In/Out).

Hm, nightmare might be a stretch, though I see your point. I recently wrote a parser for text representations of arrays and composites of the basic types for my own personal little pet project, which is kind of a MyBatis-like thing but highly Postgres-specific. It wasn't much work, but if you're going to support "everything", and do it in the JDBC layer, I guess going the binary route would be the better bet.

The part of my implementation that just deals with parsing the text format isn't much to gawk at, but just for the record I've got it here:

And yes, I'm sure it chokes on lots of things.

* Can decode any recognized type to a Java object (this includes any imaginable composite or array type)
* Connection.setTypeMap and ResultSet.get(int idx, Map) are both fully supported
* Requests for composite objects that have no custom mapping are returned as HashMap
* Arrays can be decoded as a List, Map or native array (e.g. Object[], int[])
* As an an extension it can decode whole rows into POJO's as well (acts a tiny bit like MyBatis)

Right, that's the sort of thing I've been working at too, but since there wasn't any such support in the PgJDBC driver I went with getting the text representation with getString and parsing it.

* Asynchronous I/O engine provided by Netty
* All connections share a single group of worker threads
* LISTEN/NOTIFY and notifications can come through asynchronously
* Netty has a great system for managing buffers and reading/writing messages that shows increased speed
* Performance wasn't a goal of this project but it's a nice side effect

That sounds great; when I poked around the driver trying to figure out how to retrieve arrays of complex types in a reasonably efficient manner, I did wonder about performance if they got very large.

[remainder of quoted message snipped]


Re: A new JDBC driver...

From
Andreas Joseph Krogh
Date:
On Tuesday 12 March 2013 at 02:19:11, Kevin Wooten <kdubb@me.com> wrote:

[beginning of quoted message trimmed]

Where to go from here…

The major question I would like to ask is…

Should I continue on this path of a new driver and see if people join or should I take what I have learned and try to refit it into the current code?

I am no dummy.  I understand the years of experience the current driver has to ensure it works well in an extremely large number of cases.  At the same time, anybody who has peeked around in there (and I have done quite a bit of it) knows it's showing its age.  My driver is 100% new code… not a stitch of the old was used.  Given this, it seems like transplanting my new "core" into the current project would be like giving it a brain transplant just after a fresh head transplant; in other words… a rewrite.

I'd love it if some folks in the know could take a look at my code and see if it stirs up any ideas on integration or just makes you want to jump off a bridge.

If you read this far you get a cookie…

Here is the GitHub project… https://github.com/kdubb/pgjdbc-ng

Kevin

Coolio!! Can't wait to see what happens next. It's often fresh, ambitious work like this which brings the world forward.

--
Andreas Joseph Krogh <andreak@officenet.no>     mob: +47 909 56 963
Senior Software Developer / CTO - OfficeNet AS - http://www.officenet.no
Public key: http://home.officenet.no/~andreak/public_key.asc

Re: A new JDBC driver...

From
Craig Ringer
Date:
On 03/12/2013 09:19 AM, Kevin Wooten wrote:

* Asynchronous I/O engine provided by Netty
* All connections share a single group of worker threads
That's going to cause serious issues on Java EE, especially when unloading applications. Since you're on JDBC4 you can have init and cleanup functions to manage the thread pool, but these are vital. I asked about this re PgJDBC quite some time ago and got some good information that's summarized here:

http://stackoverflow.com/q/8514725/398670

Failure to properly shut the thread pool down when a servlet or application is unloaded will cause classloader leaks, tending to lead to PermGenSpace exhaustion errors and other problems. The driver will probably need application-server-specific integration hooks too.

As for the "multiple JARs" converns, it's trivial to bundle dependencies inside the JDBC driver jar. However, this can cause issues if there's an incompatible version of the same library elsewhere on the classpath. It's OK if you're on a modern application server like JBoss AS 7 that isolates classloader chains, but it can be a real problem on older platforms and standalone applications. For this reason both a rollup version of the jar and a version without bundled libraries would probably be needed, but this is trivial to produce from Maven.

Overall I think this is an intriguing idea, whether it proves to be an ideas testbed or something that becomes appealing to adopt more seriously. Congratulations on tackling it.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: A new JDBC driver...

From
Adib Saikali
Date:
I agree on the importance of avoiding conflicts with the application server's classpath.  I think jarjar can easily solve such a problem by embedding the required third-party libraries in the final jar file in such a way that conflicts with other versions of the same library do not arise: https://code.google.com/p/jarjar/ .  The Spring framework uses jarjar to package internal versions of ASM and CGLIB under the org.springframework package hierarchy.


 
On 2013-03-14, at 12:56 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

[quoted message snipped]

Re: A new JDBC driver...

From
Florent Guillaume
Date:
On Thu, Mar 14, 2013 at 5:56 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 03/12/2013 09:19 AM, Kevin Wooten wrote:
> * Asynchronous I/O engine provided by Netty
> * All connections share a single group of worker threads
>
> That's going to cause serious issues on Java EE, especially when unloading
> applications. Since you're on JDBC4 you can have init and cleanup functions
> to manage the thread pool, but these are vital. I asked about this re PgJDBC
> quite some time ago and got some good information that's summarized here:
>
> http://stackoverflow.com/q/8514725/398670
>
> Failure to properly shut the thread pool down when a servlet or application
> is unloaded will cause classloader leaks, tending to lead to PermGenSpace
> exhaustion errors and other problems. The driver will probably need
> application-server-specific integration hooks too.
>
> As for the "multiple JARs" converns, it's trivial to bundle dependencies
> inside the JDBC driver jar. However, this can cause issues if there's an
> incompatible version of the same library elsewhere on the classpath. It's OK
> if you're on a modern application server like JBoss AS 7 that isolates
> classloader chains, but it can be a real problem on older platforms and
> standalone applications. For this reason both a rollup version of the jar
> and a version without bundled libraries would probably be needed, but this
> is trivial to produce from Maven.

maven-shade-plugin was already mentioned to solve the class conflict
issues (similar to jarjar).

Florent




--
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87


Re: A new JDBC driver...

From
Kevin Wooten
Date:
Craig, thanks a lot for the information.  I read your SO question and related info, and then did a bit of research on my own.  I have come up with a few things…

Starting your own threads is not suggested because of the problems with not having a definitive startup/shutdown (until more recent Java EE versions), and because any thread you start cannot use any of the services of the container.  Everything else I have read just says that IF you don't shut the threads down properly you could wreak havoc, eat up resources, etc., etc.

The first issue isn't an issue at all.  The threads handle the I/O only and all data is delivered back to an application level thread for processing. Basically the fact that threads are used is completely transparent to the application code; it treats the calls synchronously.  Theoretically, because you can in C, you should be able to do async I/O without threads, but there seems to be some belief that, in Java, threads are the way to go; all the popular libraries I've seen are using thread pools.
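Conceptually the handoff between the I/O thread and the caller looks like this (simplified sketch, not my actual classes):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Simplified handoff between the Netty I/O thread and the caller.
    class QueryHandoff {
        private final BlockingQueue<Object> result = new ArrayBlockingQueue<Object>(1);

        // Invoked on the I/O worker thread when the final protocol
        // message for the query arrives.
        void complete(Object resultSetData) {
            result.offer(resultSetData);
        }

        // Invoked on the application thread; blocks until the I/O
        // thread hands the result over, so the call looks synchronous.
        Object await() throws InterruptedException {
            return result.take();
        }
    }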

Secondly, I have actually paid a bit of attention to the issue of threads shutting down because any abandoned connection causes the threads to remain active and the program to, at the very least, not be able to shut down.  I used a reference counting system to share the thread pool between connections.  This guarantees that if the connections are properly closed, the threads will be killed.  I did a bit of experimentation last night with using weak references everywhere and trying to handle the case where a person forgets to close a connection. After a bit of messing around I finally settled on the much maligned "finalizer" to kill the connection if it's abandoned.  Connections are pretty heavyweight things, and I don't believe finalizers are going to cause any issue with performance. Most applications will not be creating and destroying thousands of connections such that finalizers create a performance problem for the GC.  Any good app, and especially Java EE containers, will be pooling connections and closing them properly anyway, which will in turn shut the threads down properly.
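The sharing scheme is essentially this kind of refcounted holder (sketch only; the real code wraps Netty's worker group rather than a bare ExecutorService):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Each Connection calls acquire() on open and release() on close,
    // so the last properly-closed connection stops the worker threads.
    class SharedWorkerGroup {
        private static ExecutorService workers;
        private static int refCount;

        static synchronized ExecutorService acquire() {
            if (refCount++ == 0)
                workers = Executors.newCachedThreadPool();
            return workers;
        }

        static synchronized void release() {
            if (--refCount == 0) {
                workers.shutdown();  // threads die with the last connection
                workers = null;
            }
        }
    }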

I did manage to hack into my driver an implementation of setQueryTimeout.  The disconnected nature made this quite simple, as my current indefinite wait for query completion just became a wait with a timeout.  The real issue I struggled with, so I put it on hold until later, was what to do once the timeout happens.  I believe you would need to issue a "Cancel" to the server and then send a "Sync" message.  PostgreSQL's cancel system is pretty interesting. You have to open a new socket and send a cancel packet.  This works great unless the reason for the timeout was a network issue...  then you need to handle the connection timeout, send timeout, etc. of the cancel procedure, and finally you still have to wait for the server's Error response to your cancel!   So it gets pretty darn interesting, but I should be able to make it happen properly in the new driver.
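For reference, the cancel itself is tiny; per the FE/BE protocol it's a fresh socket plus a 16-byte CancelRequest carrying the backend PID and secret key the server sent in BackendKeyData at startup:

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.Socket;

    class CancelSender {
        // Per the FE/BE protocol: Int32 length (16), Int32 cancel request
        // code (80877102), then the backend process ID and secret key from
        // the BackendKeyData message received at connection startup.
        static void sendCancelRequest(String host, int port,
                                      int backendPid, int secretKey) throws IOException {
            Socket socket = new Socket(host, port);
            try {
                DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                out.writeInt(16);        // writeInt is big-endian, i.e. network order
                out.writeInt(80877102);
                out.writeInt(backendPid);
                out.writeInt(secretKey);
                out.flush();
                // The server replies (ErrorResponse) on the *original*
                // connection, not this one; this socket is simply closed.
            } finally {
                socket.close();
            }
        }
    }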

Finally, with regard to your SO question, since there seems to be no answer, you could try, as I touched on earlier, to implement the query timeout by using non-blocking sockets, selectors and the like.  I think you'll quickly grow to appreciate why others are using threads; Java has made something that was easy in C very hard.  Also, in my journeys last night I discovered the "statement_timeout" connection parameter.  If you didn't know about it already, the server will cancel any statement that takes longer than this value.  It may be an easy solution to your problem.
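e.g., something as simple as this (URL and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    class StatementTimeoutExample {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/test", "user", "pass");
            Statement stmt = conn.createStatement();
            // Server-side guard: the server itself cancels any statement
            // on this session that runs longer than 5000 ms.
            stmt.execute("SET statement_timeout = 5000");
        }
    }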

Kevin

On Mar 13, 2013, at 9:56 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

[quoted message snipped]

Re: A new JDBC driver...

From
Craig Ringer
Date:
On 03/15/2013 01:15 PM, Kevin Wooten wrote:
Craig, thanks a lot for the information.  I read your SO question and related info, and then did a bit of research on my own.  I have come up with a few things…

Just to clarify: it seems you thought I was asking for help with that question. I wasn't; I just wanted to direct you to the informative answer given and raise your awareness of the concerns around threading in Java EE application servers, in case you weren't aware of the problems there.

If you're unaware of classloader leaks and the related problems, start here:

http://frankkieviet.blogspot.com.au/2006/10/classloader-leaks-dreaded-permgen-space.html

Starting your own threads is not suggested because of the problems with not having a definitive startup/shutdown (until more recent Java EE versions), and because any thread you start cannot use any of the services of the container.  Everything else I have read just says that IF you don't shut the threads down properly you could wreak havoc, eat up resources, etc., etc.
Yes... and you need to implement ServiceLoader hooks in order for threads created by the JDBC driver to be properly terminated before the driver is unloaded.

Remember that in Java EE the JDBC driver may be unloaded without the app server being terminated. This mainly occurs in apps that bundle JDBC drivers in their .war instead of using the server connection pool, but is also seen when JDBC drivers are uninstalled or unloaded from a running server as is common in upgrades.

If the driver has its own threads and it doesn't terminate every thread in the pool the unload will fail and you'll get a classloader leak.
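To illustrate the kind of hook this implies - PGDriver.shutdown() below is a hypothetical cleanup entry point your driver would need to expose - an application bundling the driver in its .war would register something like:

    import javax.servlet.ServletContextEvent;
    import javax.servlet.ServletContextListener;

    // Registered in web.xml by an app that bundles the driver, so the
    // shared I/O thread pool dies with the webapp rather than pinning
    // the webapp's classloader.
    public class DriverShutdownListener implements ServletContextListener {
        public void contextInitialized(ServletContextEvent sce) { }

        public void contextDestroyed(ServletContextEvent sce) {
            // Assumed driver-provided entry point that stops every
            // worker thread the driver has started.
            PGDriver.shutdown();
        }
    }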


The first issue isn't an issue at all.  The threads handle the I/O only and all data is delivered back to an application level thread for processing. Basically the fact that threads are used is completely transparent to the application code; it treats the calls synchronously.
To the application code, yes; to the app server, not necessarily, because the threads hold references to classes that hold references to the .war classloader in the case of in-app-bundled JDBC drivers. Unlike every other kind of Java object, a thread is not collected merely because nothing else references it: a running thread is itself a GC root. So if a leaked connection is GC'd the associated threads may not be. You cannot rely on the finalize method to reliably or correctly clean up the threads as it may not run under far too many circumstances.

For that reason, it's important for the driver to keep track of its threads and ensure that it explicitly shuts down its thread pool when it is unloaded by the service loader.

Secondly, I have actually paid a bit of attention to the issue of threads shutting down because any abandoned connection causes the threads to remain active and the program to, at the very least, not be able to shut down.
That's not the case if you're creating daemon threads (which you should be) - but it can cause other problems with object leaks. Even if you hold only weak references in your thread, the thread itself can still cause a classloader leak, since it holds a strong reference to its class object (or, in the case of an unsubclassed Thread, to the runnable and via it to the runnable's class).
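i.e. a thread factory along these lines - it keeps abandoned pools from blocking JVM exit, though as noted it does nothing for the classloader leak:

    import java.util.concurrent.ThreadFactory;

    // Daemon worker threads cannot keep the JVM alive on their own,
    // but they can still pin the webapp classloader until they stop.
    class DaemonThreadFactory implements ThreadFactory {
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "pgjdbc-ng-io");
            t.setDaemon(true);
            return t;
        }
    }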

I used a reference counting system to share the thread pool between connections.  This guarantees that if the connections are properly closed, the threads will be killed.  I did a bit of experimentation last night with using weak references everywhere and trying to handle the case where a person forgets to close a connection.
Good idea. The driver *must* continue to function when connections aren't closed, as this is unfortunately extremely common. Relying on finalizers won't cut it in my opinion; they're just too unreliable to use for anything except logging warnings to say you forgot to close something.

The main issue with finalizers isn't performance; it is the number of circumstances under which they just don't run, run multiple times, or otherwise have undesirable characteristics.

PostgreSQL's cancel system is pretty interesting. You have to open a new socket and send a cancel packet.
Yeah, I find this to be particularly awful myself. I vaguely recall seeing some discussion suggesting that in-band cancel was possible, but I'll have to delve through my archives to find out where.
Finally, with regard to your SO question, since there seems to be no answer, you could try, as I touched on earlier, to implement the query timeout by using non-blocking sockets, selectors and the like.  I think you'll quickly grow to appreciate why others are using threads; Java has made something that was easy in C very hard.  Also, in my journeys last night I discovered the "statement_timeout" connection parameter.  If you didn't know about it already, the server will cancel any statement that takes longer than this value.  It may be an easy solution to your problem.
Heh, I'm well aware of statement_timeout. In that question I was asking for guidance about sane ways to use threading within the JDBC driver as I was hoping to have time to implement threaded I/O within PgJDBC to finally solve the problem with the driver deadlocking when it's blocked on a read and the server won't reply until it gets another write. As you've alluded to, async I/O wasn't really an option especially since it plays very poorly with SSL sockets. The answer referring to the service loader was very helpful and that's what I was pointing you towards.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: A new JDBC driver...

From
Craig Ringer
Date:
On 03/15/2013 01:15 PM, Kevin Wooten wrote:
After a bit of messing around I finally settled on the much maligned "finalizer" to kill the connection if it's abandoned. 
Oh, and re finalizers:

https://www.securecoding.cert.org/confluence/display/java/MET12-J.+Do+not+use+finalizers

In particular, these points:

  • There is no fixed time at which finalizers must be executed because this depends on the JVM. The only guarantee is that any finalizer method that executes will do so sometime after the associated object has become unreachable (detected during the first cycle of garbage collection) and sometime before the garbage collector reclaims the associated object's storage (during the garbage collector's second cycle). Execution of an object's finalizer may be delayed for an arbitrarily long time after the object becomes unreachable. Consequently, invoking time-critical functionality such as closing file handles in an object's finalize() method is problematic.
  • The JVM may terminate without invoking the finalizer on some or all unreachable objects. Consequently, attempts to update critical persistent state from finalizer methods can fail without warning. Similarly, Java lacks any guarantee that finalizers will execute on process termination. Methods such as System.gc(), System.runFinalization(), System.runFinalizersOnExit(), and Runtime.runFinalizersOnExit() either lack such guarantees or have been deprecated because of lack of safety and potential for deadlock.

  • According to the Java Language Specification, §12.6.2, "Finalizer Invocations are Not Ordered" [JLS 2005]:

    The Java programming language imposes no ordering on finalize() method calls. Finalizers [of different objects] may be called in any order, or even concurrently.

    One consequence is that slow-running finalizers can delay execution of other finalizers in the queue. Further, the lack of guaranteed ordering can lead to substantial difficulty in maintaining desired program invariants.
  • Use of finalizers can introduce synchronization issues even when the remainder of the program is single-threaded. The finalize() methods are invoked by the garbage collector from one or more threads of its choice; these threads are typically distinct from the main() thread, although this property is not guaranteed. When a finalizer is necessary, any required cleanup data structures must be protected from concurrent access. See the JavaOne presentation by Hans J. Boehm [Boehm 2005] for additional information.

In particular I would be concerned that a thread refcounting system would be subject to ordering problems, or to deadlocks introduced by attempts to fix those ordering problems.

I really need to look into how other JDBC drivers handle these issues; it's not like this is a new problem.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: A new JDBC driver...

From
Kevin Wooten
Date:
I understood you weren't looking for an answer. I figured I'd share the knowledge I had... just in case ;)

I really appreciate the information you are providing, as you seem to be well versed in these issues.  I do have one concern, though: I feel like we are discussing the case where connections are not closed correctly.  Are you saying that even if the connections are closed properly the threads can still leak resources?  I can't see how that would happen, as properly closing connections will ensure the threads are shut down manually, and shutting down the threads manually will ensure their resources are released.  If we are talking about the error condition where connections are not closed, I am not so concerned.

On Mar 15, 2013, at 12:10 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

[quoted message snipped]

Re: A new JDBC driver...

From
Kevin Wooten
Date:
Yes, I am familiar with these issues.  Although, as I said previously, this is only for the case where connections, statements and result sets are not being closed.  Finalizers, with all their issues, are a lot easier than, and serve the same purpose as, tracking things with some combination of weak, soft or phantom references and trying to do what amounts to a finalize anyway.

In its final state I imagine the driver will be able to withstand leaked ResultSets and Statements, but not Connections; you'll have to close your Connections properly.  To accomplish this I am using weak refs between the statements and connections and employing a finalizer to catch the case where the statements and/or result sets were leaked.  If you, or anybody, has a better idea, I am all ears.
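Roughly like this (sketch only, not my actual classes):

    import java.lang.ref.WeakReference;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the leak guard: the connection holds its statements only
    // weakly, and a leaked (never-closed) statement's finalizer releases
    // its server-side resources as a last resort.
    class PGStatement /* implements java.sql.Statement ... */ {
        private boolean closed;

        public synchronized void close() {
            closed = true;
            // ... release the server-side portal / prepared statement ...
        }

        @Override
        protected void finalize() throws Throwable {
            try {
                if (!closed)
                    close();  // leak case only; unreliable, but better than nothing
            } finally {
                super.finalize();
            }
        }
    }

    class PGConnection /* implements java.sql.Connection ... */ {
        private final List<WeakReference<PGStatement>> statements =
                new ArrayList<WeakReference<PGStatement>>();

        synchronized void registerStatement(PGStatement s) {
            statements.add(new WeakReference<PGStatement>(s));
        }

        public synchronized void close() {
            for (WeakReference<PGStatement> ref : statements) {
                PGStatement s = ref.get();
                if (s != null)
                    s.close();  // weak refs mean leaked statements don't pin memory
            }
            statements.clear();
            // ... close the socket and release the shared worker group ...
        }
    }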

On Mar 15, 2013, at 12:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

[quoted message snipped]