Thread: A new JDBC driver...
So… I did a crazy thing… I created an entirely new JDBC driver.
(For those who care there's an actual question at the bottom… this isn't just bragging)
About 10 days ago I set out to fix the current driver's support for composite and array objects. Currently they are only ever retrieved as encoded strings, due to the use of the TEXT format for those types; parsing these encoded strings seemed just short of a nightmare. Ambition got the better of me and before I knew it I had a "working" implementation of the entire FE/BE protocol (save for Copy In/Out).
I began by hacking the driver to force it into binary mode and pulling the raw data with getBytes. Then I started building a framework for recognizing, encoding and decoding all the possible types PostgreSQL knows about. Once that was working well, my mission was to rework this into the current driver. This proved almost impossible due to 1) my limited familiarity with the code and 2) the assumptions it makes about the formats of things like parameters. In the end it seemed too time consuming for ME to do this. So, partly for fun, I decided to just implement the FE/BE protocol and see where it got me. Next thing I knew I was running queries and retrieving data. Basically it's a retro-fit side project that went wrong and has spiraled out of control ;)
Currently my "driver" (I use the term loosely; as you can imagine, its state of compliance reflects about 6 days of real work) looks like this:
* Basic support for JDBC interfaces (Driver, Connection, PreparedStatement, ResultSet)
* Can be discovered and loaded automatically through JDBC4 loader
* Supports ONLY JDBC4 interfaces (no attempt to compile to only JDBC2/3 is made)
* Lots of stuff missing, non-conformant or just plain broken
* Speaks "BINARY" format almost exclusively
* Extremely clean and simple interface for supporting new types; VERY DRY (a type is only ever added in 1 place)
* Unsupported formats can fall back to "TEXT"
* Almost all of the current standard types are supported (even the hard ones like numerics, timestamps, dates)
* Can decode any recognized type to a Java object (this includes any imaginable composite or array type)
* Connection.setTypeMap and ResultSet.get(int idx, Map) are both fully supported
* Requests for composite objects that have no custom mapping are returned as HashMap
* Arrays can be decoded as a List, Map or native array (e.g. Object[], int[])
* As an extension it can decode whole rows into POJOs as well (acts a tiny bit like MyBatis); a usage sketch follows this list
* Asynchronous I/O engine provided by Netty
* All connections share a single group of worker threads
* LISTEN/NOTIFY and notifications can come through asynchronously
* Netty has a great system for managing buffers and reading/writing messages that shows increased speed
* Performance wasn't a goal of this project but it's a nice side effect
* Maven project
* Written against Java 7
* Obviously to enter any sort of real use the source version will have to be dialed down
* Shows some serious gains in performance
* Query and retrieve a million rows with timestamps from the ResultSet and it's about a 4-5x increase in speed
* Obviously for some types there is no speed difference
* Again I say, performance was NOT a goal of this project; it's just a nice side effect
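To make the type-decoding items above concrete, here is a rough usage sketch (illustrative only; the table, columns and exact method surface are made up for the example):

    import java.sql.Array;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Map;

    public class DecodeSketch {
      public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/test", "user", "pass");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT address, tags FROM people")) {
          while (rs.next()) {
            // Composite column with no custom mapping: decoded to a Map of
            // attribute name -> decoded attribute value
            @SuppressWarnings("unchecked")
            Map<String, Object> address = (Map<String, Object>) rs.getObject(1);

            // Array column: unwrapped straight to a native Java array
            Array tags = rs.getArray(2);
            String[] tagNames = (String[]) tags.getArray();
          }
        }
      }
    }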
BINARY SUPPORT
As outlined above, I started the project to support decoding of Composite and Array types. To accomplish this I download "pg_type", "pg_attribute" and "pg_proc" to the client upon connection. I then create a type registry that holds all the required details of all the types. Procedures, for both TEXT and BINARY protocols, are looked up and matched by name (e.g. "bool_send", "money_recv", etc.) by a list of "Procedure Providers". When a DataRow message is received, the driver looks up the type in the registry and calls the appropriate TEXT or BINARY decoder to decode the row. When sending parameter data, the type is located and its encoder is called to encode the data. Reading through the driver mailing lists, it seems using binary only has some ramifications as far as type coercion and such are concerned; currently all user-initiated queries use the Extended Protocol & Statement Describe to ensure parameter types/values are correct.
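In rough sketch form, the registry lookup works something like this (illustrative only; these class and method names are invented for the example, not the actual ones in the project):

    import java.nio.charset.Charset;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical names throughout; this only mirrors the flow described above.
    interface BinaryDecoder { Object decode(byte[] wireData); }
    interface TextDecoder   { Object decode(String wireText); }

    class PgType {
      final int oid;
      final BinaryDecoder binary; // matched from pg_proc, e.g. "bool_recv"
      final TextDecoder text;     // fallback path for unsupported binary formats
      PgType(int oid, BinaryDecoder binary, TextDecoder text) {
        this.oid = oid; this.binary = binary; this.text = text;
      }
    }

    class TypeRegistry {
      private static final Charset UTF8 = Charset.forName("UTF-8");
      private final Map<Integer, PgType> byOid = new HashMap<Integer, PgType>();

      void register(PgType t) { byOid.put(t.oid, t); }

      // Called once per column of an incoming DataRow message.
      Object decodeColumn(int typeOid, boolean binaryFormat, byte[] data) {
        PgType t = byOid.get(typeOid);
        if (t == null) throw new IllegalStateException("unknown type oid " + typeOid);
        return binaryFormat ? t.binary.decode(data)
                            : t.text.decode(new String(data, UTF8));
      }
    }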
Where to go from here…
The major question I would like to ask is…
Should I continue on this path of a new driver and see if people join or should I take what I have learned and try to refit it into the current code?
I am no dummy. I understand the years of experience the current driver has to ensure it works well in an extremely large number of cases. At the same time, anybody who has peeked around in there (and I have done quite a bit of it) knows it's showing its age. My driver is 100% new code… not a stitch of the old was used. Given this, it seems like transplanting my new "core" into the current project would be like giving it a brain transplant just after a fresh head transplant; in other words… a rewrite.
I'd love it if some folks in the know could take a look at my code and see if it stirs up any ideas on integration or just makes you want to jump off a bridge.
If you read this far you get a cookie…
Here is the GitHub project… https://github.com/kdubb/pgjdbc-ng
Kevin
Kevin,
I think this is a very interesting idea. I've often thought that writing a stripped down version of the driver that did the basics *very* well would be an interesting project.
I'm wondering how to integrate this into the official project to live beside the current driver, and possibly supersede it?
You mention it is "showing its age"; can you be more specific?
Regards,
On Mon, Mar 11, 2013 at 9:19 PM, Kevin Wooten <kdubb@me.com> wrote:
> So… I did a crazy thing… I created an entirely new JDBC driver.
> [...]
> Here is the GitHub project… https://github.com/kdubb/pgjdbc-ng
>
> Kevin
Well done. I started a similar project a few years ago (but time constraints got the better of me), for the same reason: better support of arrays. I ended up modifying my own version of the driver; I don't think my patches were ever "approved".
Looking forward to giving it a go.
John
On Mon, Mar 11, 2013 at 9:19 PM, Kevin Wooten <kdubb@me.com> wrote:
> So… I did a crazy thing… I created an entirely new JDBC driver.
> [...]
> Here is the GitHub project… https://github.com/kdubb/pgjdbc-ng
Hi Kevin,

I'm excited to try it out - I've been thinking of doing the same quite a few times while hacking yet another hotspot out of the driver, and you're right: it's showing its age, but also its experience. My main motivation was speed so far and I'm glad to see Netty used - nice approach. I'm looking forward to trying it out and will lend a hand if I have the time.

Thanks for sharing this,

Kevin Wooten schrieb am 12.03.2013 02:19:
> So… I did a crazy thing… I created an entirely new JDBC driver.
> [...]
> If you read this far you get a cookie…
>
> Here is the GitHub project… https://github.com/kdubb/pgjdbc-ng
>
> Kevin

cu, Patric
Excellent progress Kevin!
I agree that adding new low-level functionality (such as a new transport layer) and other optimizations to the current JDBC code can be too difficult, making a rewrite the only feasible way forward.
I like your usage of Netty as the IO layer. It really allows us to experiment much more in the future with the API we provide to applications. I also like the usage of the binary protocol whenever possible.
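As a sketch of what a shared worker group means in Netty terms (illustrative only, written against Netty 4's API with a stand-in handler; it makes no claim to match the project's actual code):

    import io.netty.bootstrap.Bootstrap;
    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioSocketChannel;
    import io.netty.handler.logging.LoggingHandler;

    public class SharedGroupSketch {
      // One event-loop group shared by every connection the driver opens.
      private static final NioEventLoopGroup WORKERS = new NioEventLoopGroup();

      public static void connect(String host, int port) {
        Bootstrap b = new Bootstrap()
            .group(WORKERS)                 // reuse, never one group per connection
            .channel(NioSocketChannel.class)
            .handler(new ChannelInitializer<SocketChannel>() {
              @Override protected void initChannel(SocketChannel ch) {
                // A real driver would install its FE/BE message codec here;
                // LoggingHandler is just a placeholder.
                ch.pipeline().addLast(new LoggingHandler());
              }
            });
        b.connect(host, port);
      }
    }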
Some ideas I have had (in addition to the ones you have already implemented):
- create an asynchronous API for JDBC extensions, with futures or asynchronous result row processing (a hypothetical shape is sketched after this list)
- add pipelining to connections, thus allowing better use of connections (works best with simple auto-commit queries)
- integrate the async API with vert.x
- integrate the async API with Akka
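To make the first idea concrete, here is a purely hypothetical shape (none of these interfaces exist in any driver yet):

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.concurrent.Future;

    // Invented interfaces; a sketch of an async extension, not an existing API.
    interface AsyncQueries {
      // Future-style: the whole result set materializes off-thread.
      Future<ResultSet> executeAsync(String sql, Object... params);

      // Streaming-style: rows are delivered one at a time as they arrive.
      void executeStreaming(String sql, RowHandler handler);
    }

    interface RowHandler {
      void onRow(ResultSet currentRow) throws SQLException;
      void onComplete();
      void onError(Throwable error);
    }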
I will definitely have a more detailed look at the code.
-Mikko
On Tue, Mar 12, 2013 at 7:32 AM, Patric Bechtel <bechtel@ipcon.de> wrote:
> Hi Kevin,
>
> I'm excited to try it out - I've been thinking of doing the same quite a few times while hacking
> yet another hotspot out of the driver, and you're right: it's showing its age, but also its
> experience.

Can you be more specific about the issues with "experience"?
Kevin Wooten wrote on 12.03.2013 02:19:
> So… I did a crazy thing… I created an entirely new JDBC driver.
> [...]
> I'd love it if some folks in the know could take a look at my code
> and see if it stirs up any ideas on integration or just makes you
> want to jump off a bridge.

It does sound very nice, thanks very much for the effort!

I didn't look at the code, but one thing I noticed when trying the driver is that it requires a bunch of other libraries, which is highly unusual in the JDBC "world". Usually the JDBC driver jar is self-contained.

When I tried it with my SQL tool (SQL Workbench/J) I got the following exception when connecting:

    java.lang.NullPointerException
      at com.impossibl.postgres.protocol.v30.ProtocolImpl.sendStartup(ProtocolImpl.java:168)
      at com.impossibl.postgres.protocol.v30.StartupCommandImpl.execute(StartupCommandImpl.java:107)
      at com.impossibl.postgres.protocol.v30.ProtocolImpl.execute(ProtocolImpl.java:139)
      at com.impossibl.postgres.system.BasicContext.start(BasicContext.java:151)
      at com.impossibl.postgres.system.BasicContext.init(BasicContext.java:110)
      at com.impossibl.postgres.jdbc.PSQLDriver.connect(PSQLDriver.java:56)
      at com.impossibl.postgres.jdbc.PSQLDriver.connect(PSQLDriver.java:22)

Regards
Thomas
Yeah, it occurred to me as well that if you are using Netty it requires another jar.
A bit unorthodox...
On Tue, Mar 12, 2013 at 1:39 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> I didn't look at the code, but one thing I noticed when trying the driver is that it requires a bunch of other libraries, which is highly unusual in the JDBC "world". Usually the JDBC driver jar is self-contained.
> [...]
On Mar 12, 2013, at 11:04 AM, Dave Cramer <pg@fastcrypt.com> wrote:
> Yeah, it occurred to me as well that if you are using Netty it requires another jar.
> A bit unorthodox...

To my understanding, with a Maven build process it is relatively easy to produce both an "unmodified" jar, which is appropriate for use in Maven projects, and a "shaded" jar, which could be appropriate for use in tools / non-Maven-aware projects:

http://maven.apache.org/plugins/maven-shade-plugin/

This way you can have the best of both worlds.
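For reference, a minimal shade-plugin configuration of the kind meant here (the relocation target package is made up, and whether to relocate Netty at all is a design choice):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <!-- Hypothetical target package; relocating keeps the embedded
                   Netty from clashing with an application's own Netty copy. -->
              <relocation>
                <pattern>io.netty</pattern>
                <shadedPattern>com.impossibl.shaded.io.netty</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>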
On 03/12/2013 08:04 PM, Dave Cramer wrote:
> Yeah, it occurred to me as well that if you are using Netty it requires another jar.
> A bit unorthodox...

I think using 3rd-party dependencies is the proper way to go forward instead of reimplementing all required functionality in every JDBC driver. That is not to say that the required dependencies shouldn't be kept to a minimum.

Normal Java applications should use pgjdbc-ng through Maven dependencies (or similar). For cases where that is not possible, such as SQL editors, we should also provide a fat JDBC jar that has all the required dependencies embedded.

-Mikko
If you remember what I said, I referred to its age as both a positive and a negative. I understand that all of this special case code is what makes things work in the real world. That being said, here are some things I can remember off the top of my head...
Take the simple case of "getString" in the ResultSet. It's full of special-case code, complete with a nice comment about a "hack to be compatible with text protocol" that has something to do with Dates; it even specifically references the "hstore" type by name. In addition, it seems there can be up to 5 hashmap lookups to get the type when going through getString, depending on the actual type of the column. Truly understanding what this function does is a fun task, as it's used in quite a few places.
The getSQLType function has comments about not being able to truly know which types are arrays and which are not. In comparison, the new code doesn't care, because the object is converted to an array by a decoder. I won't forget to mention that the current code is EXTREMELY slow when it comes to the getDatabaseMetaData and getMetaData calls; I mention this because I have a months-old pull request that fixes the performance issue and hasn't been integrated yet ;)
On Mar 12, 2013, at 2:51 AM, Dave Cramer <pg@fastcrypt.com> wrote:
> I think this is a very interesting idea. I've often thought that writing a stripped down version of the driver that did the basics *very* well would be an interesting project.
> [...]
> You mention it is "showing its age"; can you be more specific?
I don't expect it to work with SQL Workbench, but after looking at this, it cannot find the property for "username", "password" or "database".

"database" is properly gleaned from the URL. "username" and "password" can be provided as extra properties or as URL query parameters.

What URL are you using to connect, and what properties are being provided?

As an aside, it currently only supports "plain" and "md5" authentication methods.

On Mar 12, 2013, at 10:39 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> When I tried it with my SQL tool (SQL Workbench/J) I got the following exception when connecting:
> [...]
Also, I fully intend for the final packaging to be a single self-contained jar. I have not been too concerned with it at this point, but Maven makes it pretty easy. Netty is easily embedded into the jar and, while also easy to embed, the Apache BeanUtils dependency may not be needed in the end.

On Mar 12, 2013, at 10:39 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> I didn't look at the code, but one thing I noticed when trying the driver is that it requires a bunch of other libraries, which is highly unusual in the JDBC "world". Usually the JDBC driver jar is self-contained.
> [...]
Kevin Wooten wrote on 12.03.2013 02:19:
> So… I did a crazy thing… I created an entirely new JDBC driver.
> [...]

Thomas Kellerer wrote:
> It does sound very nice, thanks very much for the effort!
>
> I didn't look at the code, but one thing I noticed when trying the driver is that it requires a bunch of other libraries, which is highly unusual in the JDBC "world". Usually the JDBC driver jar is self-contained.
> [...]

Recently I integrated an additional database, Apache Derby, into my database access tool. Derby required three different jars to be used for JDBC support. I really don't like having to pile lots of libraries into my application in order to get it to work.

The trial that Thomas performed, and my own look at the code, seem to indicate the code is incomplete and non-functional. Perhaps it could become a replacement for the current JDBC driver, but that may not be for some time. PostgreSQL is becoming mainstream and I would like to see additional support to keep pgJDBC up to date, but branching seems unproductive to keeping a stable driver in place. Integration seems more appropriate.

danap.
On Tue, Mar 12, 2013 at 08:11:55PM +0200, Mikko Tiihonen wrote:
> I think using 3rd-party dependencies is the proper way to go forward
> instead of reimplementing all required functionality in every JDBC driver.
> That is not to say that the required dependencies shouldn't be kept to a minimum.

Two cents: it always troubles me when I hear this reasoning, because the external dependencies still increase the maintenance costs, no matter how well managed. For instance, for JDBC/DB stuff, many people care (or are mandated to care) about security and the associated maintenance.

- Jukka.
>> I think using 3rd-party dependencies is the proper way to go forward
>> instead of reimplementing all required functionality in every JDBC driver.
>> That is not to say that the required dependencies shouldn't be kept to a minimum.
>
> Two cents: it always troubles me when I hear this reasoning, because the
> external dependencies still increase the maintenance costs, no matter how
> well managed.

I could not agree more.
Just to put this to bed, with regards to the project under discussion: the final delivery will be a single jar file with no external dependencies. I just haven't gotten that far yet.

On Mar 12, 2013, at 11:52 AM, Bryan Varner <bvarner@polarislabs.com> wrote:
> I could not agree more.
Yes, but it relies on extra jars whose pedigree we don't control, which is part of the security issue.
As I said earlier, I like this idea as a high-performance, minimally functional driver. By minimally functional, I mean I'd probably not implement all of the metadata API(s).
On Tue, Mar 12, 2013 at 2:55 PM, Kevin Wooten <kdubb@me.com> wrote:
> Just to put this to bed, with regards to the project under discussion: the final delivery will be a single jar file with no external dependencies. I just haven't gotten that far yet.
> [...]
Kevin Wooten wrote on 12.03.2013 19:29:
> I don't expect it to work with SQL Workbench, but after looking at
> this, it cannot find the property for "username", "password" or
> "database".
>
> "database" is properly gleaned from the URL. "username" and
> "password" can be provided as extra properties or as URL query
> parameters.
>
> What URL are you using to connect, and what properties are being
> provided?

Exactly the same credentials as with the "real" Postgres driver. The URL is:

jdbc:postgresql://localhost/wbtest

And I am using Driver.connect(String, Properties). The Properties instance contains essentially the keys "user" and "password" (as per the JDBC specs).

Regards
Thomas
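Concretely, the spec-style call described above looks like this (the credential values here are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class ConnectSketch {
      public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "thomas");      // the JDBC-spec key is "user",
        props.setProperty("password", "secret");  // not "username"
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/wbtest", props);
        conn.close();
      }
    }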
Hello,
From: Kevin Wooten <kdubb@me.com>
Subject: A new JDBC driver...
Date: 12. mars 2013 02:19:11 CET

> About 10 days ago I set out to fix the current driver's support for composite and array objects. Currently they are only ever retrieved as encoded strings, due to the use of the TEXT format for those types; parsing these encoded strings seemed just short of a nightmare. [...]
Hm, nightmare might be a stretch, though I see your point. I recently wrote a parser for text representations of arrays and composites of the basic types for my own personal little pet project, which is a kind of MyBatis-like thing but highly Postgres-specific. It wasn't much work, but if you're going to support "everything" and do it in the JDBC layer, I guess going the binary route would be the better bet.
The part of my implementation that just deals with parsing the text format isn't much to gawk at, but just for the record I've got it here:
And yes, I'm sure it chokes on lots of things.
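To give a flavor of the simplest part of that job, here is a minimal single-dimension sketch of that kind of text-format parsing (a standalone illustration, not my actual parser; it deliberately ignores NULLs, nested arrays and custom delimiters):

    import java.util.ArrayList;
    import java.util.List;

    public class ArrayLiteralSketch {
      // Parses e.g. {a,b,"c d"} into its elements. Single dimension only.
      public static List<String> parse(String s) {
        if (!s.startsWith("{") || !s.endsWith("}"))
          throw new IllegalArgumentException("not an array literal: " + s);
        List<String> out = new ArrayList<String>();
        String body = s.substring(1, s.length() - 1);
        if (body.isEmpty())
          return out;                                   // empty array: {}
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < body.length(); i++) {
          char c = body.charAt(i);
          if (c == '"') quoted = !quoted;               // enter/leave quoting
          else if (c == '\\' && quoted) cur.append(body.charAt(++i)); // escape
          else if (c == ',' && !quoted) { out.add(cur.toString()); cur.setLength(0); }
          else cur.append(c);
        }
        out.add(cur.toString());
        return out;
      }
    }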
> * Can decode any recognized type to a Java object (this includes any imaginable composite or array type)
> * Connection.setTypeMap and ResultSet.get(int idx, Map) are both fully supported
> * Requests for composite objects that have no custom mapping are returned as HashMap
> * Arrays can be decoded as a List, Map or native array (e.g. Object[], int[])
> * As an extension it can decode whole rows into POJOs as well (acts a tiny bit like MyBatis)
Right, that's the sort of thing I've been working on too, but since there wasn't any such support in the PgJDBC driver I went with getting the text representation with getString and parsing it.
> * Asynchronous I/O engine provided by Netty
> * All connections share a single group of worker threads
> * LISTEN/NOTIFY and notifications can come through asynchronously
> * Netty has a great system for managing buffers and reading/writing messages that shows increased speed
> * Performance wasn't a goal of this project but it's a nice side effect
That sounds great; when I poked around the driver trying to figure out how to retrieve arrays of complex types in a reasonably efficient manner, I did wonder about performance if they got very large.
> BINARY SUPPORT
> As outlined above, I started the project to support decoding of Composite and Array types. To accomplish this I download "pg_type", "pg_attribute" and "pg_proc" to the client upon connection. I then create a type registry that holds all the required details of all the types. [...]
Right, you do all that work upon initialization at runtime, which would be the way to go to have it all done generically in the driver and accessible via the JDBC methods. Before settling down and writing a parser for the text format, I quickly looked inside PgJDBC to see what it would take to have it do something like this, but I realized it would be a lot of work long before I saw a solution like the one you've implemented. In the end, for my own project, I went with doing all of that work outside of the JDBC driver by generating a static factory method for each type that the application in question actually uses, which fits the "zero reflection and full compile-time metaprogramming" approach my project has.
> Should I continue on this path of a new driver and see if people join or should I take what I have learned and try to refit it into the current code?
While I can't comment on this directly, I would like to say that I for one would appreciate either of those things; just as long as this doesn't fizzle out I'd be quite content. At the moment it's a non-starter for me to try out on my pet project, because that relies heavily on the JDBC metadata functions (which I'm not expecting will come along any time soon either, don't worry), but it's very good to hear that someone is putting effort into the areas of Postgres and JDBC I've been looking at, even if that effort may take a while to bear fruit. I'm not sure how to help either, seeing as I don't have much to test it against, and I seem to only have vague ideas in areas where you already have solutions.
Thanks for the effort either way,
Thor Michael
Kevin,
While I haven't looked at your driver in detail, it occurs to me that since the FE/BE protocol is broken out in the current driver, would it be possible to use your protocol implementation in the current driver?
On Tue, Mar 12, 2013 at 6:41 PM, Thor Michael Støre <thormichael@gmail.com> wrote:
> Hello,
> [...]
> Thanks for the effort either way,
> Thor Michael
It could probably be done if it weren't for my implementation delivering Java objects. The current driver expects nothing but byte arrays for each column of every row of the result set. So for a Postgres bool it has a byte[1], whereas my implementation stores a boolean.
The "DataRow" message handler could, fairly easily, be made to stop parsing it to the extent that it does and just return a byte[][] as the current driver does. To support my goals though, all of the ad-hoc parsing that's done in AbstractJdbc2ResultSet, and friends, would still need to be replaced with the binary parsing I am doing with my type framework.
On Tuesday 12 March 2013 at 02:19:11, Kevin Wooten <kdubb@me.com> wrote:

Coolio!! Can't wait to see what happens next. It's often fresh, ambitious work like this which brings the world forward.

--
Andreas Joseph Krogh <andreak@officenet.no>      mob: +47 909 56 963
Senior Software Developer / CTO - OfficeNet AS - http://www.officenet.no
Public key: http://home.officenet.no/~andreak/public_key.asc
On 03/12/2013 09:19 AM, Kevin Wooten wrote:
> * Asynchronous I/O engine provided by Netty
> * All connections share a single group of worker threads

That's going to cause serious issues on Java EE, especially when unloading applications. Since you're on JDBC4 you can have init and cleanup functions to manage the thread pool, but these are vital. I asked about this re PgJDBC quite some time ago and got some good information that's summarized here:

http://stackoverflow.com/q/8514725/398670
Failure to properly shut the thread pool down when a servlet or application is unloaded will cause classloader leaks, tending to lead to PermGenSpace exhaustion errors and other problems. The driver will probably need application-server-specific integration hooks too.
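As a minimal sketch of what I mean (my illustration, not code from either driver; a plain ExecutorService stands in for the shared Netty worker group):

    import java.sql.Driver;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    final class WorkerPoolLifecycle {
        private static final ExecutorService WORKERS = Executors.newCachedThreadPool();

        // Called when the driver is unloaded (e.g. on webapp undeploy):
        // deregister first, then interrupt and release every I/O thread so
        // none of them can pin the webapp's classloader.
        static void shutdown(Driver registered) throws SQLException {
            DriverManager.deregisterDriver(registered);
            WORKERS.shutdownNow();
        }
    }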
As for the "multiple JARs" converns, it's trivial to bundle dependencies inside the JDBC driver jar. However, this can cause issues if there's an incompatible version of the same library elsewhere on the classpath. It's OK if you're on a modern application server like JBoss AS 7 that isolates classloader chains, but it can be a real problem on older platforms and standalone applications. For this reason both a rollup version of the jar and a version without bundled libraries would probably be needed, but this is trivial to produce from Maven.
Overall I think this is an intriguing idea, whether it proves to be an ideas testbed or something that becomes appealing to adopt more seriously. Congratulations on tackling it.
--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I agree on the importance of avoiding conflicts with the application server's classpath. I think jarjar can easily solve such a problem by embedding the required third-party libraries in the final jar file in such a way that conflicts with other versions of the same library do not arise: https://code.google.com/p/jarjar/ . The Spring framework uses jarjar to package internal versions of ASM and CGLIB under the org.springframework package hierarchy.
On Thu, Mar 14, 2013 at 5:56 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

maven-shade-plugin was already mentioned to solve the class conflict issues (similar to jarjar).

Florent

--
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87
Craig, thanks a lot for the information. I read your SO question and related info, and then did a bit of researching on my own. I have come up with a few things…
Starting your own threads is not suggested because of the problems with not having a definitive startup/shutdown (until more recent JavaEE versions), and because any thread you start cannot use any of the services of the container. Everything else I have read just says that IF you don't shut the threads down properly you could wreak havoc, eat up resources, etc, etc.
The first issue isn't an issue at all. The threads handle the I/O only, and all data is delivered back to an application-level thread for processing. Basically, the fact that threads are used is completely transparent to the application code; it treats the calls synchronously. Theoretically, because you can in C, you should be able to do async I/O without threads, but there seems to be some belief that, in Java, threads are the way to go; all the popular libraries I've seen are using thread pools.
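Roughly, the shape of it is this (an illustrative toy, not my actual code; a bare ExecutorService stands in for the Netty machinery):

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    final class SyncOverAsync {
        private final ExecutorService ioWorkers = Executors.newFixedThreadPool(2);

        // The worker thread does the I/O; the application thread blocks on the
        // future, so the call still *looks* synchronous to the caller.
        byte[] roundTrip(final byte[] message) throws Exception {
            Future<byte[]> reply = ioWorkers.submit(new Callable<byte[]>() {
                @Override public byte[] call() {
                    return message; // stand-in for a real socket round trip
                }
            });
            return reply.get();
        }
    }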
Secondly, I have actually paid a bit of attention to the issue of threads shutting down, because any abandoned connection causes the threads to remain active and the program, at the very least, to not be able to shut down. I used a reference counting system to share the thread pool between connections. This guarantees that if the connections are properly closed, the threads will be killed. I did a bit of experimentation last night with using weak references everywhere and trying to handle the case where a person forgets to close a connection. After a bit of messing around I finally settled on the much maligned "finalizer" to kill the connection if it's abandoned. Connections are pretty heavyweight things and I don't believe finalizers are going to cause any issue with performance. Most applications will not be creating & destroying thousands of connections such that it creates a performance problem for the GC because of finalizers. Any good app, and especially JavaEE containers, will be pooling them and closing them properly anyway, which will in turn shut the threads down properly.
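The reference counting itself is simple; boiled down to an illustration (not the real code), it's something like:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    final class SharedWorkers {
        private static ExecutorService pool;
        private static int refs;

        // First connection creates the shared pool...
        static synchronized ExecutorService acquire() {
            if (refs++ == 0) {
                pool = Executors.newCachedThreadPool();
            }
            return pool;
        }

        // ...and the last close releases it, killing the I/O threads.
        static synchronized void release() {
            if (--refs == 0) {
                pool.shutdownNow();
                pool = null;
            }
        }
    }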
I did manage to hack into my driver an implementation of setQueryTimeout. The disconnected nature made this quite simple, as my current indefinite wait for query completion just became a wait with a timeout. The real issue I struggled with, so I put it on hold until later, was what to do once the timeout happens. I believe you would need to issue a "Cancel" to the server and then send a "Sync" message. PostgreSQL's cancel system is pretty interesting. You have to open a new socket and send a cancel packet. This works great unless the reason for the timeout was a network issue... then you need to handle the connection timeout, send timeout, etc. of the cancel procedure, and finally you still have to wait for the server's Error response from your cancel! So it gets pretty darn interesting, but I should be able to make it happen properly in the new driver.
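In outline it looks like this (a sketch with invented names; the cancel path is left abstract):

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    final class QueryTimeoutSketch {
        private final CountDownLatch queryComplete = new CountDownLatch(1);

        // Formerly an indefinite queryComplete.await(); now a timed wait.
        boolean awaitQuery(long timeoutSeconds) throws InterruptedException {
            if (queryComplete.await(timeoutSeconds, TimeUnit.SECONDS)) {
                return true; // query finished in time
            }
            // Timed out: cancellation means opening a *new* socket to the
            // server, sending a CancelRequest, then still waiting for the
            // error response on the original connection.
            requestCancel();
            return false;
        }

        private void requestCancel() { /* open cancel socket, send CancelRequest... */ }
    }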
Finally, with regard to your SO question, since there seems to be no answer, you could try, as I touched on earlier, to implement the query timeout by using non-blocking sockets, selectors and the like. I think you'll quickly grow to appreciate why others are using threads; Java has made something that was easy in C very hard. Also, in my journeys last night I discovered the "statement_timeout" connection parameter. If you didn't know about it already, the server will cancel any statement that takes longer than this value. It may be an easy solution to your problem.
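For example, to try it from JDBC you can just set it per connection (the value is in milliseconds):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    final class StatementTimeoutDemo {
        // After this, the server aborts any statement on this connection
        // that runs longer than the given number of milliseconds.
        static void capStatementTime(Connection conn, int millis) throws SQLException {
            try (Statement st = conn.createStatement()) {
                st.execute("SET statement_timeout = " + millis);
            }
        }
    }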
Kevin
On 03/15/2013 01:15 PM, Kevin Wooten wrote:
> Craig, thanks a lot for the information. I read your SO question and related info, and then did a bit of researching on my own. I have come up with a few things…

Just to clarify: it seems you thought I was asking for help with that question. I wasn't; I just wanted to direct you to the informative answer given and raise your awareness of the concerns around threading in Java EE application servers, in case you weren't aware of the problems there.
If you're unaware of classloader leaks and the related problems, start here:
http://frankkieviet.blogspot.com.au/2006/10/classloader-leaks-dreaded-permgen-space.html
> Starting your own threads is not suggested because of the problems with not having a definitive startup/shutdown (until more recent JavaEE versions), and because any thread you start cannot use any of the services of the container. Everything else I have read just says that IF you don't shut the threads down properly you could wreak havoc, eat up resources, etc, etc.

Yes... and you need to implement ServiceLoader hooks in order for threads created by the JDBC driver to be properly terminated before the driver is unloaded.
Remember that in Java EE the JDBC driver may be unloaded without the app server being terminated. This mainly occurs in apps that bundle JDBC drivers in their .war instead of using the server connection pool, but is also seen when JDBC drivers are uninstalled or unloaded from a running server as is common in upgrades.
If the driver has its own threads and it doesn't terminate every thread in the pool the unload will fail and you'll get a classloader leak.
> The first issue isn't an issue at all. The threads handle the I/O only, and all data is delivered back to an application-level thread for processing. Basically, the fact that threads are used is completely transparent to the application code; it treats the calls synchronously.

To the application code, yes; to the app server, not necessarily, because the threads hold references to classes that hold references to the .war classloader in the case of in-app-bundled JDBC drivers. Unlike every other kind of Java object, a thread is not GC'd when its refcount falls to zero, because a thread is a GC root. So if a leaked connection is GC'd, the associated threads may not be. You cannot rely on the finalize method to reliably or correctly clean up the threads, as it may not run under far too many circumstances.
For that reason, it's important for the driver to keep track of its threads and ensure that it explicitly shuts down its thread pool when it is unloaded by the service loader.
> Secondly, I have actually paid a bit of attention to the issue of threads shutting down, because any abandoned connection causes the threads to remain active and the program, at the very least, to not be able to shut down.

That's not the case if you're creating daemon threads (which you should be) - but it can cause other problems with object leaks. Even if you hold only weak references in your thread, the thread itself can still cause a classloader leak, since it holds a strong reference to its class object (or, in the case of an unsubclassed Thread, to the runnable and via it to the runnable's class).
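For reference, making the pool's threads daemons is just a ThreadFactory away (an illustrative sketch; the thread name is invented):

    import java.util.concurrent.ThreadFactory;

    final class DaemonThreads implements ThreadFactory {
        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "pg-ng-io"); // name is invented
            t.setDaemon(true); // won't block JVM exit; can still pin a classloader
            return t;
        }
    }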
> I used a reference counting system to share the thread pool between connections. This guarantees that if the connections are properly closed, the threads will be killed. I did a bit of experimentation last night with using weak references everywhere and trying to handle the case where a person forgets to close a connection.

Good idea. The driver *must* continue to function when connections aren't closed, as this is unfortunately extremely common. Relying on finalizers won't cut it in my opinion; they're just too unreliable to use for anything except logging warnings to say you forgot to close something.
The main issue with finalizers isn't performance; it's the number of circumstances under which they just don't run, run multiple times, or otherwise have undesirable characteristics.
> PostgreSQL's cancel system is pretty interesting. You have to open a new socket and send a cancel packet.

Yeah, I find this to be particularly awful myself. I vaguely recall seeing some discussion suggesting that in-band cancel was possible, but I'll have to delve through my archives to find out where.
> Finally, with regard to your SO question, since there seems to be no answer, you could try, as I touched on earlier, to implement the query timeout by using non-blocking sockets, selectors and the like. I think you'll quickly grow to appreciate why others are using threads; Java has made something that was easy in C very hard. Also, in my journeys last night I discovered the "statement_timeout" connection parameter. If you didn't know about it already, the server will cancel any statement that takes longer than this value. It may be an easy solution to your problem.

Heh, I'm well aware of statement_timeout. In that question I was asking for guidance about sane ways to use threading within the JDBC driver, as I was hoping to have time to implement threaded I/O within PgJDBC to finally solve the problem of the driver deadlocking when it's blocked on a read and the server won't reply until it gets another write. As you've alluded to, async I/O wasn't really an option, especially since it plays very poorly with SSL sockets. The answer referring to the service loader was very helpful, and that's what I was pointing you towards.
--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 03/15/2013 01:15 PM, Kevin Wooten wrote:
> After a bit of messing around I finally settled on the much maligned "finalizer" to kill the connection if it's abandoned.

Oh, and re finalizers:
https://www.securecoding.cert.org/confluence/display/java/MET12-J.+Do+not+use+finalizers
In particular, these points:
- There is no fixed time at which finalizers must be executed because this depends on the JVM. The only guarantee is that any finalizer method that executes will do so sometime after the associated object has become unreachable (detected during the first cycle of garbage collection) and sometime before the garbage collector reclaims the associated object's storage (during the garbage collector's second cycle). Execution of an object's finalizer may be delayed for an arbitrarily long time after the object becomes unreachable. Consequently, invoking time-critical functionality such as closing file handles in an object's finalize() method is problematic.

- The JVM may terminate without invoking the finalizer on some or all unreachable objects. Consequently, attempts to update critical persistent state from finalizer methods can fail without warning. Similarly, Java lacks any guarantee that finalizers will execute on process termination. Methods such as System.gc(), System.runFinalization(), System.runFinalizersOnExit(), and Runtime.runFinalizersOnExit() either lack such guarantees or have been deprecated because of lack of safety and potential for deadlock.

In particular I would be concerned that a thread refcounting system would be subject to ordering problems or deadlocks introduced by attempts to introduce fixes for ordering problems.

- According to the Java Language Specification, §12.6.2, "Finalizer Invocations are Not Ordered" [JLS 2005]: The Java programming language imposes no ordering on finalize() method calls. Finalizers [of different objects] may be called in any order, or even concurrently. One consequence is that slow-running finalizers can delay execution of other finalizers in the queue. Further, the lack of guaranteed ordering can lead to substantial difficulty in maintaining desired program invariants.

- Use of finalizers can introduce synchronization issues even when the remainder of the program is single-threaded. The finalize() methods are invoked by the garbage collector from one or more threads of its choice; these threads are typically distinct from the main() thread, although this property is not guaranteed. When a finalizer is necessary, any required cleanup data structures must be protected from concurrent access. See the JavaOne presentation by Hans J. Boehm [Boehm 2005] for additional information.
I really need to look into how other JDBC drivers handle these issues; it's not like this is a new problem.
--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I understood you weren't looking for an answer. I figured I'd share the knowledge I had... just in case ;)
I really appreciate the information you are providing, as you seem to be well versed in the issues. I do have one concern, though: I feel like we are discussing the case where connections are not closed correctly. Are you saying that even if the connections are closed properly the threads can still leak resources? I can't see how that would happen, as properly closing connections will ensure the threads are shut down manually, and shutting the threads down manually will ensure their resources are released. If we are talking about the error condition where connections are not closed, I am not so concerned.
Yes, I am familiar with these issues. Although, as I said previously, this is only for the case where connections, statements and result sets are not being closed. Finalizers, with all their issues, are a lot easier and serve the same purpose as tracking things with some combination of weak, soft or phantom references and trying to basically do a finalize yourself.
In its final state I imagine the driver will be able to withstand leaked ResultSets and Statements, but not Connections; you'll have to close your Connections properly. To accomplish this I am using weak refs between the statements and connections, and employing a finalize to catch the case where the statements and/or result sets were leaked. If you, or anybody, has a better idea, I am all ears.
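Boiled down, the arrangement looks something like this (illustrative class names, not the real ones):

    import java.lang.ref.WeakReference;

    abstract class NgConnection {
    }

    abstract class NgStatement {
        // Weak ref back to the owner, so a leaked statement can't keep
        // its connection alive.
        private final WeakReference<NgConnection> owner;
        private boolean closed;

        NgStatement(NgConnection conn) {
            this.owner = new WeakReference<NgConnection>(conn);
        }

        NgConnection connection() {
            return owner.get(); // may be null if the connection was collected
        }

        void close() {
            closed = true; // release server-side resources here
        }

        @Override
        protected void finalize() throws Throwable {
            try {
                if (!closed) {
                    close(); // leaked: best-effort cleanup only, no guarantees
                }
            } finally {
                super.finalize();
            }
        }
    }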