Re: A new JDBC driver... - Mailing list pgsql-jdbc

From Kevin Wooten
Subject Re: A new JDBC driver...
Date
Msg-id D61562E4-6497-4262-8717-703015512E05@me.com
In response to Re: A new JDBC driver...  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: A new JDBC driver...  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-jdbc
Craig, thanks a lot for the information.  I read your SO question and the related info, and then did a bit of research on my own.  I have come up with a few things…

Starting your own threads is discouraged because there is no definitive startup/shutdown hook (until more recent JavaEE versions) and because any thread you start cannot use any of the services of the container.  Everything else I have read just says that "IF" you don't shut the threads down properly you could wreak havoc, eat up resources, etc.

The first issue isn't an issue at all.  The threads handle the I/O only and all data is delivered back to an application level thread for processing. Basically the fact that threads are used is completely transparent to the application code; it treats the calls synchronously.  Theoretically, because you can in C, you should be able to do Async I/O without threads but there seems to be some belief that, in Java, threads are the way to go; all the popular libraries I've seen are using thread pools.

Secondly, I have actually paid a bit of attention to the issue of threads shutting down, because any abandoned connection causes the threads to remain active and the program to, at the very least, not be able to shut down.  I used a reference counting system to share the thread pool between connections.  This guarantees that if the connections are properly closed, the threads will be shut down.  I did a bit of experimentation last night with using weak references everywhere and trying to handle the case where a person forgets to close a connection.  After a bit of messing around I finally settled on the much maligned "finalizer" to kill the connection if it's abandoned.  Connections are pretty heavyweight things and I don't believe finalizers are going to cause any issue with performance.  Most applications will not be creating & destroying thousands of connections, so the finalizers should not create a performance problem for the GC.  Any good app, and especially JavaEE containers, will be pooling them and closing them properly anyway, which will in turn shut the threads down properly.
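To make the reference counting idea concrete, here is a minimal sketch of a pool shared between connections. It is not the driver's actual code: the class name SharedWorkerPool, its methods, and the pool size of 2 are all made up for illustration, and a plain ExecutorService stands in for the Netty worker group.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: one worker pool shared by all open connections.
final class SharedWorkerPool {
    private static ExecutorService pool; // guarded by the class lock
    private static int refCount;         // number of open connections

    // Called when a connection opens; the first caller creates the pool.
    static synchronized ExecutorService acquire() {
        if (refCount == 0) {
            pool = Executors.newFixedThreadPool(2); // size is an arbitrary assumption
        }
        refCount++;
        return pool;
    }

    // Called from Connection.close(); the last caller shuts the pool down.
    static synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        refCount--;
        if (refCount == 0) {
            pool.shutdown(); // lets the worker threads exit so the JVM can stop
            pool = null;
        }
    }

    static synchronized boolean isRunning() {
        return pool != null;
    }
}
```

As long as every acquire() is matched by a release(), the last connection to close takes the threads down with it. An abandoned connection never calls release(), which is the case the finalizer is meant to catch.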

I did manage to hack into my driver an implementation of setQueryTimeout.  The disconnected nature made this quite simple, as my current indefinite wait for query completion just became a wait with a timeout.  The real issue I struggled with, so I put it on hold until later, was what to do once the timeout happens.  I believe you would need to issue a "Cancel" to the server and then send a "Sync" message.  PostgreSQL's cancel system is pretty interesting.  You have to open a new socket and send a cancel packet.  This works great unless the reason for the timeout was a network issue...  then you need to handle the connection timeout, send timeout, etc. of the cancel procedure, and finally you still have to wait for the server's Error response to your cancel!  So it gets pretty darn interesting, but I should be able to make it happen properly in the new driver.
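Roughly, the "wait became a wait with timeout" change could look like the sketch below. This is only an illustration under assumptions: the CompletableFuture stands in for whatever completion signal the I/O threads deliver, and CancelAction is a hypothetical hook for the cancel procedure, which is not implemented here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a query wait that honours setQueryTimeout.
final class QueryTimeoutSketch {

    // Hypothetical hook: would open a new socket and send the cancel packet.
    interface CancelAction {
        void requestCancel();
    }

    static <T> T await(CompletableFuture<T> completion, int timeoutSeconds, CancelAction onTimeout)
            throws InterruptedException, ExecutionException, TimeoutException {
        if (timeoutSeconds <= 0) {
            // JDBC defines a query timeout of zero as "no limit".
            return completion.get();
        }
        try {
            return completion.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            // The hard part starts here: cancel on the server, then resync.
            onTimeout.requestCancel();
            throw e;
        }
    }
}
```

The application thread still sees a plain blocking call. Everything after requestCancel(), including the timeouts of the cancel connection itself and waiting for the server's error response, is left out of the sketch.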

Finally, with regard to your SO question, since there seems to be no answer: you could try, as I touched on earlier, to implement the query timeout by using non-blocking sockets, selectors and the like.  I think you'll quickly grow to appreciate why others are using threads; Java has made something that was easy in C very hard.  Also, in my journeys last night I discovered the "statement_timeout" connection parameter.  If you didn't know about it already, the server will cancel any statement that takes longer than this value.  It may be an easy solution to your problem.
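For what it's worth, statement_timeout can also be set per session with a plain SET command, so something like the sketch below should work with any driver. The class and method names are made up; the value is in milliseconds when given without units, and 0 disables the timeout.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: let the server enforce the timeout itself.
final class StatementTimeoutSketch {

    // Builds the SET command; a bare integer is interpreted as milliseconds.
    static String setTimeoutSql(int millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("timeout must be >= 0, got " + millis);
        }
        return "SET statement_timeout = " + millis;
    }

    // Applies the timeout to the current session of an open connection.
    static void apply(Connection conn, int millis) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(setTimeoutSql(millis));
        }
    }
}
```

Because the server does the cancelling, the client does not need a second socket, although a dead network link would still leave the client waiting.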

Kevin

On Mar 13, 2013, at 9:56 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 03/12/2013 09:19 AM, Kevin Wooten wrote:

* Asynchronous I/O engine provided by Netty
* All connections share a single group of worker threads
That's going to cause serious issues on Java EE, especially when unloading applications. Since you're on JDBC4 you can have init and cleanup functions to manage the thread pool, but these are vital. I asked about this re PgJDBC quite some time ago and got some good information that's summarized here:

http://stackoverflow.com/q/8514725/398670

Failure to properly shut the thread pool down when a servlet or application is unloaded will cause classloader leaks, tending to lead to PermGenSpace exhaustion errors and other problems. The driver will probably need application-server-specific integration hooks too.

As for the "multiple JARs" concerns, it's trivial to bundle dependencies inside the JDBC driver jar. However, this can cause issues if there's an incompatible version of the same library elsewhere on the classpath. It's OK if you're on a modern application server like JBoss AS 7 that isolates classloader chains, but it can be a real problem on older platforms and standalone applications. For this reason both a rollup version of the jar and a version without bundled libraries would probably be needed, but this is trivial to produce from Maven.

Overall I think this is an intriguing idea, whether it proves to be an ideas testbed or something that becomes appealing to adopt more seriously. Congratulations on tackling it.

-- Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
