Thread: statement caching proof of concept
This is just proof of concept. More work has to be done to make it build properly and work properly under different jdk's Couple of questions. 1) What to do if there are multiple concurrent requests per connection for the same statement? 1) we could just allow it 2) we could return a non-cacheable preparedstatement 3) throw an exception 2) Is it enough to cache prepared statements or should we cache statements too? Note: this work is based completely on apache's dbcp statement caching implementation and this will be noted in the final code.
Attachment
On Mon, 19 Jun 2006, Dave Cramer wrote: > This is just proof of concept. More work has to be done to make it build > properly and work properly under different jdk's That's neat, but isn't statement caching something that belongs to the application server, not the JDBC driver? Also, it seems to me that the changes to AbstractJdbc2Connection and AbstractJdbc2Statement are completely unrelated to statement caching. > Note: this work is based completely on apache's dbcp statement caching > implementation and this will be noted in the final code. What kind of licensing issues does that bring us? - Heikki
On 19-Jun-06, at 2:07 PM, Heikki Linnakangas wrote: > On Mon, 19 Jun 2006, Dave Cramer wrote: > >> This is just proof of concept. More work has to be done to make it >> build properly and work properly under different jdk's > > That's neat, but isn't statement caching something that belongs to > the application server, not the JDBC driver? There's an interesting question. However the way it has been done it can be very easily removed, or added. > > Also, it seems to me that the changes to AbstractJdbc2Connection > and AbstractJdbc2Statement are completely unrelated to statement > caching. > >> Note: this work is based completely on apache's dbcp statement >> caching implementation and this will be noted in the final code. > > What kind of licensing issues does that bring us? None, their license is more or less bsd. > > - Heikki >
Dave Cramer wrote: > This is just proof of concept. More work has to be done to make it > build properly and work properly under different jdk's Interesting. How is it intended to be used? Automatically for every prepared statement? How does this compare to other JDBC implementations? And, why would this be a part of the Postgres driver? It seems more like an add-on to JDBC drivers in general. I do use a pool for prepared statements myself, actually quite like the one you wrote. But it is part of my general database related code library. Don't get me wrong, if there is a clean way to boost the performance of the driver without allocating too many resources, i'd like to see it in the driver. But if it is complicated to use, or tries do to "clever" things behind my back, i'm not so sure. Maybe activate it via a parameter like preparedThreshold? > Couple of questions. > > 1) What to do if there are multiple concurrent requests per > connection for the same statement? > 1) we could just allow it > 2) we could return a non-cacheable preparedstatement > 3) throw an exception Should do whatever a normal prepared statement would do in this situation, imho. That is, if i have code that uses normal prepared statements, and is updated to use these cached statements, it should not fail in any new ways. Or either the different behaviour it must be very well documented. > 2) Is it enough to cache prepared statements or should we cache > statements too? Actually, if i don't use a prepared statement, i want just this: do it once, then forget about it. Though it could be nice to boost stupid code that doesn't know about prepared statements; which is not really a problem of the driver. > Note: this work is based completely on apache's dbcp statement > caching implementation and this will be noted in the final code. What does that mean? Till
On 19-Jun-06, at 2:57 PM, till toenges wrote: > Dave Cramer wrote: >> This is just proof of concept. More work has to be done to make it >> build properly and work properly under different jdk's > > Interesting. How is it intended to be used? Automatically for every > prepared statement? How does this compare to other JDBC > implementations? Essentially yes, however this is all good feedback. The way it's implemented we could use it or not on a per connection basis. > > And, why would this be a part of the Postgres driver? It seems more > like > an add-on to JDBC drivers in general. I do use a pool for prepared > statements myself, actually quite like the one you wrote. But it is > part > of my general database related code library. Other drivers use statement caching namely Oracle. > > Don't get me wrong, if there is a clean way to boost the > performance of > the driver without allocating too many resources, i'd like to see > it in > the driver. But if it is complicated to use, or tries do to "clever" > things behind my back, i'm not so sure. Maybe activate it via a > parameter like preparedThreshold? All good feedback > >> Couple of questions. >> >> 1) What to do if there are multiple concurrent requests per >> connection for the same statement? >> 1) we could just allow it >> 2) we could return a non-cacheable preparedstatement >> 3) throw an exception > > Should do whatever a normal prepared statement would do in this > situation, imho. That is, if i have code that uses normal prepared > statements, and is updated to use these cached statements, it > should not > fail in any new ways. Or either the different behaviour it must be > very > well documented. I'd agree the driver should not fail. > >> 2) Is it enough to cache prepared statements or should we cache >> statements too? > > Actually, if i don't use a prepared statement, i want just this: do it > once, then forget about it. Though it could be nice to boost stupid > code > that doesn't know about prepared statements; which is not really a > problem of the driver. > >> Note: this work is based completely on apache's dbcp statement >> caching implementation and this will be noted in the final code. > > What does that mean? Nothing much, I am just giving credit where credit is due. I borrowed most of the concepts from them. Their license like ours allows this. Dave > > > Till >
Hmmm. You know, an interesting alternative way to get the same kind of performance boost: cache just the handles to the server-side prepared statements, but not the Java PreparedStatement instances. I'm just suggesting the idea because it seems like it would be simpler to implement (just a pool and some ref-counting), and you wouldn't need to worry about any special concurrency issues, because you'd be dealing with distinct PreparedStatement object that just happen to refer to the same server-side statement. It seems like it would be less error-prone to only cache the expensive part, and leave the rest of the stateful stuff in PreparedStatement alone. But on the other hand, I'm not in a position to offer an alternative implementation, and working code trumps vaporware any day :) -- Mark Lewis On Mon, 2006-06-19 at 12:55 -0400, Dave Cramer wrote: > This is just proof of concept. More work has to be done to make it > build properly and work properly under different jdk's > > Couple of questions. > > 1) What to do if there are multiple concurrent requests per > connection for the same statement? > 1) we could just allow it > 2) we could return a non-cacheable preparedstatement > 3) throw an exception > > 2) Is it enough to cache prepared statements or should we cache > statements too? > > Note: this work is based completely on apache's dbcp statement > caching implementation and this will be noted in the final code. > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster
Dave Cramer wrote: > Other drivers use statement caching namely Oracle. Are there any special things you have to do or care about to use cached statements there? My experience with the oracle driver is limited. >>> 1) What to do if there are multiple concurrent requests per >>> connection for the same statement? >>> 1) we could just allow it >>> 2) we could return a non-cacheable preparedstatement >>> 3) throw an exception >> Should do whatever a normal prepared statement would do in this >> situation, imho. That is, if i have code that uses normal prepared >> statements, and is updated to use these cached statements, it >> should not >> fail in any new ways. Or either the different behaviour it must be >> very >> well documented. > > I'd agree the driver should not fail. The current situation allows to use a prepared statement from several threads at once, or more than once before reading the result in a single thread. There is no way to get all the results back reliably in this case. Since the JDBC specifications don't demand anything more (afaik), solution 1 seems ok. A slightly more advanced solution might be to create a map of lists of prepared statements, instead of a map of prepared statements. If i have a prepared statement "SELECT 1" and this is in high demand by hundreds of threads (silly example), it would be possible to have more than one prepared statement with the same sql. That would be more like solution 2, with better performance in concurrent applications. I like Mark Lewis idea of caching the handles, but i have no real knowledge about the postgres api, the overhead of creating a new prepared statement object (could easily offset the saved memory in a "heavy duty" application) and how well this would work; or how this would influence the specific problem of multiple concurrent requests. Certainly sounds a bit more difficult to implement. How does it combine with methods like get*MetaData()? And what happens with methods like setQueryTimeout()? They change the behaviour of the cached statement, and don't get reset between use. Especially interesting in the case of automatic caching of prepared statements, where the user expects a new, "clean" statement. How does the Oracle driver handle this, for example? Ok, just looked at the source again. The method clearWarnings() is called before returning the statement. There could be an extended clearCachedStatement() method, or something like this to reset everything neccessary. Till
till toenges wrote: > The current situation allows to use a prepared statement from several > threads at once, or more than once before reading the result in a single > thread. There is no way to get all the results back reliably in this > case. Since the JDBC specifications don't demand anything more (afaik), > solution 1 seems ok. Oversight on my part: If the caching is automatic, an application that originally used several prepared statements with the same sql and now gets only one from the cache would very probably misbehave. Therefore "just allowing it" is not an option. In fact, i have no precise idea how a statement would know that it is ready for reuse. An application could keep a reference to the statement and run it again and again. WeakReferences and ReferenceQueues could be used. Always interesting to play with the garbage collector ;-) Maybe the postgres api has a way to identify individual calls and then Mark Lewis idea of just caching the handle and creating individual prepared statement objects upon use is the right way to go. Till
Dave Cramer wrote: > This is just proof of concept. More work has to be done to make it build > properly and work properly under different jdk's Isn't there a bunch of statement state (things like fetch size, max rows, etc) that have defined defaults and this cache implementation will not provide? The "wrapper" implementation approach suffers from the usual difficulty that the "back links" such as ResultSet.getStatement() point to the wrong object. It's actually quite a bit of work to get this right.. The cached statements are vulnerable to buggy apps that mutate the statement after close, since there's no interception of those methods to check whether the wrapper statement has been closed. What exactly is the performance bottleneck you're trying to avoid by having the statement pool? If it's the parse/plan cost, I think Mark's suggestion of putting the cache at the protocol level may be simpler. If it's overall statement cost, you might be better off with a generic wrapper that is not postgresql-specific at all. -O
On 19-Jun-06, at 7:15 PM, Oliver Jowett wrote: > Dave Cramer wrote: >> This is just proof of concept. More work has to be done to make it >> build properly and work properly under different jdk's > > Isn't there a bunch of statement state (things like fetch size, max > rows, etc) that have defined defaults and this cache implementation > will not provide? Good point, these would all have to be reset to their defaults as well. > > The "wrapper" implementation approach suffers from the usual > difficulty that the "back links" such as ResultSet.getStatement() > point to the wrong object. It's actually quite a bit of work to get > this right.. Since result sets only live as long as the statement, wouldn't they point to the statement that is still open ? > > The cached statements are vulnerable to buggy apps that mutate the > statement after close, since there's no interception of those > methods to check whether the wrapper statement has been closed. No question, and I would certainly not make this the default behaviour. The user would have to turn on caching. > > What exactly is the performance bottleneck you're trying to avoid > by having the statement pool? If it's the parse/plan cost, I think > Mark's suggestion of putting the cache at the protocol level may be > simpler. If it's overall statement cost, you might be better off > with a generic wrapper that is not postgresql-specific at all. How does the generic wrapper solve the problems above ? I would think they all suffer from the same problems ? > > -O >
On 19-Jun-06, at 6:53 PM, till toenges wrote: > Dave Cramer wrote: >> Other drivers use statement caching namely Oracle. > > Are there any special things you have to do or care about to use > cached > statements there? My experience with the oracle driver is limited. > >>>> 1) What to do if there are multiple concurrent requests per >>>> connection for the same statement? >>>> 1) we could just allow it >>>> 2) we could return a non-cacheable preparedstatement >>>> 3) throw an exception >>> Should do whatever a normal prepared statement would do in this >>> situation, imho. That is, if i have code that uses normal prepared >>> statements, and is updated to use these cached statements, it >>> should not >>> fail in any new ways. Or either the different behaviour it must be >>> very >>> well documented. >> >> I'd agree the driver should not fail. > > The current situation allows to use a prepared statement from several > threads at once, or more than once before reading the result in a > single > thread. There is no way to get all the results back reliably in this > case. Since the JDBC specifications don't demand anything more > (afaik), > solution 1 seems ok. AFAIK Postgres doesn't allow concurrent statements on the same connection, so I don't see how using multiple threads would work ? > > A slightly more advanced solution might be to create a map of lists of > prepared statements, instead of a map of prepared statements. If i > have > a prepared statement "SELECT 1" and this is in high demand by hundreds > of threads (silly example), it would be possible to have more than one > prepared statement with the same sql. That would be more like solution > 2, with better performance in concurrent applications. > > I like Mark Lewis idea of caching the handles, but i have no real > knowledge about the postgres api, the overhead of creating a new > prepared statement object (could easily offset the saved memory in a > "heavy duty" application) and how well this would work; or how this > would influence the specific problem of multiple concurrent requests. > Certainly sounds a bit more difficult to implement. > > How does it combine with methods like get*MetaData()? > > And what happens with methods like setQueryTimeout()? They change the > behaviour of the cached statement, and don't get reset between use. good point, however query timeout isn't used right now . > Especially interesting in the case of automatic caching of prepared > statements, where the user expects a new, "clean" statement. How does > the Oracle driver handle this, for example? > > Ok, just looked at the source again. The method clearWarnings() is > called before returning the statement. There could be an extended > clearCachedStatement() method, or something like this to reset > everything neccessary. > > > Till > > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
On Mon, 2006-06-19 at 19:34 -0400, Dave Cramer wrote: > How does the generic wrapper solve the problems above ? I would think > they all suffer from the same problems ? All of the prepared-statement caching mechanisms I've been exposed to solve the problem by giving you a non-shared object that wraps the real PreparedStatement, and ensuring that once the wrapper is closed, you can no longer access the underlying statement. -- Mark Lewis
Dave Cramer wrote: > AFAIK Postgres doesn't allow concurrent statements on the same > connection, so I don't see how using multiple threads would work ? The statements wouldn't need to be concurrent in the sense of happening at the same time: // stupid query String sql = "SELECT * FROM quicklychangingtable"; PreparedStatement p1 = connection.prepareStatement(sql); PreparedStatement p2 = connection.prepareStatement(sql); ResultSet r1 = p1.executeQuery(); // do something else for a while ResultSet r2 = p2.executeQuery(); What happens with r1? Might be ok, but i'm not sure if this is always true. Till
Till, Thanks, this is a great example. So how would any caching implementation deal with this? I can think of copying the result set into the wrapper, but that could be prohibitively expensive. Dave On 19-Jun-06, at 7:56 PM, till toenges wrote: > Dave Cramer wrote: >> AFAIK Postgres doesn't allow concurrent statements on the same >> connection, so I don't see how using multiple threads would work ? > > The statements wouldn't need to be concurrent in the sense of > happening > at the same time: > > // stupid query > String sql = "SELECT * FROM quicklychangingtable"; > PreparedStatement p1 = connection.prepareStatement(sql); > PreparedStatement p2 = connection.prepareStatement(sql); > ResultSet r1 = p1.executeQuery(); > // do something else for a while > ResultSet r2 = p2.executeQuery(); > > What happens with r1? Might be ok, but i'm not sure if this is > always true. > > > Till >
Dave Cramer wrote: > No question, and I would certainly not make this the default > behaviour. The user would have to turn on caching. In this case the user could be expected to close() the statement after use and not do anything with it afterwards. Therefore you can forget my part about not knowing when it is safe to reuse the statement. > How does the generic wrapper solve the problems above ? I would think > they all suffer from the same problems ? A generic wrapper would be explicitly used. The user would know how to use it and comply with any special rules. Implicit caching has to work with each and every existing app out there, no matter how braindead. Much more difficult. The middle road of optional, user configurable caching in the driver can expect a little more from the user, like properly calling close() and not doing anything with the statement afterwards. This would have to be explictly stated in the documentation. Till
Dave Cramer wrote: > > On 19-Jun-06, at 7:15 PM, Oliver Jowett wrote: >> >> The "wrapper" implementation approach suffers from the usual >> difficulty that the "back links" such as ResultSet.getStatement() >> point to the wrong object. It's actually quite a bit of work to get >> this right.. > Since result sets only live as long as the statement, wouldn't they > point to the statement that is still open ? What I mean is that this won't work: PreparedStatement ps = conn.prepareStatement(...); ResultSet rs = ps.executeQuery(); assert rs.getStatement() == ps; since rs.getStatement() will return the real underlying statement, not the wrapper that prepareStatement() creates. >> The cached statements are vulnerable to buggy apps that mutate the >> statement after close, since there's no interception of those methods >> to check whether the wrapper statement has been closed. > > No question, and I would certainly not make this the default behaviour. > The user would have to turn on caching. Isn't the right solution to intercept methods and complain if the wrapper is closed? The wrapper statement would never get re-opened like the underlying statement does. Then the cached version behaves just like the non-cached version. >> What exactly is the performance bottleneck you're trying to avoid by >> having the statement pool? If it's the parse/plan cost, I think Mark's >> suggestion of putting the cache at the protocol level may be simpler. >> If it's overall statement cost, you might be better off with a generic >> wrapper that is not postgresql-specific at all. > > How does the generic wrapper solve the problems above ? I would think > they all suffer from the same problems ? Well, yes, but my point is that you can solve this once for all JDBC drivers, you don't need postgres-specific code to do it .. and surely someone has already done this? -O
Oliver Jowett wrote: > Dave Cramer wrote: >> >> On 19-Jun-06, at 7:15 PM, Oliver Jowett wrote: >>> >>> The "wrapper" implementation approach suffers from the usual >>> difficulty that the "back links" such as ResultSet.getStatement() >>> point to the wrong object. It's actually quite a bit of work to get >>> this right.. >> Since result sets only live as long as the statement, wouldn't they >> point to the statement that is still open ? > > What I mean is that this won't work: > > PreparedStatement ps = conn.prepareStatement(...); > ResultSet rs = ps.executeQuery(); > assert rs.getStatement() == ps; > > since rs.getStatement() will return the real underlying statement, not > the wrapper that prepareStatement() creates. Oh, never mind, I didn't look at the code closely enough, you're returning the real statement not a wrapper. (that has some other problems with badly-behaved code as others have noted) -O
On Tue, 20 Jun 2006, Oliver Jowett wrote: > Dave Cramer wrote: > >> How does the generic wrapper solve the problems above ? I would think they >> all suffer from the same problems ? > > Well, yes, but my point is that you can solve this once for all JDBC drivers, > you don't need postgres-specific code to do it .. and surely someone has > already done this? Sure. All major J2EE app servers have a prepared statement cache. And then there's at least Apache commons dbcp that inspired Dave in the first place. - Heikki
In June I posted this. Since then we have some real results so I'd like to solicit ideas from the group about implementing statement caching. First and foremost it would be off by default and have to be explicitly turned on. After that I'm wondering about parameters which would affect resources. Would we want connection/database/driver connection limits ? Do we do this by number of prepared statements ? Overall memory consumption ? Then there is aging of cached statements; any ideas on how to age them? Will they need to be invalidated after so much time ? Any and all ideas are welcome. I will post definitive results shortly. I've been promised them for a few days now. Dave On 19-Jun-06, at 12:55 PM, Dave Cramer wrote: > This is just proof of concept. More work has to be done to make it > build properly and work properly under different jdk's > > Couple of questions. > > 1) What to do if there are multiple concurrent requests per > connection for the same statement? > 1) we could just allow it > 2) we could return a non-cacheable preparedstatement > 3) throw an exception > > 2) Is it enough to cache prepared statements or should we cache > statements too? > > Note: this work is based completely on apache's dbcp statement > caching implementation and this will be noted in the final code. > > <statementcache.diff> > <StatementPool.java> > <CacheConnection.java> > <CacheablePreparedStatement.java> > > ---------------------------(end of > broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster
Statement caching results. I have some numbers although I can only share relative numbers. Statement caching provides a 40% increase in throughput. So I think this is worth doing. Dave On 1-Mar-07, at 8:59 PM, Dave Cramer wrote: > In June I posted this. > > Since then we have some real results so I'd like to solicit ideas > from the group about implementing statement caching. > > First and foremost it would be off by default and have to be > explicitly turned on. After that I'm wondering about parameters > which would affect resources. Would we want connection/database/ > driver connection limits ? Do we do this by number of prepared > statements ? Overall memory consumption ? Then there is aging of > cached statements; any ideas on how to age them? Will they need to > be invalidated after so much time ? > > > Any and all ideas are welcome. > > I will post definitive results shortly. I've been promised them for > a few days now. > > Dave > > On 19-Jun-06, at 12:55 PM, Dave Cramer wrote: > >> This is just proof of concept. More work has to be done to make it >> build properly and work properly under different jdk's >> >> Couple of questions. >> >> 1) What to do if there are multiple concurrent requests per >> connection for the same statement? >> 1) we could just allow it >> 2) we could return a non-cacheable preparedstatement >> 3) throw an exception >> >> 2) Is it enough to cache prepared statements or should we cache >> statements too? >> >> Note: this work is based completely on apache's dbcp statement >> caching implementation and this will be noted in the final code. >> >> <statementcache.diff> >> <StatementPool.java> >> <CacheConnection.java> >> <CacheablePreparedStatement.java> >> >> ---------------------------(end of >> broadcast)--------------------------- >> TIP 2: Don't 'kill -9' the postmaster > >
On Mar 1, 2007, at 5:59 PM, Dave Cramer wrote: > Since then we have some real results so I'd like to solicit ideas > from the group about implementing statement caching. > > First and foremost it would be off by default and have to be > explicitly turned on. After that I'm wondering about parameters > which would affect resources. Would we want connection/database/ > driver connection limits ? Do we do this by number of prepared > statements ? Overall memory consumption ? Then there is aging of > cached statements; any ideas on how to age them? Will they need to > be invalidated after so much time ? Are the patch files from June still fully applicable to the latest trunk code? (lazy question for sure, I could just apply-and-test m'self). I think caching policies could be fully separate from the implementation? I mean, in most applications that need this for performance reasons should have some sort of caching infrastructure already, and being able to plug into that would be great. If there was a way to set a 'StatementCache' implementation onto the DataSource impl, that'd be ideal. (Yes, this is me half-volunteering). Then, we can make up some default policies, but ultimately anything count/memory consumption is up to the users? We'd need to get some people running with the code to be able to make some decisions about sensible defaults. But anyone using pgsql should be familiar with tuning numbers for optimal performance anyways :) -pet -- (peter.royal|osi)@pobox.com - http://fotap.org/~osi