Thread: statement caching patch from Laszlo Hornyak for review
Laszlo Hornyak has finished off the proof of concept patch that I submitted a few months ago. This driver implements prepared statement caching and was used by Sun to achieve the excellent results for their benchmark. Statement caching has proven to provide a 30-40% improvement in the benchmark results. Because the files are quite large I've placed the patch at http:// beaver.fastcrypt.com/~davec/cachable.tar.gz Currently it does not compile under 1.6, but works fine under 1.5. Dave
Dave Cramer wrote: > Laszlo Hornyak has finished off the proof of concept patch that I > submitted a few months ago. > > This driver implements prepared statement caching and was used by Sun to > achieve the excellent results for their benchmark. > > Statement caching has proven to provide a 30-40% improvement in the > benchmark results. FWIW, I'm still of the opinion that this doesn't belong in the JDBC driver. Is there anything in there that depends on PostgreSQL? You could just implement it as a generic wrapper, right? Don't get me wrong, I know it really can give a big boost in performance, and it's definitely worth having if your application server doesn't do statement caching already. I'd just prefer to keep the driver slim, and implement additional features as external modules. In fact, how about kicking our connection pool implementation out to an external module as well? The connection pool and the statement cache could live together as a pgfoundry project, or as an additional jar in the jdbc project. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki, > In fact, how about kicking our connection pool implementation out to an > external module as well? The connection pool and the statement cache > could live together as a pgfoundry project, or as an additional jar in > the jdbc project. The postgres ConnectionPool implementation included in the JDBC project already has some postgres specific bits (e.g. setPreparedThreashold), and I find it is not specific enough, as it is quite hard to set postgres specific settings on the pooled connections otherwise, as the place where the pool is configured is the natural place where all other default connection properties should be configured too. So my opinion is to make the ConnectionPool implementation as specific for postgres as it is reasonable, allowing to set most of the things you could set on a postgres connection. In particular it would be nice to set the default properties to pass on when creating a new pooled connection (this is actually not really postgres specific now that I think about it, but prepareThreshold is). Cheers, Csaba.
Csaba Nagy wrote: >> In fact, how about kicking our connection pool implementation out to an >> external module as well? The connection pool and the statement cache >> could live together as a pgfoundry project, or as an additional jar in >> the jdbc project. > > The postgres ConnectionPool implementation included in the JDBC project > already has some postgres specific bits (e.g. setPreparedThreashold), > and I find it is not specific enough, as it is quite hard to set > postgres specific settings on the pooled connections otherwise, as the > place where the pool is configured is the natural place where all other > default connection properties should be configured too. So my opinion is > to make the ConnectionPool implementation as specific for postgres as it > is reasonable, allowing to set most of the things you could set on a > postgres connection. In particular it would be nice to set the default > properties to pass on when creating a new pooled connection (this is > actually not really postgres specific now that I think about it, but > prepareThreshold is). You could have extra methods to set any driver-specific properties in the generic connection pool implementation. With reflection, I think you could also create a proxy DataSource class that implements all the same driver-specific methods as the underlaying DataSource. What do you think of the idea of just moving the current implementation to an additional jar? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki, I posted the proof of concept back in June of last year, and again in March. I searched the archives but was unable to find where you voiced this opinion ? Either way, the patch is written in such a way to be very non- invasive to the driver, and the code will only ever be executed (except for a single if statement) if the user enables the feature. Can you be more specific about why you want this as a separate module ? Dave On 1-Aug-07, at 4:31 PM, Heikki Linnakangas wrote: > Csaba Nagy wrote: >>> In fact, how about kicking our connection pool implementation out >>> to an >>> external module as well? The connection pool and the statement cache >>> could live together as a pgfoundry project, or as an additional >>> jar in >>> the jdbc project. >> >> The postgres ConnectionPool implementation included in the JDBC >> project >> already has some postgres specific bits (e.g. setPreparedThreashold), >> and I find it is not specific enough, as it is quite hard to set >> postgres specific settings on the pooled connections otherwise, as >> the >> place where the pool is configured is the natural place where all >> other >> default connection properties should be configured too. So my >> opinion is >> to make the ConnectionPool implementation as specific for postgres >> as it >> is reasonable, allowing to set most of the things you could set on a >> postgres connection. In particular it would be nice to set the >> default >> properties to pass on when creating a new pooled connection (this is >> actually not really postgres specific now that I think about it, but >> prepareThreshold is). > > You could have extra methods to set any driver-specific properties in > the generic connection pool implementation. With reflection, I > think you > could also create a proxy DataSource class that implements all the > same > driver-specific methods as the underlaying DataSource. > > What do you think of the idea of just moving the current > implementation > to an additional jar? > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: explain analyze is your friend
Dave Cramer wrote: > I posted the proof of concept back in June of last year, and again in > March. > > I searched the archives but was unable to find where you voiced this > opinion ? It was as a reply to your post last year: http://archives.postgresql.org/pgsql-jdbc/2006-06/msg00049.php Maybe I should've voiced my opinion more strongly back then.. > Either way, the patch is written in such a way to be very non-invasive > to the driver, and the code > will only ever be executed (except for a single if statement) if the > user enables the feature. > > Can you be more specific about why you want this as a separate module ? I'd just like to keep the core JDBC driver slim as a matter of principle. Given that most application servers already have their own connection pooling and prepared statement caching implementation, and the fact that there's other stand-alone implementations out there (Apache DBCP, for example), most people who need the JDBC driver don't need another connection pool and statement cache. As a separate module, the pool and the cache could get a wider audience. You could use it with different databases and JDBC drivers in addition to PostgreSQL. And it wouldn't be tied to PostgreSQL driver release cycle, and vice versa. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 2-Aug-07, at 3:50 AM, Heikki Linnakangas wrote: > Dave Cramer wrote: >> I posted the proof of concept back in June of last year, and again in >> March. >> >> I searched the archives but was unable to find where you voiced this >> opinion ? > > It was as a reply to your post last year: > > http://archives.postgresql.org/pgsql-jdbc/2006-06/msg00049.php > > Maybe I should've voiced my opinion more strongly back then.. Yes, that would have been helpful. So can we now turn our attention to the technical merits of the patch ? > >> Either way, the patch is written in such a way to be very non- >> invasive >> to the driver, and the code >> will only ever be executed (except for a single if statement) if the >> user enables the feature. >> >> Can you be more specific about why you want this as a separate >> module ? > > I'd just like to keep the core JDBC driver slim as a matter of > principle. Given that most application servers already have their own > connection pooling and prepared statement caching implementation, and > the fact that there's other stand-alone implementations out there > (Apache DBCP, for example), most people who need the JDBC driver don't > need another connection pool and statement cache. > Well, there is one application server which does not (Sun's). > As a separate module, the pool and the cache could get a wider > audience. > You could use it with different databases and JDBC drivers in addition > to PostgreSQL. And it wouldn't be tied to PostgreSQL driver release > cycle, and vice versa. I'd argue why bother at all since DBCP exists. Dave > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com
Dave Cramer wrote: > Laszlo Hornyak has finished off the proof of concept patch that I > submitted a few months ago. Any reason you wrote your own scan-the-whole-pool LRU rather than using something like LinkedHashMap? I didn't dig into the code too closely but it looks like you are using the statement object directly with no wrapper. Doesn't this run the risk that you will resurrect a previously-closed statement? Normal statement objects have a one-way lifecycle, once they are closed they cannot be resurrected, if app clients have a reference to the real statement then potentially they'll see different behaviour when the statement starts getting reused. That smells dangerous; not because any sane application will rely on it, but because it will be a source of very hard to find bugs. (e.g. it's fairly common and harmless to close an already-closed statement.. but that's suddenly disastrous if the statement has actually been pooled & reused in the meantime) Are all the abstract/interface/concrete-implementation classes strictly necessary to get it operating under multiple JDBC versions? They are a real pain when it comes to maintaining the code. For example I can't see offhand why you need to do that with StatementPool. -O
On Thu, 2 Aug 2007, Oliver Jowett wrote: > I didn't dig into the code too closely but it looks like you are using the > statement object directly with no wrapper. Doesn't this run the risk that you > will resurrect a previously-closed statement? Normal statement objects have a > one-way lifecycle, once they are closed they cannot be resurrected, if app > clients have a reference to the real statement then potentially they'll see > different behaviour when the statement starts getting reused. That smells > dangerous; not because any sane application will rely on it, but because it > will be a source of very hard to find bugs. (e.g. it's fairly common and > harmless to close an already-closed statement.. but that's suddenly > disastrous if the statement has actually been pooled & reused in the > meantime) > This is the fundamental objection. Calling close multiple times is perfectly legal and is not supported by this implementation. I have some additional notes based on my reading of the code that are rather secondary to the above: 1) Why does PStmtKey default holdability to HOLD_CURSORS_OVER_COMMIT when the driver defaults to close at commit? 2) What is the point of getNumActive/Idle in AbstractJdbc3StatementPool? 3) As you note it doesn't build with JDK1.6, but it also doesn't build with JDK1.2 or 1.3. While statement pooling is a JDBC3 feature the other driver versions must still build. 4) The test suite you've provided fails for me with: junit.framework.AssertionFailedError at org.postgresql.test.jdbc3.Jdbc3CacheableStatementTest.testStatementCacheBehaviour(Jdbc3CacheableStatementTest.java:104) 5) Defaulting the cache size to unlimited seems unwise. 6) You can't add tests to jdbc2/optional that require JDBC3 functionality. 7) What's the deal with caching CallableStatements? It looks half finished and should either be removed or implemented. 8) Jdbc3CacheablePreparedStatement references Jdbc3ResultSet, but doesn't it need to refer to Jdbc3gResultSet if building with JDK1.5? I don't understand how this compiles, but it does. Kris Jurka
Kris, Oliver, Laszlo is working on all of your recommendations, thanks for taking the time to review this. DAve On 2-Aug-07, at 1:02 PM, Kris Jurka wrote: > > > On Thu, 2 Aug 2007, Oliver Jowett wrote: > >> I didn't dig into the code too closely but it looks like you are >> using the statement object directly with no wrapper. Doesn't this >> run the risk that you will resurrect a previously-closed >> statement? Normal statement objects have a one-way lifecycle, once >> they are closed they cannot be resurrected, if app clients have a >> reference to the real statement then potentially they'll see >> different behaviour when the statement starts getting reused. That >> smells dangerous; not because any sane application will rely on >> it, but because it will be a source of very hard to find bugs. >> (e.g. it's fairly common and harmless to close an already-closed >> statement.. but that's suddenly disastrous if the statement has >> actually been pooled & reused in the meantime) >> > > This is the fundamental objection. Calling close multiple times is > perfectly legal and is not supported by this implementation. I > have some additional notes based on my reading of the code that are > rather secondary to the above: > > 1) Why does PStmtKey default holdability to > HOLD_CURSORS_OVER_COMMIT when the driver defaults to close at commit? > > 2) What is the point of getNumActive/Idle in > AbstractJdbc3StatementPool? > > 3) As you note it doesn't build with JDK1.6, but it also doesn't > build with JDK1.2 or 1.3. While statement pooling is a JDBC3 > feature the other driver versions must still build. > > 4) The test suite you've provided fails for me with: > > junit.framework.AssertionFailedError > at > org.postgresql.test.jdbc3.Jdbc3CacheableStatementTest.testStatementCac > heBehaviour(Jdbc3CacheableStatementTest.java:104) > > 5) Defaulting the cache size to unlimited seems unwise. > > 6) You can't add tests to jdbc2/optional that require JDBC3 > functionality. > > 7) What's the deal with caching CallableStatements? It looks half > finished and should either be removed or implemented. > > 8) Jdbc3CacheablePreparedStatement references Jdbc3ResultSet, but > doesn't it need to refer to Jdbc3gResultSet if building with > JDK1.5? I don't understand how this compiles, but it does. > > Kris Jurka > > ---------------------------(end of > broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match
Dave Cramer wrote: > On 2-Aug-07, at 3:50 AM, Heikki Linnakangas wrote: >> As a separate module, the pool and the cache could get a wider audience. >> You could use it with different databases and JDBC drivers in addition >> to PostgreSQL. And it wouldn't be tied to PostgreSQL driver release >> cycle, and vice versa. > I'd argue why bother at all since DBCP exists. Ok, I'll bite. Why bother at all since DBCP exists? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Dave Cramer wrote: > Kris, Oliver, > > Laszlo is working on all of your recommendations, thanks for taking the > time to review this. Before he throws lots more work at this -- if you are adding a wrapping statement object (removing the dependency on the PG driver's statement implementation), is there anything else which that ties the implementation to the postgresql driver code? Seems to me that to get correct behaviour you need to wrap Statement which also means you have to wrap Connection and so on .. at which point the question is, does this need to be integrated with the driver at all or can it be implemented as a separate module? -O
Hi!
I think, we can say that such external wrappers already exist, e.g. is commons-dbcp with its DataSource implementation.
If you wish to use the JDBC drivers datasource, or don't wat to use DataSource, only a JDBC Connection, is there any other choice than writing code to the JDBC driver code?
--
<signature>
<name>
<firstName>László</firstName>
<lastName>Hornyák</lastName>
</name>
</signature>
I think, we can say that such external wrappers already exist, e.g. is commons-dbcp with its DataSource implementation.
If you wish to use the JDBC drivers datasource, or don't wat to use DataSource, only a JDBC Connection, is there any other choice than writing code to the JDBC driver code?
On 8/4/07, Oliver Jowett <oliver@opencloud.com> wrote:
Dave Cramer wrote:
> Kris, Oliver,
>
> Laszlo is working on all of your recommendations, thanks for taking the
> time to review this.
Before he throws lots more work at this -- if you are adding a wrapping
statement object (removing the dependency on the PG driver's statement
implementation), is there anything else which that ties the
implementation to the postgresql driver code?
Seems to me that to get correct behaviour you need to wrap Statement
which also means you have to wrap Connection and so on .. at which point
the question is, does this need to be integrated with the driver at all
or can it be implemented as a separate module?
-O
--
<signature>
<name>
<firstName>László</firstName>
<lastName>Hornyák</lastName>
</name>
</signature>
Hi, I am quite new in the PostgreSQL arena. However I know there are drivers that do implement SC (Oracle, mySQL, third party drivers for DB2). I believe the application servers have this build in out of pure necessity: when they needed it there simply were no drivers available that offered this. Therefore the motivation "most of them have it, so why should we offer it" sounds not too motivational to me. I believe that since a statement is part of a connection the proper place for a pool is within that connection (to me at least from an architectural/design point of view). I feel any self respecting driver should offer SC. Of course in a way that can also be used by applications that have no need for it. Either because they know each/ most statements are a one time thing or for other reasons. Of-course in a safe way, although I am not sure a wrapping object is the only way to ensure this. This however is an discussion of implementation. Why do I dare to make this bold statement? First there are many more applications out there that use a database but are not J2EE. Therefore these would not benefit from SC build into application servers. Second all the ISVs I have been working with (the majority in the finance, erp, crm, telco segments) use a (transactional)database. Some of them are using non prepared statements. Most of them are using or planning to use prepared statements. All these ISVs use Oracle and the Oracle JDBC driver (I only did meet one that also offered a home made Oracle driver). They all use the SC feature. Third these ISVs have so much work to do in their build,test,release cycle that they will not easily accept another database. Let alone a database that mandates the use of yet another element (the SC wrapper) to get a feature they do need. Fourth, the code they need to maintain is already so large that they do not want to implement a SC feature themselves. Hopefully these practical reasons are also taken into consideration in the discussion if SC needs to be(come) part of the core. Regards, Paul ------------------------------------------------------------------------ --------------------- Paul van den Bogaard Paul.vandenBogaard@sun.com CIE -- Collaboration and ISV Engineering, Opensource Engineering group Sun Microsystems, Inc phone: +31 334 515 918 Saturnus 1 extentsion: x (70)15918 3824 ME Amersfoort mobile: +31 651 913 354 The Netherlands fax: +31 334 515 001
Hi!
A short summary of the wrapper/patch-the-core question, feel free to extend it.
Advantages of the prepared statement caching feature coded into the JDBC Driver:
- Better performance for preparedstatements if the connection is established with DriverManager or the pg DataSource implementation. I will post some benchmark result...
- Easy to configure: the pg jdbc jar on the classpath and some parameters appended to the URL
Disadvantages:
- Lot of code compared to relatively few users (still a lot imo)
Advantages of a wrapping solution:
- Completely external project, helps to keep the driver codebase small
- Does not add much overhead on the client side, still good performance
- Works with other JDBC Drivers (not much value if most JDBC Driver does this feature)
Disadvantages:
- Somewhat more work to setup (still worth the little work imo): pg jdbc jar file plus wrapper jar file and pg JDBC URL prefixed by the wrapper URL and parameters (e.g. jdbc:wrapper[preparedStatementPoolSize=64;otherParam=blabla]:postgresql://localhost/DB, feel free to suggest a more friendly look)
- As external project, it is just one more thing for the programmers to keep an eye on
Now, with the input from the mailing list, a wrapper solution is under development, since I do not have to work with a lot of existing code in this case, you can expect it soon.
Your input is welcome!
--
László Hornyák
A short summary of the wrapper/patch-the-core question, feel free to extend it.
Advantages of the prepared statement caching feature coded into the JDBC Driver:
- Better performance for preparedstatements if the connection is established with DriverManager or the pg DataSource implementation. I will post some benchmark result...
- Easy to configure: the pg jdbc jar on the classpath and some parameters appended to the URL
Disadvantages:
- Lot of code compared to relatively few users (still a lot imo)
Advantages of a wrapping solution:
- Completely external project, helps to keep the driver codebase small
- Does not add much overhead on the client side, still good performance
- Works with other JDBC Drivers (not much value if most JDBC Driver does this feature)
Disadvantages:
- Somewhat more work to setup (still worth the little work imo): pg jdbc jar file plus wrapper jar file and pg JDBC URL prefixed by the wrapper URL and parameters (e.g. jdbc:wrapper[preparedStatementPoolSize=64;otherParam=blabla]:postgresql://localhost/DB, feel free to suggest a more friendly look)
- As external project, it is just one more thing for the programmers to keep an eye on
Now, with the input from the mailing list, a wrapper solution is under development, since I do not have to work with a lot of existing code in this case, you can expect it soon.
Your input is welcome!
On 09/08/07, Paul van den Bogaard <Paul.Vandenbogaard@sun.com> wrote:
Hi,
I am quite new in the PostgreSQL arena. However I know there are
drivers that do implement SC (Oracle, mySQL, third party drivers for
DB2). I believe the application servers have this build in out of
pure necessity: when they needed it there simply were no drivers
available that offered this. Therefore the motivation "most of them
have it, so why should we offer it" sounds not too motivational to me.
I believe that since a statement is part of a connection the proper
place for a pool is within that connection (to me at least from an
architectural/design point of view). I feel any self respecting
driver should offer SC. Of course in a way that can also be used by
applications that have no need for it. Either because they know each/
most statements are a one time thing or for other reasons. Of-course
in a safe way, although I am not sure a wrapping object is the only
way to ensure this. This however is an discussion of implementation.
Why do I dare to make this bold statement?
First there are many more applications out there that use a database
but are not J2EE. Therefore these would not benefit from SC build
into application servers.
Second all the ISVs I have been working with (the majority in the
finance, erp, crm, telco segments) use a (transactional)database.
Some of them are using non prepared statements. Most of them are
using or planning to use prepared statements. All these ISVs use
Oracle and the Oracle JDBC driver (I only did meet one that also
offered a home made Oracle driver). They all use the SC feature.
Third these ISVs have so much work to do in their build,test,release
cycle that they will not easily accept another database. Let alone a
database that mandates the use of yet another element (the SC
wrapper) to get a feature they do need.
Fourth, the code they need to maintain is already so large that they
do not want to implement a SC feature themselves.
Hopefully these practical reasons are also taken into consideration
in the discussion if SC needs to be(come) part of the core.
Regards,
Paul
------------------------------------------------------------------------
---------------------
Paul van den Bogaard
Paul.vandenBogaard@sun.com
CIE -- Collaboration and ISV Engineering, Opensource Engineering group
Sun Microsystems, Inc phone: +31
334 515 918
Saturnus 1
extentsion: x (70)15918
3824 ME Amersfoort mobile: +31
651 913 354
The Netherlands
fax: +31 334 515 001
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
--
László Hornyák
One more disadvantage of the wrapper jdbc driver come to my mind: it hides the postgresql-specific features.
--
László Hornyák
--
László Hornyák
I just noticed this: Now that JDBC 4 has a notion of pooling built right into the statement interface it is clear (to me at least) that the implementation belongs in the driver. Heikki ? Dave On 10-Aug-07, at 6:09 AM, László Hornyák wrote: > One more disadvantage of the wrapper jdbc driver come to my mind: > it hides the postgresql-specific features. > > -- > László Hornyák
Dave Cramer wrote: > I just noticed this: > > Now that JDBC 4 has a notion of pooling built right into the statement > interface it is clear (to me at least) that the implementation belongs > in the driver. In chapter 11.6, "Reuse of Statements by Pooled Connection", the spec says: > In FIGURE 11-2, the connection pool and statement pool are implemented by the > application server. However, this functionality could also be implemented by the > driver or underlying data source. This discussion of statement pooling is meant to > allow for any of these implementations. The new methods added in JDBC4, setPoolable and isPoolable are for use by applications to hint the connection pool implementation which statements it should try to pool. Statements are poolable by default, so it's really for telling the pool not to bother pooling one off queries. I still don't think it's wise for us to bundle a statement cache in the driver. If we had a server-side statement cache, or if there was some other PostgreSQL specific trick we could take advantage of, it would make sense to provide an interface for it. But there isn't. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 30-Aug-07, at 4:14 AM, Heikki Linnakangas wrote: > Dave Cramer wrote: >> I just noticed this: >> >> Now that JDBC 4 has a notion of pooling built right into the >> statement >> interface it is clear (to me at least) that the implementation >> belongs >> in the driver. > > In chapter 11.6, "Reuse of Statements by Pooled Connection", the > spec says: > >> In FIGURE 11-2, the connection pool and statement pool are >> implemented by the >> application server. However, this functionality could also be >> implemented by the >> driver or underlying data source. This discussion of statement >> pooling is meant to >> allow for any of these implementations. > > The new methods added in JDBC4, setPoolable and isPoolable are for use > by applications to hint the connection pool implementation which > statements it should try to pool. Statements are poolable by > default, so > it's really for telling the pool not to bother pooling one off > queries. > > I still don't think it's wise for us to bundle a statement cache in > the > driver. If we had a server-side statement cache, or if there was some > other PostgreSQL specific trick we could take advantage of, it would > make sense to provide an interface for it. But there isn't. Well, that's yet to be discovered. We are in the process of making a wrapper driver which we will test for speed. If the same throughput can't be achieved due to the overhead of proxying the calls then there is a good argument for providing statement caching inside the driver. Dave > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com