Thread: Pipelining executions to PostgreSQL server
Hi,

I created a proof of concept patch for the PostgreSQL JDBC driver that allows the caller to do pipelining of requests within a transaction. Pipelining here means the same as for HTTP: the client can send the next execution before waiting for the response of the previous request to be fully processed. The goal is to reduce the effects of latency between server and client.

In my test with a localhost PostgreSQL, a JDBC benchmark that queries a single value from the database 200 times got a more than 20% speed-up from pipelining. The pipelining version processes the responses every 10 queries. With actual latency between the server and client, larger speed-ups are of course possible.

I think pipelining + jsonb support would make PostgreSQL an even more competitive key/value and document store.

Example use cases (see the sketch after this message):
1) insert into the shopping cart table, plus 3 inserts into the shopping cart rows table, in one pipeline
   - only one round-trip penalty instead of 4
2) query the shopping cart row and query the shopping cart rows in one pipeline
   - only one round-trip penalty instead of 2

The only alternative way to reduce the round-trip latency is to make every operation a single round trip, and that can only be done with pl functions and by passing in more complex objects, for example the whole shopping cart with its rows as json data.

What kind of problems could pipelining cause (assuming we limit it to rather simple use cases only)?

-Mikko
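[Editor's note: to make the shopping-cart example concrete, here is the use case as plain, runnable JDBC; the table names and connection details are made up. The comments mark where the proposed pipelining would change the behavior; the code itself uses only the standard blocking API.]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ShoppingCartRoundTrips {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://dbhost/shop", "shop", "secret")) {
            conn.setAutoCommit(false);

            PreparedStatement cart = conn.prepareStatement(
                "INSERT INTO shopping_cart (id, customer) VALUES (?, ?)");
            PreparedStatement row = conn.prepareStatement(
                "INSERT INTO shopping_cart_row (cart_id, item, qty) VALUES (?, ?, ?)");

            // With the stock driver each executeUpdate() below blocks for a
            // full server round trip: four round trips in total. Under the
            // proposed pipelining, all four executions would be written to
            // the socket back to back and their responses read afterwards,
            // paying the round-trip latency only once.
            cart.setInt(1, 1);
            cart.setString(2, "alice");
            cart.executeUpdate();
            for (int i = 1; i <= 3; i++) {
                row.setInt(1, 1);
                row.setString(2, "item-" + i);
                row.setInt(3, i);
                row.executeUpdate();
            }

            conn.commit();
        }
    }
}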
Mikko Tiihonen <Mikko.Tiihonen@nitorcreations.com> writes:
> I created a proof of concept patch for the PostgreSQL JDBC driver that allows the caller to do pipelining of requests within a transaction. The pipelining here means the same as for HTTP: the client can send the next execution before waiting for the response of the previous request to be fully processed.

In principle this can work if you think through the error handling carefully. The way the FE/BE protocol is intended to handle the case is that all the pipelined statements are part of one transaction so that they all succeed or fail together. Otherwise the user has to work through and predict the outcome of all the combinations of "Q1 succeeded, Q2 failed, Q3 succeeded", which is mighty error-prone, especially if the commands are interdependent in any way at all.

Having legislated that (and forbidden any transaction-boundary commands in a pipelined group), what you do is not issue a Sync until after the last command of the set. Then, if any command fails, the backend just automatically discards remaining messages until it gets the Sync, and so you can safely issue new commands before knowing the results of the previous ones.

I mention this because, although I don't know the innards of the JDBC driver at all, it sure doesn't look like that's the way you've done it. If I'm right that you've left the error recovery strategy to the user, I think this is gonna be very hard to use safely.

			regards, tom lane
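[Editor's note: for readers less familiar with the v3 extended-query protocol, here is a minimal, illustration-only sketch of the message flow Tom describes: one Parse/Bind/Execute trio per statement and a single Sync at the end. It writes raw frontend messages to an already-established connection's output stream; it is a sketch of the wire format, not a usable driver (no Describe, no parameters, no response parsing).]

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Sketch of a pipelined group: Parse/Bind/Execute per statement, one Sync. */
class PipelineSketch {
    private final DataOutputStream out;

    PipelineSketch(OutputStream socketOut) {
        this.out = new DataOutputStream(socketOut);
    }

    void sendPipelinedGroup(List<String> statements) throws IOException {
        for (String sql : statements) {
            // Parse: unnamed statement, no parameter types
            msg('P', cstr(""), cstr(sql), int16(0));
            // Bind: unnamed portal from unnamed statement, no parameters
            msg('B', cstr(""), cstr(""), int16(0), int16(0), int16(0));
            // Execute: unnamed portal, no row limit
            msg('E', cstr(""), int32(0));
            // Deliberately no Sync here: if one statement errors, the
            // backend discards everything that follows until it sees the
            // Sync, which is what makes the group all-or-nothing.
        }
        msg('S');              // a single Sync terminates the group
        out.flush();
        // The caller must now read responses until ReadyForQuery ('Z').
    }

    /** Write one frontend message: type byte, length (including self), payload. */
    private void msg(char type, byte[]... parts) throws IOException {
        int len = 4;
        for (byte[] p : parts) len += p.length;
        out.writeByte(type);
        out.writeInt(len);
        for (byte[] p : parts) out.write(p);
    }

    private static byte[] cstr(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        byte[] c = new byte[b.length + 1];   // NUL-terminated
        System.arraycopy(b, 0, c, 0, b.length);
        return c;
    }

    private static byte[] int16(int v) { return new byte[]{(byte)(v >> 8), (byte)v}; }

    private static byte[] int32(int v) {
        return new byte[]{(byte)(v >> 24), (byte)(v >> 16), (byte)(v >> 8), (byte)v};
    }
}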
On 2014-11-01 14:04:05 +0000, Mikko Tiihonen wrote:
> I created a proof of concept patch for the PostgreSQL JDBC driver that
> allows the caller to do pipelining of requests within a
> transaction. The pipelining here means the same as for HTTP: the client
> can send the next execution before waiting for the response of
> the previous request to be fully processed.

Slightly confused here. To my knowledge the jdbc driver already employs some pipelining? There's some conditions where it's disabled (IIRC RETURNING for DML is one of them), but otherwise it's available.

I'm very far from a pgjdbc expert, but that's what I gathered from the code when investigating issues a while back and from my colleague Craig.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> From: Andres Freund <andres@2ndquadrant.com>
> On 2014-11-01 14:04:05 +0000, Mikko Tiihonen wrote:
> > I created a proof of concept patch for the PostgreSQL JDBC driver that
> > allows the caller to do pipelining of requests within a
> > transaction. The pipelining here means the same as for HTTP: the client
> > can send the next execution before waiting for the response of
> > the previous request to be fully processed.
>
> Slightly confused here. To my knowledge the jdbc driver already employs
> some pipelining? There's some conditions where it's disabled (IIRC
> RETURNING for DML is one of them), but otherwise it's available.
>
> I'm very far from a pgjdbc expert, but that's what I gathered from the
> code when investigating issues a while back and from my colleague Craig.

Most DB interfaces make the server operations look synchronous. For JDBC the only standard interface is similar to libpq's:

PGresult *PQexec(PGconn *conn, const char *command);

I should have searched earlier for a better reference to libpq. I am planning on adding support for something similar to http://www.postgresql.org/docs/9.3/static/libpq-async.html -- more specifically, operations like:

int PQsendQuery(PGconn *conn, const char *command);
PGresult *PQgetResult(PGconn *conn);

The Java API will of course be custom to the PostgreSQL JDBC driver, since there is no official Java API for async DB operations (see the sketch after this message). The above I can do purely on the JDBC driver side.

But my question was about pipelining. In libpq terms: is it safe to do multiple PQsendQuery operations before invoking PQgetResult as many times? It should be, if the server starts to process (read in) the next query only after it has sent the previous response out.

And do any pg connection poolers support having multiple executions on-the-fly at the same time for the same connection?

-Mikko
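[Editor's note: as a rough illustration of what a libpq-async-style Java API could look like, here is a hypothetical sketch. The interface name and methods are invented for illustration; they are not part of PgJDBC.]

import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Hypothetical Java analogue of libpq's PQsendQuery/PQgetResult pair.
 * Invented for illustration; not a real PgJDBC interface.
 */
interface AsyncQueryConnection {
    /** Queue a query for sending without waiting for its result (cf. PQsendQuery). */
    void sendQuery(String sql) throws SQLException;

    /**
     * Fetch the result of the oldest query not yet consumed (cf. PQgetResult),
     * blocking until it is available. Pipelining is exactly the question of
     * whether several sendQuery calls may precede the matching getResult calls.
     */
    ResultSet getResult() throws SQLException;
}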
On 2014-11-02 12:11:49 +0000, Mikko Tiihonen wrote:
> > From: Andres Freund <andres@2ndquadrant.com>
> > On 2014-11-01 14:04:05 +0000, Mikko Tiihonen wrote:
> > > I created a proof of concept patch for the PostgreSQL JDBC driver that
> > > allows the caller to do pipelining of requests within a
> > > transaction. The pipelining here means the same as for HTTP: the client
> > > can send the next execution before waiting for the response of
> > > the previous request to be fully processed.
> >
> > Slightly confused here. To my knowledge the jdbc driver already employs
> > some pipelining? There's some conditions where it's disabled (IIRC
> > RETURNING for DML is one of them), but otherwise it's available.
> >
> > I'm very far from a pgjdbc expert, but that's what I gathered from the
> > code when investigating issues a while back and from my colleague Craig.
>
> Most DB interfaces make the server operations look synchronous.

IIRC you can use jdbc's batch interface.

> I should have searched earlier for a better reference to libpq. I am planning on adding support for something similar to
> http://www.postgresql.org/docs/9.3/static/libpq-async.html
> More specifically, operations like:
> int PQsendQuery(PGconn *conn, const char *command);
> PGresult *PQgetResult(PGconn *conn);

That doesn't really support pipelining though - you can still only send one query. It's nonblocking, but that's a different thing.

> In libpq terms: is it safe to do multiple PQsendQuery operations
> before invoking PQgetResult as many times?

The network protocol allows for pipelining, yes. But libpq unfortunately doesn't. You should read the protocol definition.

> It should be, if the server starts to process (read in) the next query
> only after it has sent the previous response out.

There's complexities around error handling and such making it slightly more complex.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> > > From: Andres Freund <andres@2ndquadrant.com>
> > > On 2014-11-01 14:04:05 +0000, Mikko Tiihonen wrote:
> > > > I created a proof of concept patch for the PostgreSQL JDBC driver that
> > > > allows the caller to do pipelining of requests within a
> > > > transaction. The pipelining here means the same as for HTTP: the client
> > > > can send the next execution before waiting for the response of
> > > > the previous request to be fully processed.
> > >
> > > Slightly confused here. To my knowledge the jdbc driver already employs
> > > some pipelining? There's some conditions where it's disabled (IIRC
> > > RETURNING for DML is one of them), but otherwise it's available.
> >
> > Most DB interfaces make the server operations look synchronous.
>
> IIRC you can use jdbc's batch interface.

Yes, there is a limited batch interface for inserts and updates. But, for example, when using prepared statements you can only do batches of the same statement (with different parameters, of course).

> > I should have searched earlier for a better reference to libpq. I am planning on adding support for something similar to
> > http://www.postgresql.org/docs/9.3/static/libpq-async.html
> > More specifically, operations like:
> > int PQsendQuery(PGconn *conn, const char *command);
> > PGresult *PQgetResult(PGconn *conn);
>
> The network protocol allows for pipelining, yes. But libpq unfortunately
> doesn't. You should read the protocol definition.

I have studied the protocol; that is why I concluded that it would be possible to add support for pipelining for clients.

> > It should be, if the server starts to process (read in) the next query
> > only after it has sent the previous response out.
>
> There's complexities around error handling and such making it slightly
> more complex.

Are you referring to some complexities on the server side related to error handling, or on the client side?

Is the following summary correct:
- the network protocol supports pipelining
- the server handles operations in order, starting the processing of the next operation only after fully processing the previous one - thus pipelining is invisible to the server
- the libpq driver does not support pipelining, but that is due to internal limitations
- if proper error handling is done by the client, then there is no reason why pipelining could not be supported by any pg client

-Mikko
>>>> On 2014-11-01 14:04:05 +0000, Mikko Tiihonen wrote:
>>>>> I created a proof of concept patch for the PostgreSQL JDBC driver that
>>>>> allows the caller to do pipelining of requests within a
>>>>> transaction. The pipelining here means the same as for HTTP: the client
>>>>> can send the next execution before waiting for the response of
>>>>> the previous request to be fully processed.
>>>>
>>>> Slightly confused here. To my knowledge the jdbc driver already employs
>>>> some pipelining? There's some conditions where it's disabled (IIRC
>>>> RETURNING for DML is one of them), but otherwise it's available.
>>>>
>>>> I'm very far from a pgjdbc expert, but that's what I gathered from the
>>>> code when investigating issues a while back and from my colleague Craig.
>>>
>>> Most DB interfaces make the server operations look synchronous.
>>
>> IIRC you can use jdbc's batch interface.
>
> Yes, there is a limited batch interface for inserts and updates. But for
> example when using prepared statements you can only do batches of the same
> statement (with different parameters, of course).

Hi Mikko,

I am very interested in this. Indeed JDBC makes you wait for N round-trip delays. If my transaction needs to INSERT rows into tables A and B and UPDATE rows in tables C and D (and then COMMIT), then I'm looking at 4-5 round trips, even with executeBatch, which as you mentioned is made to "look synchronous". If the DB is localhost it's OK, but it gets painful if the DB is across a LAN hop, and unusable across a WAN.

As you've observed, there is no corresponding limitation in the FE/BE protocol, if we can delay sending Sync until after we've sent all the overlapping Binds and Executes.

I looked over your patch. Your list of ResultHandlerHolders seems to be the right direction, but as Tom Lane mentioned there may need to be some way to ensure the statements are all in the same transaction.

Off the top of my head I've sketched out an interface below (using PGStatement & PGConnection so we don't have to downcast quite as far). After the "sync" call you could call the original Statement.executeQuery but it would return immediately. If you attempted to re-use a Statement (or make additional addBatch calls) between the async() and sync() calls then you would get an IllegalStateException. This avoids the need for a Future, and avoids the client having to loop/sleep until done.

/**
 * This interface defines the public PostgreSQL extensions to
 * java.sql.Statement. All Statements constructed by the PostgreSQL
 * driver implement PGStatement.
 */
public interface PGStatement {
    // ...

    /** Like {@link PreparedStatement#executeQuery()} but no results until {@link PGConnection#sync}. */
    void asyncExecuteQuery();

    /** Like {@link PreparedStatement#executeUpdate()} but no results until {@link PGConnection#sync}. */
    void asyncExecuteUpdate();

    /** Like {@link Statement#executeBatch()} but no results until {@link PGConnection#sync}. */
    void asyncExecuteBatch();
}

/**
 * This interface defines the public PostgreSQL extensions to
 * java.sql.Connection. All Connections returned by the PostgreSQL driver
 * implement PGConnection.
 */
public interface PGConnection {
    // ...

    /** Wait for all the asynchronous statements to complete. */
    void sync();

    /** Commit the current transaction, and wait for all the asynchronous statements to complete. */
    void commitAndSync();
}
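[Editor's note: to show how the interfaces sketched above would be used, here is a hypothetical example. None of this exists in the driver; the behavior in the comments follows Scott's description.]

import java.sql.Connection;
import java.sql.PreparedStatement;

class ProposedApiUsage {
    // Hypothetical usage of the PGStatement/PGConnection sketch above.
    static void updateTwoTablesOneRoundTrip(Connection conn) throws Exception {
        conn.setAutoCommit(false);

        PreparedStatement insA = conn.prepareStatement("INSERT INTO a (v) VALUES (?)");
        insA.setInt(1, 1);
        ((PGStatement) insA).asyncExecuteUpdate();   // queued, returns at once

        PreparedStatement updC = conn.prepareStatement("UPDATE c SET v = ? WHERE id = 1");
        updC.setInt(1, 42);
        ((PGStatement) updC).asyncExecuteUpdate();   // queued, returns at once

        // Send everything, COMMIT, and wait for all responses: one round
        // trip instead of three.
        ((PGConnection) conn).commitAndSync();

        // Per the proposal, the original execute methods would now return
        // immediately with the already-received results, e.g.:
        // int aCount = insA.executeUpdate();
    }
}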
On 11/01/2014 10:04 PM, Mikko Tiihonen wrote:
> Hi,
>
> I created a proof of concept patch for the PostgreSQL JDBC driver that allows the caller to do pipelining of requests within a transaction. The pipelining here means the same as for HTTP: the client can send the next execution before waiting for the response of the previous request to be fully processed.

... but ... it already does that. Use batches, and that's exactly what it'll do.

Statement.addBatch(String)

or

PreparedStatement.addBatch()

> What kind of problems could pipelining cause (assuming we limit it to rather simple use cases only)?

Client/server pipeline deadlocks due to how PgJDBC manages it. See the details:

https://github.com/pgjdbc/pgjdbc/issues/195

and

https://github.com/pgjdbc/pgjdbc/issues/194

--
Craig Ringer                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
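[Editor's note: for concreteness, here is a minimal example of the standard JDBC batch API Craig refers to. This is stock java.sql usage; only the table and connection details are made up.]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://dbhost/test", "test", "secret")) {
            conn.setAutoCommit(false);

            PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO items (name, qty) VALUES (?, ?)");
            for (int i = 0; i < 100; i++) {
                ps.setString(1, "item-" + i);
                ps.setInt(2, i);
                ps.addBatch();               // queued client-side, nothing sent yet
            }
            // The driver can pipeline the queued executions on the wire
            // instead of paying one round trip per row.
            int[] updateCounts = ps.executeBatch();
            conn.commit();
            System.out.println("inserted " + updateCounts.length + " rows");
        }
    }
}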
On 11/02/2014 09:27 PM, Mikko Tiihonen wrote:
> Is the following summary correct:
> - the network protocol supports pipelining

Yes. All you have to do is *not* send a Sync message and be aware that the server will discard all input until the next Sync, so pipelining + autocommit doesn't make a ton of sense for error handling reasons.

> - the server handles operations in order, starting the processing of the next operation only after fully processing the previous one - thus pipelining is invisible to the server

As far as I know, yes. The server just doesn't care.

> - the libpq driver does not support pipelining, but that is due to internal limitations

Yep.

> - if proper error handling is done by the client, then there is no reason why pipelining could not be supported by any pg client

Indeed, and most should support it. Sending batches of related queries would make things a LOT faster.

PgJDBC's batch support is currently write-oriented. There is no fundamental reason it can't be expanded for reads. I've already written a patch to do just that for the case of returning generated keys:

https://github.com/ringerc/pgjdbc/tree/batch-returning-support

and just need to rebase it so I can send a pull for upstream PgJDBC. It's already linked in the issues documenting the limitations in batch support.

If you want to have more general support for batches that return rowsets there's no fundamental technical reason why it can't be added. It just requires some tedious refactoring of the driver to either:

- Sync and wait before it fills its *send* buffer, rather than trying to manage its receive buffer (the server send buffer), so it can reliably avoid deadlocks; or

- Do async I/O in a select()-like loop over a protocol state machine, so it can simultaneously read and write on the wire.

I might need to do some of that myself soon, but it's a big (and therefore error-prone) job I've so far avoided by making smaller, more targeted changes.

Doing async I/O using Java nio channels is by far the better approach, but also the more invasive one. The driver currently sends data on the wire where it generates it and blocks to receive expected data. Switching to send-side buffer management doesn't have the full performance gains that doing bidirectional I/O via channels does, though, and may be a significant performance _loss_ if you're sending big queries but getting small replies.

For JDBC the JDBC batch interface is the right place to do this, and you should not IMO attempt to add pipelining outside that interface. (Multiple open resultsets from portals, yes, but not pipelining of queries.)

--
Craig Ringer                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
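[Editor's note: as an illustration of the batch-with-reads case Craig's patch targets, here is what batched inserts plus generated-key retrieval look like through the standard JDBC API. The API calls are standard java.sql; whether a given driver version actually supports getGeneratedKeys after executeBatch is exactly the capability being discussed. The table is made up.]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class BatchReturningKeys {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://dbhost/test", "test", "secret")) {
            conn.setAutoCommit(false);

            PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO items (name) VALUES (?)",
                Statement.RETURN_GENERATED_KEYS);
            for (int i = 0; i < 10; i++) {
                ps.setString(1, "item-" + i);
                ps.addBatch();
            }
            ps.executeBatch();

            // Reading rows back after a batch is what requires the
            // read-capable batch support discussed above.
            try (ResultSet keys = ps.getGeneratedKeys()) {
                while (keys.next()) {
                    System.out.println("generated id: " + keys.getLong(1));
                }
            }
            conn.commit();
        }
    }
}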
> Craig Ringer wrote:
> On 11/02/2014 09:27 PM, Mikko Tiihonen wrote:
> > Is the following summary correct:
> > - the network protocol supports pipelining
>
> Yes.
>
> All you have to do is *not* send a Sync message and be aware that the
> server will discard all input until the next Sync, so pipelining +
> autocommit doesn't make a ton of sense for error handling reasons.

I do not quite grasp why not sending Sync is so important. My proof of concept setup was for queries with autocommit enabled.

When looking with wireshark I could observe that the client sent 3-10 P/B/D/E/S messages to the server before the server started sending the corresponding 1/2/T/D/C/Z replies for each request. Every 10 requests the test application waited for all the replies to come, to not overflow the network buffers (which is known to cause deadlocks with the current pg jdbc driver).

If I want separate error handling for each execution, then shouldn't I be sending a separate Sync for each pipelined operation?

> > - the server handles operations in order, starting the processing of the next operation only after fully processing the previous one
> > - thus pipelining is invisible to the server
>
> As far as I know, yes. The server just doesn't care.
>
> > - the libpq driver does not support pipelining, but that is due to internal limitations
>
> Yep.
>
> > - if proper error handling is done by the client, then there is no reason why pipelining could not be supported by any pg client
>
> Indeed, and most should support it. Sending batches of related queries
> would make things a LOT faster.
>
> PgJDBC's batch support is currently write-oriented. There is no
> fundamental reason it can't be expanded for reads. I've already written
> a patch to do just that for the case of returning generated keys.
>
> https://github.com/ringerc/pgjdbc/tree/batch-returning-support
>
> and just need to rebase it so I can send a pull for upstream PgJDBC.
> It's already linked in the issues documenting the limitations in batch
> support.

Your code looked good. Returning inserts are an important thing to optimize.

> If you want to have more general support for batches that return rowsets
> there's no fundamental technical reason why it can't be added. It just
> requires some tedious refactoring of the driver to either:
>
> - Sync and wait before it fills its *send* buffer, rather than trying
> to manage its receive buffer (the server send buffer), so it can
> reliably avoid deadlocks; or
>
> - Do async I/O in a select()-like loop over a protocol state machine,
> so it can simultaneously read and write on the wire.

I also think the async I/O is the way to go. Luckily that has already been done in pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top of the netty java NIO library. It has quite good feature parity with the original pgjdbc driver. I'll try next if I can enable the pipelining with it, now that I have tried the proof of concept with the original pgjdbc driver.

> I might need to do some of that myself soon, but it's a big (and
> therefore error-prone) job I've so far avoided by making smaller, more
> targeted changes.
>
> For JDBC the JDBC batch interface is the right place to do this, and you
> should not IMO attempt to add pipelining outside that interface.
> (Multiple open resultsets from portals, yes, but not pipelining of queries.)

I do not think the JDBC batch interface even allows doing updates to multiple tables when using prepared statements?
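[Editor's note: Mikko's question about per-operation Syncs is exactly the choice that sets the error-handling granularity. A sketch of the two variants follows; the two send methods are hypothetical stand-ins for a driver's internal protocol writers, not real PgJDBC methods.]

import java.util.List;

/** Sketch of Sync placement vs. error-handling granularity. */
abstract class SyncPlacementSketch {
    abstract void sendStatement(String sql); // Parse/Bind/Execute, no Sync
    abstract void sendSync();

    // (a) One Sync per group: the statements fail or succeed together,
    // because after an error the backend skips messages until the Sync.
    void allOrNothing(List<String> statements) {
        for (String sql : statements) sendStatement(sql);
        sendSync();
    }

    // (b) One Sync per statement: each statement gets independent error
    // handling (and, under autocommit, its own implicit transaction), but
    // the client must now reason about every success/failure combination.
    void independentErrors(List<String> statements) {
        for (String sql : statements) {
            sendStatement(sql);
            sendSync();   // an error here doesn't skip later statements
        }
    }
}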
Mikko Tiihonen <Mikko.Tiihonen@nitorcreations.com> writes:
> I do not quite grasp why not sending Sync is so important. My proof of concept setup was for queries with autocommit enabled.

The point is that that will be very, very much harder to use than doing it the other way. It's fairly easy to reason about the results of single-transaction pipelined queries: they're all or nothing. If you fire off queries that are going to autocommit independently, then you have to worry about all combinations of success/failure, and you won't have the opportunity to adjust on the fly. It'll be very much like sending a fixed (predetermined) SQL script to the server using a scripting language that lacks any conditionals. People certainly do use fixed scripts sometimes, but they usually find out they want them wrapped into single transactions, because otherwise they're left with a horrible mess if the script goes off the rails at all.

			regards, tom lane
On 11/04/2014 09:10 AM, Tom Lane wrote:
> Mikko Tiihonen <Mikko.Tiihonen@nitorcreations.com> writes:
>> I do not quite grasp why not sending Sync is so important. My proof of concept setup was for queries with autocommit enabled.
>
> [snip] It'll be very much like
> sending a fixed (predetermined) SQL script to the server using a scripting
> language that lacks any conditionals.

... which is part of why I think the batch interface for JDBC is the appropriate UI, and the existing support in PgJDBC just needs to be extended to support returning result sets.

The JDBC driver supports multiple result sets, and a batch execution looks just like a procedure that returned multiple result sets (or rather, a mix of result sets, affected rowcounts, and no-info success messages). See:

http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#addBatch(java.lang.String)
http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeBatch()

The docs are admittedly mostly oriented toward DML, but nothing stops us returning SUCCESS_NO_INFO and a resultset. Or, if we choose, returning a rowcount and a resultset. It'd be worth testing what other drivers do before doing that, but nothing in the spec:

https://jcp.org/aboutJava/communityprocess/mrel/jsr221/index.html

seems to stop us doing it.

The batch interface doesn't offer any way to set scrollable/editable resultset options, but we don't want to allow that for batches anyway.

--
Craig Ringer                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
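[Editor's note: the standard JDBC idiom for walking a mixed stream of result sets and update counts, which a read-capable executeBatch could reuse, looks like this. All calls are standard java.sql; the idea of applying it to batch results is the proposal above, not current driver behavior.]

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

class MixedResults {
    static void consumeAll(Connection conn, String sql) throws Exception {
        try (Statement st = conn.createStatement()) {
            boolean isResultSet = st.execute(sql);
            while (true) {
                if (isResultSet) {
                    try (ResultSet rs = st.getResultSet()) {
                        while (rs.next()) {
                            System.out.println(rs.getObject(1));
                        }
                    }
                } else {
                    int count = st.getUpdateCount();
                    if (count == -1) break;   // no more results of any kind
                    System.out.println(count + " rows affected");
                }
                isResultSet = st.getMoreResults();
            }
        }
    }
}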
On 11/03/2014 07:05 AM, Scott Harrington wrote:
> I looked over your patch. Your list of ResultHandlerHolders seems to be
> the right direction, but as Tom Lane mentioned there may need to be some
> way to ensure the statements are all in the same transaction.

Why go down this track, when we already have batching?

Just use the standard JDBC API batching facilities and add the missing support for batching queries that return result sets in PgJDBC.

--
Craig Ringer                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
> I do not quite grasp why not sending Sync is so important.

Well, at the moment the PgJDBC driver relies on the following flow to manage its buffers and avoid a logjam where both server and client are waiting for the other to consume input:

* Send some stuff
* Sync
* Consume input until Sync response received

... since at this point it knows the server's send buffer should be empty, or near enough.

There's no fundamental reason why you *can't* send a Sync after each query, AFAIK. You just have to change how the driver handles syncs so it knows that there might be more work pipelined. Or fix the driver's buffer management, as described in the github issues I linked.

It doesn't make much sense to do so unless you're running in autocommit mode, though, and I'm not convinced pipelining work in autocommit mode makes a lot of sense. The biggest problem is exception handling - you can't throw an exception at some statement execution in the past. So you need a new interface for pipelining... or to just use the existing batch interface.

--
Craig Ringer                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/03/2014 07:05 AM, Scott Harrington wrote:
> This avoids the need for a Future, and avoids the client having to
> loop/sleep until done.

A Future is the logical way to represent an asynchronous operation in Java. Why implement something else that doesn't fit into existing libraries and tools when there's already a standard interface?

If you're breaking outside the stock JDBC model and thinking asynchronously, then using the existing Java APIs for asynchrony makes sense to me.

In terms of looping until done ... what we really need is a reliably non-blocking PGConnection.consumeInput() so apps can call that in their main loops, or via a helper thread. Then we'd be able to add async notification callbacks too. To do that we need to make PGStream use a java.nio.Channel instead of a java.net.Socket.

--
Craig Ringer                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
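[Editor's note: a hypothetical sketch of what a Future-shaped variant of Scott's API could look like. The interface is invented for illustration; java.util.concurrent.Future itself is the standard type Craig is referring to.]

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.Future;

/**
 * Hypothetical Future-based alternative to the sync()/commitAndSync()
 * proposal above; invented for illustration, not a real PgJDBC API.
 */
interface PGAsyncConnection {
    /** Queue a query; the Future completes once its response has been read. */
    Future<ResultSet> submitQuery(String sql) throws SQLException;

    /** Queue an update; the Future completes with the affected row count. */
    Future<Integer> submitUpdate(String sql) throws SQLException;
}

A caller could then hand these Futures to any standard executor- or Future-based machinery, which is Craig's point about fitting into existing libraries and tools.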
On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
> I also think the async I/O is the way to go. Luckily that has already been done
> in pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top
> of the netty java NIO library. It has quite good feature parity with the original
> pgjdbc driver.

Huh, it does seem to now. The big omission, unless I'm blind, is support for COPY. (It also lacks support for JDBC3 and JDBC4, requiring JDK 7 and JDBC 4.1, but that's reasonable enough.)

It now has LISTEN/NOTIFY by the looks, and of course it's built around binary protocol support.

I freely admit I haven't looked at it too closely. I'm a tad frustrated by the parallel development of a different driver when the "official" driver has so much in the way of low hanging fruit to improve.

I have to say I like some aspects of how pgjdbc-ng is being done - in particular use of Maven and being built around non-blocking I/O.

OTOH, the use of Google Guava I find pretty inexplicable in a JDBC driver, and since it hard-depends on JDK 7 I don't understand why Netty was used instead of the async / non-blocking features of the core JVM. That may simply be my inexperience with writing non-blocking socket I/O code on Java, though.

I'm also concerned about how it'll handle new JDBC versions, as it seems to lack the JDBC interface abstraction of the core driver.

> I do not think the JDBC batch interface even allows doing updates to multiple
> tables when using prepared statements?

Ah, I missed this before.

That's correct. You get prepared statements _or_ multiple different statements.

That's a more understandable reason to concoct a new API, and explains why you're not just solving the issues with batches. Though I still think you're going to have to fix the buffer management code before you do anything like this.

--
Craig Ringer                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> On Nov 4, 2014, at 12:55 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>
> On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
>> I also think the async I/O is the way to go. Luckily that has already been done
>> in pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top
>> of the netty java NIO library. It has quite good feature parity with the original
>> pgjdbc driver.
>
> Huh, it does seem to now. The big omission, unless I'm blind, is support
> for COPY. (It also lacks support for JDBC3 and JDBC4, requiring JDK 7
> and JDBC 4.1, but that's reasonable enough.)

I focused on 4.1 compliance instead of legacy support. I am proud to say I believe pgjdbc-ng is 100% feature compliant for JDBC 4.1. Legacy support is still not, and likely will never be, planned.

It does lack COPY support, as I have never used it myself from the driver, only from tools outside the JVM.

> It now has LISTEN/NOTIFY by the looks, and of course it's built around
> binary protocol support.
>
> I freely admit I haven't looked at it too closely. I'm a tad frustrated
> by the parallel development of a different driver when the "official"
> driver has so much in the way of low hanging fruit to improve.

I’ll only nibble this bait…. I outlined my reasons when I started the project. They were all valid then and still are now. My apologies for causing any frustration with this new path. It’s a lot cleaner, simpler and provides more JDBC features than the original driver because of it. Although I must say, without the original driver and its exhausting battery of unit tests, building a new driver would seem impossible.

> I have to say I like some aspects of how pgjdbc-ng is being done - in
> particular use of Maven and being built around non-blocking I/O.
>
> OTOH, the use of Google Guava I find pretty inexplicable in a JDBC
> driver, and since it hard-depends on JDK 7 I don't understand why Netty
> was used instead of the async / non-blocking features of the core JVM.
> That may simply be my inexperience with writing non-blocking socket I/O
> code on Java though.

It confuses me as to why you consider using stable, well implemented, well tested and well cared for libraries as inexplicable. Just because we are building a “driver” means we have to write every line of code ourselves? No thanks. You can imagine our differences on this philosophy are one of the reasons why I consider pgjdbc-ng’s parallel development to be a godsend rather than hacking on the original code.

Concerning Guava… A great library with an amazing number of classes that make everyday Java easier. The requirement for JDK 7 was chosen before Guava was considered, not because of it. Using it seemed obvious after that decision. Also, we have internalized the classes we use out of Guava to remove it as a dependency. This is more work to maintain on our part but makes it worth it when deploying a single JAR.

Concerning Netty… All options were entertained at the beginning. The original version actually used a basic NIO socket. After I realized I had to basically write my own framework to work with this socket correctly, I searched for an alternative and found Netty. The deciding factor was that implementing SSL on top of the NIO API was considered next to impossible to get right, according to all prevailing wisdom at the time. Whereas with Netty, SSL support is basically a single line change.

> I'm also concerned about how it'll handle new JDBC versions, as it seems
> to lack the JDBC interface abstraction of the core driver.
My plan is to handle adding support for 4.2 and beyond by using a Maven based conditional preprocessor. If that fails, or seems just too ugly, I’ll probably have to look at an abstract class based method like that of the original driver.

>> I do not think the JDBC batch interface even allows doing updates to multiple
>> tables when using prepared statements?
>
> Ah, I missed this before.
>
> That's correct. You get prepared statements _or_ multiple different
> statements.
>
> That's a more understandable reason to concoct a new API, and explains
> why you're not just solving the issues with batches. Though I still
> think you're going to have to fix the buffer management code before you
> do anything like this.
>
> --
> Craig Ringer                       http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Kevin Wooten <kdubb@me.com> wrote:
> > On Nov 4, 2014, at 12:55 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> >
> > I have to say I like some aspects of how pgjdbc-ng is being done - in
> > particular use of Maven and being built around non-blocking I/O.
> >
> > OTOH, the use of Google Guava I find pretty inexplicable in a JDBC
> > driver, and since it hard-depends on JDK 7 I don't understand why Netty
> > was used instead of the async / non-blocking features of the core JVM.
> > That may simply be my inexperience with writing non-blocking socket I/O
> > code on Java though.

Java6 has been EOL since 2011 and Java7 is EOL in less than 6 months. I think that depending on the old Java 7 version, which soon should not even be used in production (without paying for extra support), can hardly be too hard a requirement.

> It confuses me as to why you consider using stable, well implemented, well tested and well cared for libraries as inexplicable.
> Just because we are building a “driver” means we have to write every line of code ourselves? No thanks.

Embedding parts of other projects into the code-base during the build, with renamed packages, is nowadays common practice in java projects: spring does it, elasticsearch embeds the whole of netty and more, even the jre embeds for example xerces and asm. It might not be the optimal solution, but it is still definitely better than writing everything from scratch or copy-pasting code from other projects.

If pgjdbc-ng provides both a thin maven version (with external versioned dependencies) and a fat-jar with the external versions repackaged inside, users can choose either the old way (just drop a jdbc jar into the project) or the new way, with their chosen build tool automatically managing the dependencies.

-Mikko