Thread: A few more questions
Here are a few more questions to throw out. 1. There's a comment in AbstractJdbc1Connection saying "TODO: handle transaction status". Did someone have something specific in mind regarding how transaction status was to be "handled"? 2. What's the oldest pgsql version that we intend to support with the drivers in 7.4? Is it 7.0? 6.x? 3. For someone who knows the v3 protocol; there are a number of cases where multiple queries are issued by one statement in the driver. My understanding is that this is not allowed in extended query. Does that mean that we're stuck with multiple round-trips if we substitute extended query? Or may I send multiple consecutive parse, bind, and execute commands without an intervening read of the result; and then read all the results at once? This latter bit seems to me like it risks walking into a trap where I'm blocked on my send buffer waiting for the server to read more of a statement, and the server is blocked on its send buffer waiting for me to read the results from a previous execute. But basically, I'm thinking there has to be a way in v3 extended query to avoid this problem; otherwise, it would be a potentially serious performance issue with no solution. Is there a way? 4. As I plan the changes I'd like to make for v3 protocol support, it's starting to look tempting to segregate a good bit of the code into separate classes for v2 and v3. Thus, we'd have: V2QueryExecutor - For v2 code V3QueryExecutor - For v3 code QueryExecutorUtil - For code shared between both classes above and the same for AbstractJdbc*Statement and AbstractJdbc*ResultSet. Would this meet with objections if I tried to get this incorporated into the actual drivers? -- www.designacourse.com The Easiest Way to Train Anyone... Anywhere. Chris Smith - Lead Software Developer/Technical Trainer MindIQ Corporation
On Fri, 2004-02-13 at 20:09, Chris Smith wrote: > Here are a few more questions to throw out. > > 1. There's a comment in AbstractJdbc1Connection saying "TODO: handle > transaction status". Did someone have something specific in mind regarding > how transaction status was to be "handled"? > > 2. What's the oldest pgsql version that we intend to support with the drivers > in 7.4? Is it 7.0? 6.x? 7.2 is about as far back as I'd like to see handled > > 3. For someone who knows the v3 protocol; there are a number of cases where > multiple queries are issued by one statement in the driver. My understanding > is that this is not allowed in extended query. Does that mean that we're > stuck with multiple round-trips if we substitute extended query? Or may I > send multiple consecutive parse, bind, and execute commands without an > intervening read of the result; and then read all the results at once? > > This latter bit seems to me like it risks walking into a trap where I'm > blocked on my send buffer waiting for the server to read more of a statement, > and the server is blocked on its send buffer waiting for me to read the > results from a previous execute. But basically, I'm thinking there has to be > a way in v3 extended query to avoid this problem; otherwise, it would be a > potentially serious performance issue with no solution. Is there a way? > > 4. As I plan the changes I'd like to make for v3 protocol support, it's > starting to look tempting to segregate a good bit of the code into separate > classes for v2 and v3. Thus, we'd have: > > V2QueryExecutor - For v2 code > V3QueryExecutor - For v3 code > QueryExecutorUtil - For code shared between both classes above Any reason why you wouldn't extend V*QueryExecuter from BaseQueryExecutor ? > > and the same for AbstractJdbc*Statement and AbstractJdbc*ResultSet. Would > this meet with objections if I tried to get this incorporated into the actual > drivers? These are a bit different, not sure what you have in mind here, but the reason they are named like that is to allow you to build under various JDK's, look in the build.xml file to see how that is handled. Dave -- Dave Cramer 519 939 0336 ICQ # 14675561
Chris Smith wrote: > 3. For someone who knows the v3 protocol; there are a number of cases where > multiple queries are issued by one statement in the driver. My understanding > is that this is not allowed in extended query. Does that mean that we're > stuck with multiple round-trips if we substitute extended query? Or may I > send multiple consecutive parse, bind, and execute commands without an > intervening read of the result; and then read all the results at once? The latter. Look at the Sync and Flush messages too -- they're designed to deal with that sort of batching of queries. > This latter bit seems to me like it risks walking into a trap where I'm > blocked on my send buffer waiting for the server to read more of a statement, > and the server is blocked on its send buffer waiting for me to read the > results from a previous execute. But basically, I'm thinking there has to be > a way in v3 extended query to avoid this problem; otherwise, it would be a > potentially serious performance issue with no solution. Is there a way? I raised this with Barry Lind a while back when I was looking at batch updates. There doesn't seem to be an elegant way around it. Ideas: Option #1: limit the total size of outstanding sent data while (more queries) { send next query size = 0 while (more queries && size + next query size < threshold) { send next query size += next query size } send Sync (or possibly Flush depending on how you deal with errors) read results for sent queries, deal with errors, etc } Then we set threshold based on the expected buffer sizes between us and the server (kernel buffer etc). The above code is a bit subtle .. we only need to count queries after the first towards the threshold, since we assume we only block when there are greater than threshold bytes unread by the server, and at a minimum the server will read the first query from the stream before blocking. Also we could probably avoid waiting ignore packets that don't generate large resultsets in this calculation (or track our idea of how much data should be coming back and only start throttling our queries when it gets dangerous). Option #2: use a separate write (probably easier) or read thread. This means creating threads from the driver, which is new, and I'm not sure how the additional context switches on every query will affect performance. Option #3: switch to NIO but I don't think that's really feasible since we'd require 1.4, and NIO has a bunch of resource-exhaustion issues in current JVMs anyway. Option #4: set a socket timeout on writes. I'm not sure how reliable this is, though, or if we can safely recover from timeouts that do occur. > 4. As I plan the changes I'd like to make for v3 protocol support, it's > starting to look tempting to segregate a good bit of the code into separate > classes for v2 and v3. Thus, we'd have: > > V2QueryExecutor - For v2 code > V3QueryExecutor - For v3 code > QueryExecutorUtil - For code shared between both classes above Or subclass BaseQueryExecutor -> V2/V3QueryExecutor. (six of one..) This was pretty high on my list of refactoring to do. Also consider using a single query executor instance per connection and killing the existing static methods -- currently we check the protocol version and create a new instance per query which seems entirely unnecessary. > and the same for AbstractJdbc*Statement and AbstractJdbc*ResultSet. I'm not so sure about this one though. You run the danger of a lot of code duplication here. I was looking at a design where the statement-execution and result-gathering code was broken out into a internal interface/implementation classes, with separate implementations for V2 and V3. These talked only in the native postgresql types known to the backend (int2, int4, int8, bytea, text, etc). Then there was a single Statement/ResultSet implementation that delegated to an instance of the interface, i.e. all the JDBC<->postgresql type mapping and Statement/ResultSet behaviour glue was in this common code. The other thing I was doing here was to move the server-side preparation and translation/parsing of queries down into those protocol-specific classes -- as V2 and V3 work quite differently in these areas (the most obvious bit being that using the V3 extended protocol, you have to parse the query for multiple statements; you'd also want to maintain different state about cursors/PREPARE vs. named portals and statements). I have some very early interface code I can send you if you'd like to take a look. -O
Dave, Dave Cramer wrote: > > 2. What's the oldest pgsql version that we intend to support with > > the drivers in 7.4? Is it 7.0? 6.x? > > 7.2 is about as far back as I'd like to see handled That would be great, from the perspective of trying to simplify the code. However, there's a good bit of code there that exists to support at least as old as 7.0 (the transaction isolation level stuff comes to mind). That code also just happens to be interfering with what I'm doing right now. Is your answer just a one of many opinions, or an okay to nix it all? > > V2QueryExecutor - For v2 code > > V3QueryExecutor - For v3 code > > QueryExecutorUtil - For code shared between both classes above > Any reason why you wouldn't extend V*QueryExecuter from > BaseQueryExecutor ? Actually, I would extend BaseQueryExecutor. I just hadn't completely thought it through when I posted. > These are a bit different, not sure what you have in mind here, but > the reason they are named like that is to allow you to build under > various JDK's, look in the build.xml file to see how that is handled. I do understand that general situation with the separate packages and interfaces for various JDBC/core API versions. I'm not entirely sure what I have in mind here either; I suppose it's mostly QueryExecutor that I want to split at this point, and perhaps to relocate some code from AbstractJdbc1Statement and/or PGStream into the split QueryExecutor classes as well. There seems to be general agreement with splitting QueryExecutor, I'll write something first, and then see if there are remaining problems. -- www.designacourse.com The Easiest Way to Train Anyone... Anywhere. Chris Smith - Lead Software Developer/Technical Trainer MindIQ Corporation
Oliver Jowett wrote: > I raised this with Barry Lind a while back when I was looking at batch > updates. Good to see I'm not the first person to deal with it. Thanks for your comments. > There doesn't seem to be an elegant way around it. Ideas: > > Option #1: limit the total size of outstanding sent data This looks like the best bet to me. It is ugly, but at least its ugliness is confined to one section of code and doesn't have a global impact in the manner of threads or aborting the whole connection. I'll probably take a look at this early next week and see what I can come up with. > This was pretty high on my list of refactoring to do. Also consider > using a single query executor instance per connection and killing the > existing static methods -- currently we check the protocol version and > create a new instance per query which seems entirely unnecessary. Thanks... yes I'll do that. Since the sending and receiving of queries is serialized by a monitor on the PGStream anyway, it shouldn't cost anything this way. Preferably, we could move a lot of protocol-specific code into these polymorphic classes. > I'm not so sure about this one though. You run the danger of a lot of > code duplication here. Right. I am mainly concerned about AbstractJdbc1Statement and all the convolutions it goes through with breaking down and transforming the query into gazillions of forms (server-side prepared statements, cursors, etc.) Trying to do much with that class while retaining all that state is perhaps not the best of ideas. But that could possibly be handled by encapsulating into a separate object, and splitting *that* class into v2 and v3 versions. Probably much cleaner that way. > I have some very early interface code I can send you if you'd like to > take a look. Sure, I'll take a look. That'd be great. -- www.designacourse.com The Easiest Way to Train Anyone... Anywhere. Chris Smith - Lead Software Developer/Technical Trainer MindIQ Corporation
Chris, Chris Smith wrote: > Here are a few more questions to throw out. > > 1. There's a comment in AbstractJdbc1Connection saying "TODO: handle > transaction status". Did someone have something specific in mind regarding > how transaction status was to be "handled"? Well in general someone could include a transaction control statement (begin, commit, rollback) in a sql statement. This would confuse the driver since it is assuming that it is controlling transaction state as a result of the current setting of autoCommit(). The transaction state reported by the server would allow the driver to verify that the state was what the driver expected and allow the driver to correct it if it wasn't (for example if autoCommit() was false and the user issued a commit, then the driver would detect that fact that the transaction had ended and start a new one). > > 2. What's the oldest pgsql version that we intend to support with the drivers > in 7.4? Is it 7.0? 6.x? In the past when this has been discussed on the mail lists we decided to support two past releases. Thus the 7.4 driver would support back to 7.2. However we have also never removed code that enables older versions than that. So one could say we support two versions back, but expect older versions to generally work, but we wouldn't consider it a bug if they didn't. I am fine with actually beginning to remove the older support, but we really should then document the policy on the web site and in the documentation. > > 3. For someone who knows the v3 protocol; there are a number of cases where > multiple queries are issued by one statement in the driver. My understanding > is that this is not allowed in extended query. Does that mean that we're > stuck with multiple round-trips if we substitute extended query? Or may I > send multiple consecutive parse, bind, and execute commands without an > intervening read of the result; and then read all the results at once? > Which examples of multiple statements are you referring to? As I see it, many of the multiple statements in a single query are there because of the limitations of the V2 protocol, but under the V3 protocol they can be eliminated by using the features of the V3 protocol. So which ones are you looking at that you feel are a problem? I am sure there are some and I think Oliver has already responded with good ideas on how to handle those cases, but in general, I think we should be trying to eliminate them were possible. > This latter bit seems to me like it risks walking into a trap where I'm > blocked on my send buffer waiting for the server to read more of a statement, > and the server is blocked on its send buffer waiting for me to read the > results from a previous execute. But basically, I'm thinking there has to be > a way in v3 extended query to avoid this problem; otherwise, it would be a > potentially serious performance issue with no solution. Is there a way? > > 4. As I plan the changes I'd like to make for v3 protocol support, it's > starting to look tempting to segregate a good bit of the code into separate > classes for v2 and v3. Thus, we'd have: > > V2QueryExecutor - For v2 code > V3QueryExecutor - For v3 code > QueryExecutorUtil - For code shared between both classes above > Breaking this code out would be a very good idea. > and the same for AbstractJdbc*Statement and AbstractJdbc*ResultSet. Would > this meet with objections if I tried to get this incorporated into the actual > drivers? The AbstractJdbc* stuff is much more tricky. But I think from later emails in this thread you understand the issues and are on the right track. thanks, --Barry