Thread: Re: JDBC gripe list (the autocommit subthread)
> Well in the postgresql world every statement is a transaction.

I agree with that. But a batch with autocommit=true is "many" statements, as per the doc.

> That being said the concept of batch processing in postgres is that it
> would be done in a transaction otherwise what is the point ?

If addBatch was not meant to be called when autocommit=true, then it would have thrown an exception. The point is to enable multiple statements in one executeBatch call. Just imagine a system that separates who makes statements and who executes them -- event logging, let's say. Meanwhile, there are countless cases where multiple statements are not (and must not be) in an all-or-nothing transaction. This is why applications choose to set autocommit=true: to obtain the batch behavior without a TX. It's in the API for such reasons.

It is simply incorrect to treat the batch as one transaction when the API clearly exposes the ability to avoid that. As I wrote earlier, calling applications that just pile up updates in a batch, not expecting any deadlock from row locking by each statement, will not work anymore.

The fact that the API keeps autocommit independent from batches means it serves a purpose. I see it. But even if I didn't, the API says so and I can't second-guess it.

I know it hurts to learn such a truth after such a long delay. You'll get over it! lol! I found a 4-year-old bug lately, in my own code. I know the feeling. But I can't decide to call it a feature... lol

> If you agree with that then in the postgres world it would be natural
> for all of it to fail. At least thats how I would expect postgres to act.
>
> Dave
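For concreteness, the usage pattern being argued for looks roughly like this -- a minimal sketch, with the table and values invented for illustration:

import java.sql.*;

public class AutocommitBatchSketch
{
  public static void main(String[] args) throws Exception
  {
    Connection con = DriverManager.getConnection
      ("jdbc:postgresql:test", "app", "");  // illustrative credentials
    con.setAutoCommit(true);  // deliberately NOT wrapping the batch in a TX
    Statement stmt = con.createStatement();
    // One component queues statements; another executes them later.
    stmt.addBatch("insert into event_log(msg) values ('a')");
    stmt.addBatch("insert into event_log(msg) values ('b')");
    stmt.addBatch("insert into event_log(msg) values ('c')");
    stmt.executeBatch();  // the disputed call: one transaction, or three?
    stmt.close();
    con.close();
  }
}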
Quartz <quartz12h@yahoo.com> wrote:

> If addBatch was not meant to be called when autocommit=true, then
> it would have thrown an exception. The point is to enable multiple
> statements in one executeBatch call. Just imagine a system that
> separates who makes statements and who executes them -- event
> logging, let's say. Meanwhile, there are countless cases where
> multiple statements are not (and must not be) in an all-or-nothing
> transaction. This is why applications choose to set
> autocommit=true: to obtain the batch behavior without a TX. It's in
> the API for such reasons.

The docs say something completely at odds with your assertion:

| a JDBC driver may or may not continue to process the remaining
| commands in the batch. However, the driver's behavior must be
| consistent with a particular DBMS, either always continuing to
| process commands or never continuing to process commands.

> It is just incorrect to consider the batch is 1 transaction when
> the API clearly exposes the ability to avoid it.

And the option not to.

> As I wrote earlier, calling applications that just pile up updates
> in a batch, not expecting any deadlock from row locking by each
> statement, will not work anymore.

Anymore? When did batches in PostgreSQL work that way?

> The fact that the API keeps autocommit independent from batches
> means it serves a purpose. I see it. But even if I didn't, the API
> says so and I can't second-guess it.

You are misreading it.

> I know it hurts to learn such a truth after such a long delay.
> You'll get over it!

That's not the way to persuade people. You're approaching the point where people will just start ignoring your posts as noise.

The bottom line is that there is a perfectly clean and portable way to run the statements such that you can ignore or retry failures -- execute each separately in your Java code. That you chose to use an API which allows but doesn't require a driver to support the behavior you want doesn't make the behavior mandatory. Most people use the batch feature for performance, and in PostgreSQL it would reduce the performance of the batch feature to accommodate what you propose.

> I found a 4-year-old bug lately, in my own code. I know the
> feeling.

It would appear that you've found but not yet recognized another bug -- inappropriate use of an API. You're counting on an implementation detail rather than the documented API, and that's a bug.

-Kevin
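The portable approach Kevin describes would look something like this sketch (the statement list and error handling are invented for illustration):

import java.sql.*;
import java.util.List;

public class PortableBatchSketch
{
  // Run each statement in its own implicit transaction; a failure
  // affects only the statement that raised it.
  static void runIndividually(Connection con, List<String> statements)
    throws SQLException
  {
    con.setAutoCommit(true);
    Statement stmt = con.createStatement();
    try
    {
      for (String sql : statements)
      {
        try
        {
          stmt.executeUpdate(sql);
        }
        catch (SQLException e)
        {
          // ignore, log, or retry just this one statement
          System.err.println("failed: " + sql + " - " + e.getMessage());
        }
      }
    }
    finally
    {
      stmt.close();
    }
  }
}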
On Wed, Mar 30, 2011 at 10:55 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

> [snip]

Thanks Kevin +1

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca
> The docs say something completely at odds with your assertion:
>
> | a JDBC driver may or may not continue to process the remaining
> | commands in the batch. However, the driver's behavior must be
> | consistent with a particular DBMS, either always continuing to
> | process commands or never continuing to process commands.

Then let's simplify the whole discussion: if I issue statements 1, 2 and 3, and statement 2 fails, there is no prescribed behavior for statement 3, but one sure thing is that statement 1 must be completed. This is exactly where the postgres driver fails. I have nothing else to say.
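Spelled out as a runnable sketch (the table and the duplicate key are invented so that statement 2 fails):

import java.sql.*;

public class MidBatchFailureSketch
{
  public static void main(String[] args) throws Exception
  {
    Connection con = DriverManager.getConnection
      ("jdbc:postgresql:test", "app", "");  // illustrative
    con.setAutoCommit(true);
    Statement setup = con.createStatement();
    setup.executeUpdate("create table t (id int primary key)");
    setup.close();

    Statement stmt = con.createStatement();
    stmt.addBatch("insert into t values (1)");  // statement 1
    stmt.addBatch("insert into t values (1)");  // statement 2: duplicate key
    stmt.addBatch("insert into t values (2)");  // statement 3
    try
    {
      stmt.executeBatch();
    }
    catch (BatchUpdateException e)
    {
      // Under Quartz's reading, statement 1 is already committed here,
      // so the count below would be 1; with the driver behavior described
      // in this thread, the whole batch rolls back and the count is 0.
      Statement check = con.createStatement();
      ResultSet rs = check.executeQuery("select count(*) from t");
      rs.next();
      System.out.println("rows committed: " + rs.getInt(1));
      rs.close();
      check.close();
    }
    stmt.close();
    con.close();
  }
}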
Quartz <quartz12h@yahoo.com> wrote:

> If I issue statements 1, 2 and 3, and statement 2 fails, there is
> no prescribed behavior for statement 3, but one sure thing is that
> statement 1 must be completed.

I don't see how executeBatch can both throw the documented exception (BatchUpdateException) and return the int[] which would tell you what completed. It doesn't seem to me to make sense to complete some unknown mystery subset of the statements, so the only sane behavior is to apply none if the documented exception is thrown.

-Kevin
I wrote:

> I don't see how executeBatch can both throw the documented
> exception (BatchUpdateException) and return the int[] which would
> tell you what completed.

Oh, I see now. There's an array in that exception class. Will look at this a bit more.

-Kevin
Kevin Grittner <Kgrittn@wicourts.gov> wrote:

> There's an array in that exception class.

I'm coming around to the position that we shouldn't tinker with autoCommit within the executeBatch method. I still think it would be best for us to consistently bail out on the first failure, but if autoCommit is on, we could build the BatchUpdateException using an array of the length of the successfully completed statements. If autoCommit is off, I'm not sure whether it would be better to leave the updateCounts property null or use a zero-length array; but clearly no statements should be considered successful.

The API documentation does seem to suggest it should work that way.

The bad news is that this would be a behavior change, and could thus break existing code for current PostgreSQL users. When that's the case, we generally like to see a reasonable use case for the new behavior even when it is standard. So far we have a rather hand-wavy assertion that it would be useful for logging and "an infinite number of" other uses. It would probably help sway the community if there were a more concrete explanation of why this is better than the alternatives for logging purposes, and a sketch of one or two of the other infinite number of use cases.

-Kevin
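From the caller's side, the two cases Kevin describes would look something like this sketch (the semantics in the comments are his proposal, not current driver behavior; stmt is assumed to already have commands queued via addBatch):

import java.sql.*;

public class BatchOutcomeSketch
{
  static void report(Statement stmt) throws SQLException
  {
    try
    {
      int[] counts = stmt.executeBatch();
      System.out.println("all " + counts.length + " statements succeeded");
    }
    catch (BatchUpdateException e)
    {
      // Proposed: with autoCommit on, getUpdateCounts() has one entry per
      // statement committed before the first failure; with autoCommit off,
      // it is null or zero-length, since nothing counts as successful.
      int[] done = e.getUpdateCounts();
      System.out.println("completed before failure: "
        + (done == null ? 0 : done.length));
    }
  }
}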
int[] ia = batchUpdateException.getUpdateCounts();

--- On Thu, 3/31/11, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

> Quartz <quartz12h@yahoo.com> wrote:
>
>> If I issue statements 1, 2 and 3, and statement 2 fails, there is
>> no prescribed behavior for statement 3, but one sure thing is that
>> statement 1 must be completed.
>
> I don't see how executeBatch can both throw the documented exception
> (BatchUpdateException) and return the int[] which would tell you
> what completed. It doesn't seem to me to make sense to complete
> some unknown mystery subset of the statements, so the only sane
> behavior is to apply none if the documented exception is thrown.
>
> -Kevin
Exactly. Good. Sorry, I didn't read ahead in the thread.

--- On Thu, 3/31/11, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

> Oh, I see now. There's an array in that exception class. Will look
> at this a bit more.
>
> -Kevin
> I'm coming around to the position that we shouldn't tinker with
> autoCommit within the executeBatch method. [snip] It would probably
> help sway the community if there were a more concrete explanation of
> why this is better than the alternatives for logging purposes, and a
> sketch of one or two of the other infinite number of use cases.

You have been arguing all along that most people use autocommit=false with batches. Then they won't break...!

Besides, that's what release notes are for. And I dare say, if they expected a transaction when using a batch with autocommit=true, it's about time they learn their mistake. The JDBC API is a contract. Can't make an exception for postgres.
On Mar 31, 2011, at 3:34 PM, Quartz wrote:
> Besides, that's what release notes are for. And I dare say, if they
> expected a transaction when using a batch with autocommit=true, it's
> about time they learn their mistake. The JDBC API is a contract.
> Can't make an exception for postgres.
Quartz, the problem is that the behavior of batch updates when autocommit=true is not spec-defined; it's implementation-defined. Just because MySQL does it one way doesn't make that the "right" way. Look at this post from 2009:
"The behavior after a failure is DBMS specific, as documented in Statement.executeBatch(). Some unit tests I've run had shown that MSSQL continues with the rest of the statements while Oracle aborts the batch immediately."
And reading through the JDBC guide, albeit for an older version, here, it states:

"For this reason, autocommit should always be turned off when batch updates are done. The commit behavior of executeBatch is always implementation defined when an error occurs and autocommit is true."

And from the most recent JDBC tutorial, here:
"To allow for correct error handling, you should always disable auto-commit mode before beginning a batch update."
It seems to me that this is a case of you expecting behavior that is not spec-defined. Your prior experience with MySQL has taught you to expect certain behavior, so you expect that behavior to also be present in other drivers, even though the spec explicitly states that it's implementation-defined. This is clear in the first post I linked to, where MSSQL continues (as does MySQL, by your own admission) but Oracle aborts.
-- Jeff Hubbach
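The pattern the guide and tutorial quoted above recommend, as a minimal sketch (table and SQL invented):

import java.sql.*;

public class RecommendedBatchSketch
{
  static void insertAll(Connection con) throws SQLException
  {
    boolean oldAutoCommit = con.getAutoCommit();
    con.setAutoCommit(false);  // per the tutorial's advice
    Statement stmt = con.createStatement();
    try
    {
      stmt.addBatch("insert into event_log(msg) values ('a')");
      stmt.addBatch("insert into event_log(msg) values ('b')");
      stmt.executeBatch();
      con.commit();    // explicitly all-or-nothing
    }
    catch (BatchUpdateException e)
    {
      con.rollback();  // well-defined cleanup on any failure
      throw e;
    }
    finally
    {
      stmt.close();
      con.setAutoCommit(oldAutoCommit);
    }
  }
}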
On 1 April 2011 10:34, Quartz <quartz12h@yahoo.com> wrote:

> You have been arguing all along that most people use autocommit=false
> with batches. Then they won't break...!
>
> Besides, that's what release notes are for. And I dare say, if they
> expected a transaction when using a batch with autocommit=true, it's
> about time they learn their mistake. The JDBC API is a contract.
> Can't make an exception for postgres.

The JDBC spec says that the behavior of executeBatch() with autocommit=true is implementation defined, and specifically warns against using it with autocommit on.

I don't see any problems with the driver's current behavior:

(a) it would be quite a lot of extra work to wrap each statement in a separate transaction (we'd have to send explicit BEGIN/END around each statement execution);
(b) the current behavior is consistent with how multiple statement execution works elsewhere in the driver, where if you execute "SELECT a; SELECT b" as a statement with autocommit=true then the two queries run in a single transaction;
(c) usually batch updates are there for performance reasons, which is at odds with creating a separate transaction for each batched statement;
(d) the current behavior *is* allowed by the specification.

The problem is with your code relying on particular behavior of executeBatch() + autocommit, which the spec explicitly warns is implementation-defined behavior that you shouldn't rely on.

Oliver
Oliver Jowett <oliver@opencloud.com> wrote:

> (b) the current behavior is consistent with how multiple statement
> execution works elsewhere in the driver, where if you execute
> "SELECT a; SELECT b" as a statement with autocommit=true then the
> two queries run in a single transaction;

I did not know that. Is that required by spec?

It definitely doesn't happen in psql:

test=# select now(); select now();
              now
-------------------------------
 2011-03-31 17:38:41.345244-05
(1 row)

              now
-------------------------------
 2011-03-31 17:38:41.410403-05
(1 row)

test=# begin; select now(); select now(); commit;
BEGIN
              now
-------------------------------
 2011-03-31 17:38:58.593238-05
(1 row)

              now
-------------------------------
 2011-03-31 17:38:58.593238-05
(1 row)

COMMIT

I would have expected more or less the same from this:

import java.sql.*;

public class MultiStmt
{
  public static void main(String[] args) throws Exception
  {
    Class.forName("org.postgresql.Driver");
    Connection con = DriverManager.getConnection
      ("jdbc:postgresql:test", "kgrittn", "");
    Statement stmt = con.createStatement();
    for (boolean rsfound = stmt.execute
           ("select now(); select now();");
         rsfound || stmt.getUpdateCount() != -1;
         rsfound = stmt.getMoreResults())
    {
      ResultSet rs = stmt.getResultSet();
      while (rs.next())
        System.out.println(rs.getTimestamp(1));
      rs.close();
    }
    stmt.close();
    con.close();
  }
}

When I run that, I see that it behaves as you say.

-Kevin
On 1 April 2011 12:06, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

> Oliver Jowett <oliver@opencloud.com> wrote:
>
>> (b) the current behavior is consistent with how multiple statement
>> execution works elsewhere in the driver, where if you execute
>> "SELECT a; SELECT b" as a statement with autocommit=true then the
>> two queries run in a single transaction;
>
> I did not know that. Is that required by spec?

Depending on how you read the javadoc, yes:

---
void setAutoCommit(boolean autoCommit) throws SQLException

[...] The commit occurs when the statement completes. The time when the statement completes depends on the type of SQL Statement: [...] For CallableStatement objects or for statements that return multiple results, the statement is complete when all of the associated result sets have been closed, and all update counts and output parameters have been retrieved.
---

I don't think we strictly follow that -- as the commit is driven by the server, we may commit before all resultsets are closed, in general -- but the intent is the same: a single JDBC statement that does multiple things executes in one transaction, not several. It's also consistent with how the v2 protocol works.

> It definitely doesn't happen in psql:

psql does things differently, but psql is not a JDBC driver.

Oliver
On Mar 31, 2011, at 7:06 PM, Kevin Grittner wrote:

> Oliver Jowett <oliver@opencloud.com> wrote:
>
>> (b) the current behavior is consistent with how multiple statement
>> execution works elsewhere in the driver ...
>
> I did not know that. Is that required by spec?
>
> It definitely doesn't happen in psql:

psql includes an SQL lexer which breaks up the statements. Weird, but true!

http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/bin/psql/psqlscan.l;h=3575f91a61477f09c4939e53fb1495d763485e86;hb=HEAD#l1170

Note that the v2 backend protocol does not support multiple queries per statement.

Cheers,
M
On 1 April 2011 12:21, A.M. <agentm@themactionfaction.com> wrote:

> Note that the v2 backend protocol does not support multiple queries
> per statement.

Actually it is the other way around: v2 and v3 simple query will parse semicolon-separated statements (so older drivers did nothing special with semicolon-separated statements; they just got passed intact to the server). v3 extended query does not support multiple queries per Parse/Execute, so the driver parses semicolon-separated statements and submits them via separate Parse commands (but within the same implicit transaction block terminated by Sync), which achieves essentially the same behavior as v2/v3 simple query while still allowing access to the extra flexibility of the extended query protocol.

Oliver
So what is the use case for autocommit=true for batch inserts?

To me the whole point of batch inserts is speed; with autocommit=false it should be faster. I would also think the logic is much simpler if I get an all-or-nothing commit.

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

On Thu, Mar 31, 2011 at 5:34 PM, Quartz <quartz12h@yahoo.com> wrote:

>> I'm coming around to the position that we shouldn't tinker with
>> autoCommit within the executeBatch method. [snip]
>
> You have been arguing all along that most people use autocommit=false
> with batches. Then they won't break...!
>
> Besides, that's what release notes are for. And I dare say, if they
> expected a transaction when using a batch with autocommit=true, it's
> about time they learn their mistake. The JDBC API is a contract.
> Can't make an exception for postgres.
Please don't refer to old docs/specs. I posted the latest docs previously.

--- On Thu, 3/31/11, Jeff Hubbach <Jeff.Hubbach@cha.com> wrote:
> [snip]
I posted that the remaining question is not about how and why it is used. It is about the fact that with autocommit=true, when the first statement to fail is in the middle of a batch, all previous statements are supposed to be completed and NOT rolled back (regardless of the other statements at or after the failed statement).

> (a) it would be quite a lot of extra work to wrap each statement in
> a separate transaction (we'd have to send explicit BEGIN/END around
> each statement execution);
> (c) usually batch updates are there for performance reasons, which
> is at odds with creating a separate transaction for each batched
> statement;

It is not about performance. It is about behavior.

> (d) the current behavior *is* allowed by the specification.

No it isn't. (See top of message.)

--- On Thu, 3/31/11, Oliver Jowett <oliver@opencloud.com> wrote:

> [snip]
Once again... you cannot assume a batch is a transaction. In particular, it may deadlock when applications are batching statements expecting that affected rows won't be locked as a group.

--- On Fri, 4/1/11, Dave Cramer <pg@fastcrypt.com> wrote:

> So what is the use case for autocommit=true for batch inserts?
>
> To me the whole point of batch inserts is speed; with
> autocommit=false it should be faster. I would also think the logic
> is much simpler if I get an all-or-nothing commit.
>
> [snip]
Hi Quartz

I think at this point your best option is to change the driver yourself and send a patch to the list, because there doesn't seem to be anyone else who agrees with your interpretation of the spec (and so it's unlikely to get built by anyone else).

You'll need to justify the behavioral/performance changes that your patch causes -- for example, we wouldn't want existing apps to suddenly slow down because they're doing 100x the previous number of transactions. I'd suggest making it a configurable connection option, defaulting to off, to avoid that problem.

Oliver
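If such an option were ever added, the caller side might look like the sketch below; the property name is purely hypothetical and is not a real pgJDBC connection parameter:

import java.sql.*;
import java.util.Properties;

public class OptInSketch
{
  static Connection connect() throws SQLException
  {
    Properties props = new Properties();
    props.setProperty("user", "app");
    // Hypothetical opt-in, defaulting to off as Oliver suggests;
    // NOT an existing pgJDBC connection parameter.
    props.setProperty("batchPerStatementCommit", "true");
    return DriverManager.getConnection("jdbc:postgresql:test", props);
  }
}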