Thread: Proposal to fix Statement.executeBatch()
I've finished the section on batch updates in the JDBC 2.0 compliance documentation on http://lab.applinet.nl/postgresql-jdbc/ (see the quote of the relevant part below).

In the short term I think two things need to be fixed:
1) don't begin, commit or rollback a transaction implicitly in Statement.executeBatch()
2) have executeBatch() throw a BatchUpdateException when it is required to do so by the JDBC spec

If there are no objections from this list I intend to submit a patch that fixes 1), and perhaps also 2).

Note that this may cause backward compatibility issues with JDBC applications that have come to rely on the incorrect behaviour. OTOH, there have been complaints on this list before, and those people would certainly be happy about the fix. E.g. http://fts.postgresql.org/db/mw/msg.html?mid=83832

In the long run it would be nice if the backend would support returning one update count (and perhaps an OID) per statement sent in a semicolon-separated multi-statement call. Would this be something for the backend TODO list? OTOH, I'm not sure if this (small?) performance improvement is worth the trouble.

"Batch updates

The driver supports batch updates with the addBatch, clearBatch and executeBatch methods of Statement, PreparedStatement and CallableStatement. DatabaseMetaData.supportsBatchUpdates() returns true.

However, executing statements in a batch does not provide a performance improvement with PostgreSQL, since all statements are internally sent to the backend and processed one by one. That defeats the purpose of the batch methods. The intended behaviour is to send a set of update/insert/delete/DDL statements in one round trip to the database. Unfortunately, this optional JDBC feature cannot be implemented correctly with PostgreSQL, since the backend only returns the update count of the last statement sent in one call with multiple statements. JDBC requires it to return an array with the update counts of all statements in the batch. Even though the batch processing feature currently provides no performance improvement, it should not be removed from the driver for reasons of backward compatibility.

The current implementation of Statement.executeBatch() in PostgreSQL starts a new transaction and commits or aborts it. This is not in compliance with the JDBC specification, which does not mention transactions in the description of Statement.executeBatch() at all. The confusion is probably caused by a JDBC tutorial from Sun with example code which disables autocommit before calling executeBatch "so that the transaction will not be automatically committed or rolled back when the method executeBatch is called". This comment in the tutorial appears to be a misunderstanding. A good reason to disable autocommit before calling executeBatch() is to be able to commit or rollback all statements in a batch as a unit. With autocommit enabled, the application would not know which statements had and had not been processed when an exception is thrown. It is the responsibility of the application, however, to disable autocommit and to commit or rollback the transaction. Note that Oracle's implementation of executeBatch() also does not commit or rollback a transaction implicitly. The implementation of Statement.executeBatch() in PostgreSQL should be changed to not begin, commit or rollback a transaction.

Support for BatchUpdateException is not yet implemented. The implementation of executeBatch is therefore incorrect, since it is required to throw a BatchUpdateException if one of the commands in the batch returns something other than an update count."

Regards,
René Pijlman
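[As a small illustration of the pattern the quoted documentation recommends (the application, not the driver, controls the transaction), here is a minimal sketch. The connection URL, credentials and the "item" table are made up for the example; the code is not from the driver or the patch.]

import java.sql.*;

public class BatchTransactionExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection parameters, for illustration only.
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        con.setAutoCommit(false);   // the application disables autocommit itself
        Statement stmt = con.createStatement();
        try {
            stmt.addBatch("INSERT INTO item (name) VALUES ('a')");
            stmt.addBatch("INSERT INTO item (name) VALUES ('b')");
            stmt.addBatch("DELETE FROM item WHERE name = 'a'");
            stmt.executeBatch();
            con.commit();           // commit all statements in the batch as a unit
        } catch (SQLException e) {  // includes BatchUpdateException once supported
            con.rollback();         // roll the whole batch back
            throw e;
        } finally {
            stmt.close();
            con.close();
        }
    }
}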
Rene,

I see your statements below as incorrect:

> The intended behaviour is to send a set of update/insert/delete/DDL
> statements in one round trip to the database. Unfortunately,
> this optional JDBC feature cannot be implemented correctly with
> PostgreSQL, since the backend only returns the update count of
> the last statement sent in one call with multiple statements.
> JDBC requires it to return an array with the update counts of
> all statements in the batch.

The intended behaviour is certainly to send all of the statements in one round trip, and the JDBC 2.1 spec certainly allows postgres to do just that. Here is how I would suggest this be done in a way that is spec compliant (note that I haven't looked at the patch you submitted yet, so forgive me if you have already done it this way, but based on your comments in this email, my guess is that you have not).

Statements should be batched together in a single statement with semicolons separating the individual statements (this will allow the backend to process them all in one round trip). The result array should return an element with the row count for each statement; however, the value for all but the last statement will be -2. (-2 is defined by the spec to mean the statement was processed successfully but the number of affected rows is unknown.)

In the event of an error, the driver should return an array the size of the submitted batch with values of -3 for all elements. -3 is defined by the spec as meaning the corresponding statement failed to execute successfully, or could not be processed for some reason. Since in postgres, when one statement fails (in non-autocommit mode), the entire transaction is aborted, this is consistent with a return value of -3 in my reading of the spec.

I believe this approach makes the most sense because:
1) It implements batches in one round trip (the intention of the feature)
2) It is compliant with the standard
3) It is compliant with the current functionality of the backend

thanks,
--Barry
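[To make the proposal above concrete, here is a rough, hypothetical sketch of how a driver-side executeBatch() could be written along these lines. It is NOT the actual org.postgresql driver code; the BatchSketch class and the executeSingleQuery() helper are invented placeholders for the driver's internal single-round-trip execution.]

import java.sql.BatchUpdateException;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Rough sketch of the proposal; not the real org.postgresql implementation.
public class BatchSketch {
    private final List<String> batch = new ArrayList<>();

    public void addBatch(String sql) {
        batch.add(sql);
    }

    public int[] executeBatch() throws SQLException {
        if (batch.isEmpty()) {
            return new int[0];
        }
        // Join all batched statements into one semicolon-separated string so
        // the backend can process them in a single round trip.
        StringBuilder sql = new StringBuilder();
        for (int i = 0; i < batch.size(); i++) {
            if (i > 0) {
                sql.append("; ");
            }
            sql.append(batch.get(i));
        }

        int[] counts = new int[batch.size()];
        try {
            // The backend only reports the update count of the last statement.
            int lastCount = executeSingleQuery(sql.toString());
            for (int i = 0; i < counts.length - 1; i++) {
                counts[i] = -2;                   // executed OK, row count unknown
            }
            counts[counts.length - 1] = lastCount;
            return counts;
        } catch (SQLException e) {
            for (int i = 0; i < counts.length; i++) {
                counts[i] = -3;                   // statement failed / not processed
            }
            throw new BatchUpdateException(e.getMessage(), counts);
        } finally {
            batch.clear();
        }
    }

    // Made-up placeholder for the driver's internal execution of one query string.
    private int executeSingleQuery(String sql) throws SQLException {
        throw new SQLException("not implemented in this sketch");
    }
}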
On Mon, 27 Aug 2001 11:07:55 -0700, you wrote:
[executeBatch() implemented as one round trip]
>Here is how I would suggest this be done in a way that is spec
>compliant (Note: that I haven't looked at the patch you submitted yet, so
>forgive me if you have already done it this way, but based on your
>comments in this email, my guess is that you have not).

Indeed, I have not implemented this.

>Statements should be batched together in a single statement with
>semicolons separating the individual statements (this will allow the
>backend to process them all in one round trip).
>
>The result array should return an element with the row count for each
>statement, however the value for all but the last statement will be
>'-2'. (-2 is defined by the spec to mean the statement was processed
>successfully but the number of affected rows is unknown).

Ah, I see. I hadn't thought of that solution.

>In the event of an error, then the driver should return an array the
>size of the submitted batch with values of -3 for all elements. -3 is
>defined by the spec as the corresponding statement failed to execute
>successfully, or for statements that could not be processed for some
>reason. Since in postgres when one statement fails (in non-autocommit
>mode), the entire transaction is aborted this is consistent with a
>return value of -3 in my reading of the spec.

Not quite. A statement in a batch may also fail because it's a successful SELECT as far as the server is concerned (you can't have SELECTs in a batch). But that situation can also be handled correctly by setting the update count for that particular statement to -3. It's then up to the application to decide if it wants to roll back, I would say.

But what to do when an error occurs with autocommit enabled? This is not recommended, but allowed by the spec, if I understand it correctly.

What exactly is the behaviour of the backend in that scenario? Does it commit every separate SQL statement in the semicolon-separated list, or does it commit the list as a whole? Does it abort processing the statement list when an error occurs in one statement? And if it continues, does it return an error when only one statement in the middle of the list had an error?

>I believe this approach makes the most sense because:
>1) It implements batches in one round trip (the intention of the feature)
>2) It is compliant with the standard
>3) It is compliant with the current functionality of the backend

If we can come up with an acceptable solution for an error with autocommit enabled, I agree. Otherwise, I'm not sure.

However, it would mean a change in behaviour of the driver that may break existing JDBC applications: the driver will no longer return update counts for all statements in a batch like it currently does, it will return "unknown" for most statements. I'm not sure if the performance improvement justifies this non-backwardly-compatible change, though I agree this is the intention of the feature. What do you think?

Regards,
René Pijlman
> What exactly is the behaviour of the backend in that scenario?
> Does it commit every separate SQL statement in the
> semicolon-separated list, or does it commit the list as a whole?
> Does it abort processing the statement list when an error occurs
> in one statement? And if it continues, does it return an error
> when only one statement in the middle of the list had an error?

I do not know what the server does if you have autocommit enabled and you issue multiple statements in one try. However, I would be OK with the driver issuing the statements one by one with autocommit on. If you are running in this mode you just wouldn't get any performance improvement.

> However, it would mean a change in behaviour of the driver that
> may break existing JDBC applications: the driver will no longer
> return update counts for all statements in a batch like it
> currently does, it will return "unknown" for most statements.
> I'm not sure if the performance improvement justifies this
> non-backwardly-compatible change, though I agree this is the
> intention of the feature. What do you think?

I wouldn't worry about this 'change in behavior' because if the caller is JDBC compliant it should be coded to handle the new behavior, as it is compliant with the spec.

thanks,
--Barry
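[As an illustration of the fallback described above, here is another rough, hypothetical sketch; it is not the real driver code, and the AutoCommitAwareBatch class is invented. With autocommit on, the batched statements are executed one by one; with autocommit off, they are sent in a single semicolon-separated round trip.]

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the autocommit fallback; not the real driver code.
public class AutoCommitAwareBatch {
    private final Connection connection;
    private final List<String> batch = new ArrayList<>();

    public AutoCommitAwareBatch(Connection connection) {
        this.connection = connection;
    }

    public void addBatch(String sql) {
        batch.add(sql);
    }

    public int[] executeBatch() throws SQLException {
        if (batch.isEmpty()) {
            return new int[0];
        }
        try (Statement stmt = connection.createStatement()) {
            if (connection.getAutoCommit()) {
                // Autocommit on: execute one by one. Each statement commits on
                // its own, so there is no round-trip saving in this mode.
                int[] counts = new int[batch.size()];
                for (int i = 0; i < batch.size(); i++) {
                    counts[i] = stmt.executeUpdate(batch.get(i));
                }
                return counts;
            }
            // Autocommit off: one semicolon-separated round trip; the backend
            // reports only the last update count, the rest are set to -2.
            int last = stmt.executeUpdate(String.join("; ", batch));
            int[] counts = new int[batch.size()];
            for (int i = 0; i < counts.length - 1; i++) {
                counts[i] = -2;
            }
            counts[counts.length - 1] = last;
            return counts;
        } finally {
            batch.clear();
        }
    }
}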
Can someone suggest what is to be done with the proposed patch?

-- Bruce Momjian
On Tue, 28 Aug 2001 12:31:53 -0400 (EDT), you wrote:
>Can someone suggest what is to be done with the proposed patch?

Barry's proposal for a more efficient implementation of executeBatch() does not invalidate the patch. It just means we still have more work to do to get the performance improvement this feature is all about. The patch improves JDBC compliance independently of the performance improvement, and it adds a test case.

If there are no objections (I haven't seen any), I'd suggest it is applied.

Regards,
René Pijlman
Bruce,

I think the existing patch can be applied as is. The issues I raised earlier in this thread are further improvements in the functionality that can be done and don't directly relate to the patch that was submitted. Sorry if I confused things.

--Barry
No problem. Just checking. Patch will remain in the queue and be applied.

-- Bruce Momjian
On Mon, 27 Aug 2001 22:57:13 -0700, Barry Lind wrote:
>I do not know what the server does if you have autocommit enabled
>and you issue multiple statements in one try.

As you know, Peter Eisentraut said on hackers that all statements in a semicolon-separated query string are processed as one single transaction. So, if in "S1;S2;S3" S2 fails, then S1, S2 and S3 are all rolled back and in effect they have all failed.

I think this means we can implement your proposal. I've removed the paragraph saying that we can't and I've added the following to http://lab.applinet.nl/postgresql-jdbc/#Batch

-+-+-

The current implementation of Statement.executeBatch() in the JDBC driver does not provide any performance improvement compared to processing statements individually. This is because the driver executes the statements one by one when executeBatch() is called, using one round trip per statement. We intend to reimplement executeBatch() in the following way.

Statement.executeBatch() will send all statements in a single semicolon-separated query string, with only one round trip to the backend. This will provide a performance improvement, as intended by the JDBC specification.

The updateCounts array will be set as described below. Note that the JDBC spec defines the meaning of the following special values:

-2 the statement was executed successfully but the number of affected rows is unknown
-3 the statement failed to execute successfully

If all statements succeed, executeBatch() returns an updateCounts array with a row count for each statement in the batch; however, the value for all but the last statement will be -2. The value for the last statement will be a proper update count. If a statement fails, executeBatch() throws a BatchUpdateException containing an updateCounts array with a row count of -3 for each statement in the batch.

Note that the behaviour will be the same when autocommit is enabled and when it is disabled. Even with autocommit enabled, the backend will commit or rollback all statements in the semicolon-separated query string as a unit.

The new implementation of executeBatch() will cause a change in behaviour of the driver: the driver will no longer return update counts for all statements in a batch like it currently does, it will return -2 ("unknown") for most statements. However, this behaviour is allowed by the JDBC spec and applications should be prepared to handle it.

-+-+-

I see two more issues we need to decide on...

1) The JDBC spec requires Statement.executeBatch() to throw a BatchUpdateException if any of the statements does not return an update count (e.g. is a SELECT). How can we implement this? Do we need to parse the statements in the JDBC driver to detect SELECTs? It is a matter of interpretation, but it seems OK to me to just ignore this and return -2/-3 for SELECTs as well. In fact, perhaps we should allow SELECTs for function calls!?

2) The reimplementation may cause the driver to send very long statements to the backend. I heard something about an 8K limit. In what version of the backend was this limitation removed? I guess we should implement the new algorithm conditionally, so we'll only send multi-statement query strings to a backend that has no statement length limitation.

Regards,
René Pijlman <rene@lab.applinet.nl>
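[To show what the behaviour described in the writeup above would mean for application code, here is a small hypothetical example of handling the -2 update counts and a BatchUpdateException carrying -3 counts. The connection URL, credentials and the "item" table are invented for illustration.]

import java.sql.*;

public class BatchResultHandlingExample {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        con.setAutoCommit(false);
        Statement stmt = con.createStatement();
        stmt.addBatch("INSERT INTO item (name) VALUES ('a')");
        stmt.addBatch("INSERT INTO item (name) VALUES ('b')");
        stmt.addBatch("UPDATE item SET name = 'c' WHERE name = 'a'");
        try {
            int[] counts = stmt.executeBatch();
            for (int i = 0; i < counts.length; i++) {
                if (counts[i] == -2) {
                    System.out.println("statement " + i + ": succeeded, row count unknown");
                } else {
                    System.out.println("statement " + i + ": " + counts[i] + " rows");
                }
            }
            con.commit();
        } catch (BatchUpdateException e) {
            // In the proposed scheme every element is -3: the whole batch failed as a unit.
            int[] counts = e.getUpdateCounts();
            System.out.println("batch failed, " + counts.length + " statements rolled back");
            con.rollback();
        } finally {
            stmt.close();
            con.close();
        }
    }
}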
Rene,

This writeup looks good. As for your two follow-up questions:

1) I agree with your opinion on the select issue. Since selects are the only way to call functions, and functions could themselves be doing inserts/updates, to not allow selects is a bad idea IMHO.

2) The 8K limitation was removed from the database in 7.0 and from the JDBC driver in 7.0.2. Therefore I don't think we should code for the case of a 7.2 JDBC driver needing to support a 6.5 database.

thanks,
--Barry
Barry,

I just applied this patch and a few others. Let me know if they are OK.

-- Bruce Momjian