Thread: Proposal to fix Statement.executeBatch()

Proposal to fix Statement.executeBatch()

From
Rene Pijlman
Date:
I've finished the secion on batch updates in the JDBC 2.0
compliance documentation on
http://lab.applinet.nl/postgresql-jdbc/ (see the quote of the
relevant part below).

In the short term I think two things need to be fixed:
1) don't begin, commit or rollback a transaction implicitly in
Statement.executeBatch()
2) have executeBatch() throw a BatchUpdateException when it is
required to do so by the JDBC spec

If there are no objections from this list I intend to submit a
patch that fixes 1), and perhaps also 2).

Note that this may cause backward compatibility issues with JDBC
applications that have come to rely on the incorrect behaviour.
OTOH, there have been complaints on this list before, and those
people would certainly be happy about the fix. E.g.
http://fts.postgresql.org/db/mw/msg.html?mid=83832

In the long run it would be nice if the backend would support
returning one update count (and perhaps an OID) per statement
send in a semicolon separated multi-statement call. Would this
be something for the backend TODO list? OTOH, I'm not sure if
this (small?) performance improvement is worth the trouble.

"Batch updates

The driver supports batch updates with the addBatch, clearBatch
and executeBatch methods of Statement, PreparedStatement and
CallableStatement. DatabaseMetaData.supportsBatchUpdates()
returns true.

However, executing statements in a batch does not provide a
performance improvement with PostgreSQL, since all statements
are internally send to the backend and processed one-by-one.
That defeats the purpose of the batch methods. The intended
behaviour is to send a set of update/insert/delete/DDL
statements in one round trip to the database. Unfortunately,
this optional JDBC feature cannot be implemented correctly with
PostgreSQL, since the backend only returns the update count of
the last statement send in one call with multiple statements.
JDBC requires it to return an array with the update counts of
all statements in the batch. Even though the batch processing
feature currently provides no performance improvement, it should
not be removed from the driver for reasons of backward
compatibility.

The current implementation of Statement.executeBatch() in
PostgreSQL starts a new transaction and commits or aborts it.
This is not in compliance with the JDBC specification, which
does not mention transactions in the description of
Statement.executeBatch() at all. The confusion is probably
caused by a JDBC tutorial from Sun with example code which
disables autocommit before calling executeBatch "so that the
transaction will not be automatically committed or rolled back
when the method executeBatch is called". This comment in the
tutorials appears to be a misunderstanding. A good reason to
disable autocommit before calling executeUpdate() is to be able
to commit or rollback all statements in a batch as a unit. With
autocommit enabled, the application would not know which
statements had and had not been processed when an exception is
thrown. It is the responsibility of the application, however, to
disable autocommit and to commit or rollback a transaction. Note
that Oracle's implementation of executeBatch() also does not
commit or rollback a transaction implicitly. The implementation
of Statement.executeBatch() in PostgreSQL should be changed to
not begin, commit or rollback a transaction.

Support for BatchUpdateException is not yet implemented. The
implementation of executeBatch is incorrect, therefore, since it
is required to throw a BatchUpdateException if one of the
commands in the batch returns something other than an update
count."

Regards,
René Pijlman

Re: Proposal to fix Statement.executeBatch()

From
Barry Lind
Date:
Rene,

I see your statements below as incorrect:

 > The intended behaviour is to send a set of update/insert/delete/DDL
 > statements in one round trip to the database. Unfortunately,
 > this optional JDBC feature cannot be implemented correctly with
 > PostgreSQL, since the backend only returns the update count of
 > the last statement send in one call with multiple statements.
 > JDBC requires it to return an array with the update counts of
 > all statements in the batch.

The intended behaviour is certainly to send all of the statements in one
round trip.  And the JDBC2.1 spec certainly allows postgres to do just
that.  Here is how I would suggest this be done in a way that is spec
compliant (Note: that I haven't looked at the patch you submited yet, so
forgive me if you have already done it this way, but based on your
comments in this email, my guess is that you have not).


Statements should be batched together in a single statement with
semicolons separating the individual statements (this will allow the
backend to process them all in one round trip).

The result array should return an element with the row count for each
statement, however the value for all but the last statement will be
'-2'.  (-2 is defined by the spec to mean the statement was processed
successfully but the number of affected rows is unknown).

In the event of an error, then the driver should return an array the
size of the submitted batch with values of -3 for all elements. -3 is
defined by the spec as the corresponding statement failed to execute
successfully, or for statements that could not be processed for some
reason.  Since in postgres when one statement fails (in non-autocommit
mode), the entire transaction is aborted this is consistent with a
return value of -3 in my reading of the spec.

I believe this approach makes the most sense because:
1) It implements batches in one round trip (the intention of the feature)
2) It is complient with the standard
3) It is complient with the current functionality of the backend

thanks,
--Barry


Rene Pijlman wrote:
> I've finished the secion on batch updates in the JDBC 2.0
> compliance documentation on
> http://lab.applinet.nl/postgresql-jdbc/ (see the quote of the
> relevant part below).
>
> In the short term I think two things need to be fixed:
> 1) don't begin, commit or rollback a transaction implicitly in
> Statement.executeBatch()
> 2) have executeBatch() throw a BatchUpdateException when it is
> required to do so by the JDBC spec
>
> If there are no objections from this list I intend to submit a
> patch that fixes 1), and perhaps also 2).
>
> Note that this may cause backward compatibility issues with JDBC
> applications that have come to rely on the incorrect behaviour.
> OTOH, there have been complaints on this list before, and those
> people would certainly be happy about the fix. E.g.
> http://fts.postgresql.org/db/mw/msg.html?mid=83832
>
> In the long run it would be nice if the backend would support
> returning one update count (and perhaps an OID) per statement
> send in a semicolon separated multi-statement call. Would this
> be something for the backend TODO list? OTOH, I'm not sure if
> this (small?) performance improvement is worth the trouble.
>
> "Batch updates
>
> The driver supports batch updates with the addBatch, clearBatch
> and executeBatch methods of Statement, PreparedStatement and
> CallableStatement. DatabaseMetaData.supportsBatchUpdates()
> returns true.
>
> However, executing statements in a batch does not provide a
> performance improvement with PostgreSQL, since all statements
> are internally send to the backend and processed one-by-one.
> That defeats the purpose of the batch methods. The intended
> behaviour is to send a set of update/insert/delete/DDL
> statements in one round trip to the database. Unfortunately,
> this optional JDBC feature cannot be implemented correctly with
> PostgreSQL, since the backend only returns the update count of
> the last statement send in one call with multiple statements.
> JDBC requires it to return an array with the update counts of
> all statements in the batch. Even though the batch processing
> feature currently provides no performance improvement, it should
> not be removed from the driver for reasons of backward
> compatibility.
>
> The current implementation of Statement.executeBatch() in
> PostgreSQL starts a new transaction and commits or aborts it.
> This is not in compliance with the JDBC specification, which
> does not mention transactions in the description of
> Statement.executeBatch() at all. The confusion is probably
> caused by a JDBC tutorial from Sun with example code which
> disables autocommit before calling executeBatch "so that the
> transaction will not be automatically committed or rolled back
> when the method executeBatch is called". This comment in the
> tutorials appears to be a misunderstanding. A good reason to
> disable autocommit before calling executeUpdate() is to be able
> to commit or rollback all statements in a batch as a unit. With
> autocommit enabled, the application would not know which
> statements had and had not been processed when an exception is
> thrown. It is the responsibility of the application, however, to
> disable autocommit and to commit or rollback a transaction. Note
> that Oracle's implementation of executeBatch() also does not
> commit or rollback a transaction implicitly. The implementation
> of Statement.executeBatch() in PostgreSQL should be changed to
> not begin, commit or rollback a transaction.
>
> Support for BatchUpdateException is not yet implemented. The
> implementation of executeBatch is incorrect, therefore, since it
> is required to throw a BatchUpdateException if one of the
> commands in the batch returns something other than an update
> count."
>
> Regards,
> René Pijlman
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>



Re: Re: Proposal to fix Statement.executeBatch()

From
Rene Pijlman
Date:
On Mon, 27 Aug 2001 11:07:55 -0700, you wrote:
[executeBatch() implemented as one round trip]
>Here is how I would suggest this be done in a way that is spec
>compliant (Note: that I haven't looked at the patch you submited yet, so
>forgive me if you have already done it this way, but based on your
>comments in this email, my guess is that you have not).

Indeed, I have not implemented this.

>Statements should be batched together in a single statement with
>semicolons separating the individual statements (this will allow the
>backend to process them all in one round trip).
>
>The result array should return an element with the row count for each
>statement, however the value for all but the last statement will be
>'-2'.  (-2 is defined by the spec to mean the statement was processed
>successfully but the number of affected rows is unknown).

Ah, I see. I hadn't thought of that solution.

>In the event of an error, then the driver should return an array the
>size of the submitted batch with values of -3 for all elements. -3 is
>defined by the spec as the corresponding statement failed to execute
>successfully, or for statements that could not be processed for some
>reason.  Since in postgres when one statement fails (in non-autocommit
>mode), the entire transaction is aborted this is consistent with a
>return value of -3 in my reading of the spec.

Not quite. A statement in a batch may also fail because its a
succesful SELECT as far as the server is concerned (can't have
select's in a batch). But that situation can also be handled
correctly by setting the update count for that particular
statement to -3. Its then up to the application to decide if it
wants to rollback, I would say.

But what to do when an error occurs with autocommit enabled?
This is not recommended, but allowed by the spec, if I
understand it correctly.

What exactly is the behaviour of the backend in that scenario?
Does it commit every separate SQL statement in the
semicolon-separated list, or does it commit the list as a whole?
Does it abort processing the statement list when an error occurs
in one statement? And if it continues, does it return an error
when only one statement in the middle of the list had an error?

>I believe this approach makes the most sense because:
>1) It implements batches in one round trip (the intention of the feature)
>2) It is complient with the standard
>3) It is complient with the current functionality of the backend

If we can come up with an acceptable solution for an error with
autocommit enabled, I agree. Otherwise, I'm not sure.

However, it would mean a change in behaviour of the driver that
may break existing JDBC applications: the driver will no longer
return update counts for all statements in a batch like it
currently does, it will return "unknown" for most statements.
I'm not sure if the performance improvement justifies this
non-backwardly-compatible change, though I agree this is the
intention of the feature. What do you think?

Regards,
René Pijlman

Re: Re: Proposal to fix Statement.executeBatch()

From
Barry Lind
Date:
 > What exactly is the behaviour of the backend in that scenario?
 > Does it commit every separate SQL statement in the
 > semicolon-separated list, or does it commit the list as a whole?
 > Does it abort processing the statement list when an error occurs
 > in one statement? And if it continues, does it return an error
 > when only one statement in the middle of the list had an error?

I do not know what the server does if you have autocommit enabled and
you issue multiple statements in one try.  However, I would be OK with
the driver issuing the statements one by one with autocommit on.  If you
are running in this mode you just wouldn't get any performance improvement.

 > However, it would mean a change in behaviour of the driver that
 > may break existing JDBC applications: the driver will no longer
 > return update counts for all statements in a batch like it
 > currently does, it will return "unknown" for most statements.
 > I'm not sure if the performance improvement justifies this
 > non-backwardly-compatible change, though I agree this is the
 > intention of the feature. What do you think?

I wouldn't worry about this 'change in behavior' because if the caller
is JDBC complient it should be coded to handle the new behavior as it is
complient with the spec.

thanks,
--Barry




Rene Pijlman wrote:
> On Mon, 27 Aug 2001 11:07:55 -0700, you wrote:
> [executeBatch() implemented as one round trip]
>
>>Here is how I would suggest this be done in a way that is spec
>>compliant (Note: that I haven't looked at the patch you submited yet, so
>>forgive me if you have already done it this way, but based on your
>>comments in this email, my guess is that you have not).
>>
>
> Indeed, I have not implemented this.
>
>
>>Statements should be batched together in a single statement with
>>semicolons separating the individual statements (this will allow the
>>backend to process them all in one round trip).
>>
>>The result array should return an element with the row count for each
>>statement, however the value for all but the last statement will be
>>'-2'.  (-2 is defined by the spec to mean the statement was processed
>>successfully but the number of affected rows is unknown).
>>
>
> Ah, I see. I hadn't thought of that solution.
>
>
>>In the event of an error, then the driver should return an array the
>>size of the submitted batch with values of -3 for all elements. -3 is
>>defined by the spec as the corresponding statement failed to execute
>>successfully, or for statements that could not be processed for some
>>reason.  Since in postgres when one statement fails (in non-autocommit
>>mode), the entire transaction is aborted this is consistent with a
>>return value of -3 in my reading of the spec.
>>
>
> Not quite. A statement in a batch may also fail because its a
> succesful SELECT as far as the server is concerned (can't have
> select's in a batch). But that situation can also be handled
> correctly by setting the update count for that particular
> statement to -3. Its then up to the application to decide if it
> wants to rollback, I would say.
>
> But what to do when an error occurs with autocommit enabled?
> This is not recommended, but allowed by the spec, if I
> understand it correctly.
>
> What exactly is the behaviour of the backend in that scenario?
> Does it commit every separate SQL statement in the
> semicolon-separated list, or does it commit the list as a whole?
> Does it abort processing the statement list when an error occurs
> in one statement? And if it continues, does it return an error
> when only one statement in the middle of the list had an error?
>
>
>>I believe this approach makes the most sense because:
>>1) It implements batches in one round trip (the intention of the feature)
>>2) It is complient with the standard
>>3) It is complient with the current functionality of the backend
>>
>
> If we can come up with an acceptable solution for an error with
> autocommit enabled, I agree. Otherwise, I'm not sure.
>
> However, it would mean a change in behaviour of the driver that
> may break existing JDBC applications: the driver will no longer
> return update counts for all statements in a batch like it
> currently does, it will return "unknown" for most statements.
> I'm not sure if the performance improvement justifies this
> non-backwardly-compatible change, though I agree this is the
> intention of the feature. What do you think?
>
> Regards,
> René Pijlman
>
>



Re: Re: Proposal to fix Statement.executeBatch()

From
Bruce Momjian
Date:
Can someone suggest what is to be done with the propsed patch?

>  > What exactly is the behaviour of the backend in that scenario?
>  > Does it commit every separate SQL statement in the
>  > semicolon-separated list, or does it commit the list as a whole?
>  > Does it abort processing the statement list when an error occurs
>  > in one statement? And if it continues, does it return an error
>  > when only one statement in the middle of the list had an error?
>
> I do not know what the server does if you have autocommit enabled and
> you issue multiple statements in one try.  However, I would be OK with
> the driver issuing the statements one by one with autocommit on.  If you
> are running in this mode you just wouldn't get any performance improvement.
>
>  > However, it would mean a change in behaviour of the driver that
>  > may break existing JDBC applications: the driver will no longer
>  > return update counts for all statements in a batch like it
>  > currently does, it will return "unknown" for most statements.
>  > I'm not sure if the performance improvement justifies this
>  > non-backwardly-compatible change, though I agree this is the
>  > intention of the feature. What do you think?
>
> I wouldn't worry about this 'change in behavior' because if the caller
> is JDBC complient it should be coded to handle the new behavior as it is
> complient with the spec.
>
> thanks,
> --Barry
>
>
>
>
> Rene Pijlman wrote:
> > On Mon, 27 Aug 2001 11:07:55 -0700, you wrote:
> > [executeBatch() implemented as one round trip]
> >
> >>Here is how I would suggest this be done in a way that is spec
> >>compliant (Note: that I haven't looked at the patch you submited yet, so
> >>forgive me if you have already done it this way, but based on your
> >>comments in this email, my guess is that you have not).
> >>
> >
> > Indeed, I have not implemented this.
> >
> >
> >>Statements should be batched together in a single statement with
> >>semicolons separating the individual statements (this will allow the
> >>backend to process them all in one round trip).
> >>
> >>The result array should return an element with the row count for each
> >>statement, however the value for all but the last statement will be
> >>'-2'.  (-2 is defined by the spec to mean the statement was processed
> >>successfully but the number of affected rows is unknown).
> >>
> >
> > Ah, I see. I hadn't thought of that solution.
> >
> >
> >>In the event of an error, then the driver should return an array the
> >>size of the submitted batch with values of -3 for all elements. -3 is
> >>defined by the spec as the corresponding statement failed to execute
> >>successfully, or for statements that could not be processed for some
> >>reason.  Since in postgres when one statement fails (in non-autocommit
> >>mode), the entire transaction is aborted this is consistent with a
> >>return value of -3 in my reading of the spec.
> >>
> >
> > Not quite. A statement in a batch may also fail because its a
> > succesful SELECT as far as the server is concerned (can't have
> > select's in a batch). But that situation can also be handled
> > correctly by setting the update count for that particular
> > statement to -3. Its then up to the application to decide if it
> > wants to rollback, I would say.
> >
> > But what to do when an error occurs with autocommit enabled?
> > This is not recommended, but allowed by the spec, if I
> > understand it correctly.
> >
> > What exactly is the behaviour of the backend in that scenario?
> > Does it commit every separate SQL statement in the
> > semicolon-separated list, or does it commit the list as a whole?
> > Does it abort processing the statement list when an error occurs
> > in one statement? And if it continues, does it return an error
> > when only one statement in the middle of the list had an error?
> >
> >
> >>I believe this approach makes the most sense because:
> >>1) It implements batches in one round trip (the intention of the feature)
> >>2) It is complient with the standard
> >>3) It is complient with the current functionality of the backend
> >>
> >
> > If we can come up with an acceptable solution for an error with
> > autocommit enabled, I agree. Otherwise, I'm not sure.
> >
> > However, it would mean a change in behaviour of the driver that
> > may break existing JDBC applications: the driver will no longer
> > return update counts for all statements in a batch like it
> > currently does, it will return "unknown" for most statements.
> > I'm not sure if the performance improvement justifies this
> > non-backwardly-compatible change, though I agree this is the
> > intention of the feature. What do you think?
> >
> > Regards,
> > Ren? Pijlman
> >
> >
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Re: Proposal to fix Statement.executeBatch()

From
Rene Pijlman
Date:
On Tue, 28 Aug 2001 12:31:53 -0400 (EDT), you wrote:
>Can someone suggest what is to be done with the propsed patch?

Barry's proposal for a more efficient implementation of
executeBatch() does not invalidate the patch. It just means we
still have more work to do to get the performance improvement
this feature is all about.

The patch improves JDBC compliance independent of the
performance improvement, and it adds a test case. If there are
no objections (haven't seen any) I'd suggest it is applied.

Regards,
René Pijlman

Re: Proposal to fix Statement.executeBatch()

From
Barry Lind
Date:
Bruce,

I think the existing patch can be applied as is.  The issues I raised
below are further improvements in the functionality that can be done and
don't directly relate to the patch that was submitted.  Sorry if I
confused things.

--Barry

Bruce Momjian wrote:
> Can someone suggest what is to be done with the propsed patch?
>
>
>> > What exactly is the behaviour of the backend in that scenario?
>> > Does it commit every separate SQL statement in the
>> > semicolon-separated list, or does it commit the list as a whole?
>> > Does it abort processing the statement list when an error occurs
>> > in one statement? And if it continues, does it return an error
>> > when only one statement in the middle of the list had an error?
>>
>>I do not know what the server does if you have autocommit enabled and
>>you issue multiple statements in one try.  However, I would be OK with
>>the driver issuing the statements one by one with autocommit on.  If you
>>are running in this mode you just wouldn't get any performance improvement.
>>
>> > However, it would mean a change in behaviour of the driver that
>> > may break existing JDBC applications: the driver will no longer
>> > return update counts for all statements in a batch like it
>> > currently does, it will return "unknown" for most statements.
>> > I'm not sure if the performance improvement justifies this
>> > non-backwardly-compatible change, though I agree this is the
>> > intention of the feature. What do you think?
>>
>>I wouldn't worry about this 'change in behavior' because if the caller
>>is JDBC complient it should be coded to handle the new behavior as it is
>>complient with the spec.
>>
>>thanks,
>>--Barry
>>
>>
>>
>>
>>Rene Pijlman wrote:
>>
>>>On Mon, 27 Aug 2001 11:07:55 -0700, you wrote:
>>>[executeBatch() implemented as one round trip]
>>>
>>>
>>>>Here is how I would suggest this be done in a way that is spec
>>>>compliant (Note: that I haven't looked at the patch you submited yet, so
>>>>forgive me if you have already done it this way, but based on your
>>>>comments in this email, my guess is that you have not).
>>>>
>>>>
>>>Indeed, I have not implemented this.
>>>
>>>
>>>
>>>>Statements should be batched together in a single statement with
>>>>semicolons separating the individual statements (this will allow the
>>>>backend to process them all in one round trip).
>>>>
>>>>The result array should return an element with the row count for each
>>>>statement, however the value for all but the last statement will be
>>>>'-2'.  (-2 is defined by the spec to mean the statement was processed
>>>>successfully but the number of affected rows is unknown).
>>>>
>>>>
>>>Ah, I see. I hadn't thought of that solution.
>>>
>>>
>>>
>>>>In the event of an error, then the driver should return an array the
>>>>size of the submitted batch with values of -3 for all elements. -3 is
>>>>defined by the spec as the corresponding statement failed to execute
>>>>successfully, or for statements that could not be processed for some
>>>>reason.  Since in postgres when one statement fails (in non-autocommit
>>>>mode), the entire transaction is aborted this is consistent with a
>>>>return value of -3 in my reading of the spec.
>>>>
>>>>
>>>Not quite. A statement in a batch may also fail because its a
>>>succesful SELECT as far as the server is concerned (can't have
>>>select's in a batch). But that situation can also be handled
>>>correctly by setting the update count for that particular
>>>statement to -3. Its then up to the application to decide if it
>>>wants to rollback, I would say.
>>>
>>>But what to do when an error occurs with autocommit enabled?
>>>This is not recommended, but allowed by the spec, if I
>>>understand it correctly.
>>>
>>>What exactly is the behaviour of the backend in that scenario?
>>>Does it commit every separate SQL statement in the
>>>semicolon-separated list, or does it commit the list as a whole?
>>>Does it abort processing the statement list when an error occurs
>>>in one statement? And if it continues, does it return an error
>>>when only one statement in the middle of the list had an error?
>>>
>>>
>>>
>>>>I believe this approach makes the most sense because:
>>>>1) It implements batches in one round trip (the intention of the feature)
>>>>2) It is complient with the standard
>>>>3) It is complient with the current functionality of the backend
>>>>
>>>>
>>>If we can come up with an acceptable solution for an error with
>>>autocommit enabled, I agree. Otherwise, I'm not sure.
>>>
>>>However, it would mean a change in behaviour of the driver that
>>>may break existing JDBC applications: the driver will no longer
>>>return update counts for all statements in a batch like it
>>>currently does, it will return "unknown" for most statements.
>>>I'm not sure if the performance improvement justifies this
>>>non-backwardly-compatible change, though I agree this is the
>>>intention of the feature. What do you think?
>>>
>>>Regards,
>>>Ren? Pijlman
>>>
>>>
>>>
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 3: if posting/reading through Usenet, please send an appropriate
>>subscribe-nomail command to majordomo@postgresql.org so that your
>>message can get through to the mailing list cleanly
>>
>>
>



Re: Re: Proposal to fix Statement.executeBatch()

From
Bruce Momjian
Date:
No problem.  Just checking.  Patch will remain in the queue and be
applied.

> Bruce,
>
> I think the existing patch can be applied as is.  The issues I raised
> below are further improvements in the functionality that can be done and
> don't directly relate to the patch that was submitted.  Sorry if I
> confused things.
>
> --Barry
>
> Bruce Momjian wrote:
> > Can someone suggest what is to be done with the propsed patch?
> >
> >
> >> > What exactly is the behaviour of the backend in that scenario?
> >> > Does it commit every separate SQL statement in the
> >> > semicolon-separated list, or does it commit the list as a whole?
> >> > Does it abort processing the statement list when an error occurs
> >> > in one statement? And if it continues, does it return an error
> >> > when only one statement in the middle of the list had an error?
> >>
> >>I do not know what the server does if you have autocommit enabled and
> >>you issue multiple statements in one try.  However, I would be OK with
> >>the driver issuing the statements one by one with autocommit on.  If you
> >>are running in this mode you just wouldn't get any performance improvement.
> >>
> >> > However, it would mean a change in behaviour of the driver that
> >> > may break existing JDBC applications: the driver will no longer
> >> > return update counts for all statements in a batch like it
> >> > currently does, it will return "unknown" for most statements.
> >> > I'm not sure if the performance improvement justifies this
> >> > non-backwardly-compatible change, though I agree this is the
> >> > intention of the feature. What do you think?
> >>
> >>I wouldn't worry about this 'change in behavior' because if the caller
> >>is JDBC complient it should be coded to handle the new behavior as it is
> >>complient with the spec.
> >>
> >>thanks,
> >>--Barry
> >>
> >>
> >>
> >>
> >>Rene Pijlman wrote:
> >>
> >>>On Mon, 27 Aug 2001 11:07:55 -0700, you wrote:
> >>>[executeBatch() implemented as one round trip]
> >>>
> >>>
> >>>>Here is how I would suggest this be done in a way that is spec
> >>>>compliant (Note: that I haven't looked at the patch you submited yet, so
> >>>>forgive me if you have already done it this way, but based on your
> >>>>comments in this email, my guess is that you have not).
> >>>>
> >>>>
> >>>Indeed, I have not implemented this.
> >>>
> >>>
> >>>
> >>>>Statements should be batched together in a single statement with
> >>>>semicolons separating the individual statements (this will allow the
> >>>>backend to process them all in one round trip).
> >>>>
> >>>>The result array should return an element with the row count for each
> >>>>statement, however the value for all but the last statement will be
> >>>>'-2'.  (-2 is defined by the spec to mean the statement was processed
> >>>>successfully but the number of affected rows is unknown).
> >>>>
> >>>>
> >>>Ah, I see. I hadn't thought of that solution.
> >>>
> >>>
> >>>
> >>>>In the event of an error, then the driver should return an array the
> >>>>size of the submitted batch with values of -3 for all elements. -3 is
> >>>>defined by the spec as the corresponding statement failed to execute
> >>>>successfully, or for statements that could not be processed for some
> >>>>reason.  Since in postgres when one statement fails (in non-autocommit
> >>>>mode), the entire transaction is aborted this is consistent with a
> >>>>return value of -3 in my reading of the spec.
> >>>>
> >>>>
> >>>Not quite. A statement in a batch may also fail because its a
> >>>succesful SELECT as far as the server is concerned (can't have
> >>>select's in a batch). But that situation can also be handled
> >>>correctly by setting the update count for that particular
> >>>statement to -3. Its then up to the application to decide if it
> >>>wants to rollback, I would say.
> >>>
> >>>But what to do when an error occurs with autocommit enabled?
> >>>This is not recommended, but allowed by the spec, if I
> >>>understand it correctly.
> >>>
> >>>What exactly is the behaviour of the backend in that scenario?
> >>>Does it commit every separate SQL statement in the
> >>>semicolon-separated list, or does it commit the list as a whole?
> >>>Does it abort processing the statement list when an error occurs
> >>>in one statement? And if it continues, does it return an error
> >>>when only one statement in the middle of the list had an error?
> >>>
> >>>
> >>>
> >>>>I believe this approach makes the most sense because:
> >>>>1) It implements batches in one round trip (the intention of the feature)
> >>>>2) It is complient with the standard
> >>>>3) It is complient with the current functionality of the backend
> >>>>
> >>>>
> >>>If we can come up with an acceptable solution for an error with
> >>>autocommit enabled, I agree. Otherwise, I'm not sure.
> >>>
> >>>However, it would mean a change in behaviour of the driver that
> >>>may break existing JDBC applications: the driver will no longer
> >>>return update counts for all statements in a batch like it
> >>>currently does, it will return "unknown" for most statements.
> >>>I'm not sure if the performance improvement justifies this
> >>>non-backwardly-compatible change, though I agree this is the
> >>>intention of the feature. What do you think?
> >>>
> >>>Regards,
> >>>Ren? Pijlman
> >>>
> >>>
> >>>
> >>
> >>
> >>---------------------------(end of broadcast)---------------------------
> >>TIP 3: if posting/reading through Usenet, please send an appropriate
> >>subscribe-nomail command to majordomo@postgresql.org so that your
> >>message can get through to the mailing list cleanly
> >>
> >>
> >
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Re: Proposal to fix Statement.executeBatch()

From
Rene Pijlman
Date:
On Mon, 27 Aug 2001 22:57:13 -0700, Barry Lind wrote:
>I do not know what the server does if you have autocommit enabled
>and you issue multiple statements in one try.

As you know, Peter Eisentraut said on hackers that all
statements in a semicolon-separated query string are processed
as one single transaction. So, if in "S1;S2;S3" S2 fails, both
S1, S2 and S3 are rolled back and in effect they have all
failed.

I think this means we can implement your proposal. I've removed
the paragraph saying that we can't and I've added the following
to http://lab.applinet.nl/postgresql-jdbc/#Batch

-+-+-

The current implementation of Statement.executeBatch() in the
JDBC driver does not provide any performance improvement
compared to processing statements individually. This is because
the driver executes the statements one-by-one when
executeBatch() is called, using one round trip per statement. We
intend to reimplement executeBatch() in the following way.

Statement.executeBatch() will send all statements in a single
semicolon separated query string, with only one round trip to
the backend. This will provide a performance improvement, as
intended by the JDBC specification.

The updateCounts array will be set as described below. Note that
the JDBC spec defines the meaning of the following special
values:

-2 the statement was executed successfully but the number of
affected rows is unknown
-3 the statement failed to execute successfully

If all statements succeed, executeBatch() returns an
updateCounts array with a row count for each statement in the
batch, however the value for all but the last statement will be
-2. The value for the last statement will be a proper update
count.
If a statement fails, executeBatch() throws a
BatchUpdateException containing an updateCounts array with a row
count of -3 for each statement in the batch.

Note that the behaviour will be the same when autocommit is
enabled and when it is disabled. Even with autocommit enabled,
the backend will commit or rollback all statements in the
semicolon-separated query string as a unit.

The new implementation of executeBatch() will cause a change in
behaviour of the driver: the driver will no longer return update
counts for all statements in a batch like it currently does, it
will return -2 ("unknown") for most statements. However, this
behaviour is allowed by the JDBC spec and applications should be
prepared to handle it.

-+-+-

I see two more issues we need to decide on...

1) The JDBC spec requires Statement.executeBatch() to throw a
BatchUpdateException if any of the statements does not return an
update count (e.g. is a SELECT). How can we implement this? Do
we need to parse the statements in the JDBC driver to detect
SELECT's? It is a matter of interpretation, but it seems OK to
me to just ignore this and return -2/-3 for SELECT's as well. In
fact, perhaps we should allow SELECT's for function calls!?

2) The reimplementation may cause the driver to send very long
statements to the backend. I heard something about an 8K limit.
In what version of the backend was this limitation removed? I
guess we should implement the new algorithm conditionally, so
we'll only send multi-statement query strings to a backend that
has no statement length limitation.

Regards,
René Pijlman <rene@lab.applinet.nl>

Re: Re: Proposal to fix Statement.executeBatch()

From
Barry Lind
Date:
Rene,

This writeup looks good.  As for your two followup questions:

1)  I agree with your opinion on the select issue.  Since selects are
the only way to call functions and functions could themselves be doing
inserts/updates, to not allow selects is a bad idea IMHO.

2)  The 8K limitation was removed from the database in 7.0 and from the
JDBC driver in 7.0.2.  Therefore I don't think we should code for the
case of a 7.2 JDBC driver needing to support a 6.5 database.

thanks,
--Barry


Rene Pijlman wrote:
> On Mon, 27 Aug 2001 22:57:13 -0700, Barry Lind wrote:
>
>>I do not know what the server does if you have autocommit enabled
>>and you issue multiple statements in one try.
>>
>
> As you know, Peter Eisentraut said on hackers that all
> statements in a semicolon-separated query string are processed
> as one single transaction. So, if in "S1;S2;S3" S2 fails, both
> S1, S2 and S3 are rolled back and in effect they have all
> failed.
>
> I think this means we can implement your proposal. I've removed
> the paragraph saying that we can't and I've added the following
> to http://lab.applinet.nl/postgresql-jdbc/#Batch
>
> -+-+-
>
> The current implementation of Statement.executeBatch() in the
> JDBC driver does not provide any performance improvement
> compared to processing statements individually. This is because
> the driver executes the statements one-by-one when
> executeBatch() is called, using one round trip per statement. We
> intend to reimplement executeBatch() in the following way.
>
> Statement.executeBatch() will send all statements in a single
> semicolon separated query string, with only one round trip to
> the backend. This will provide a performance improvement, as
> intended by the JDBC specification.
>
> The updateCounts array will be set as described below. Note that
> the JDBC spec defines the meaning of the following special
> values:
>
> -2 the statement was executed successfully but the number of
> affected rows is unknown
> -3 the statement failed to execute successfully
>
> If all statements succeed, executeBatch() returns an
> updateCounts array with a row count for each statement in the
> batch, however the value for all but the last statement will be
> -2. The value for the last statement will be a proper update
> count.
> If a statement fails, executeBatch() throws a
> BatchUpdateException containing an updateCounts array with a row
> count of -3 for each statement in the batch.
>
> Note that the behaviour will be the same when autocommit is
> enabled and when it is disabled. Even with autocommit enabled,
> the backend will commit or rollback all statements in the
> semicolon-separated query string as a unit.
>
> The new implementation of executeBatch() will cause a change in
> behaviour of the driver: the driver will no longer return update
> counts for all statements in a batch like it currently does, it
> will return -2 ("unknown") for most statements. However, this
> behaviour is allowed by the JDBC spec and applications should be
> prepared to handle it.
>
> -+-+-
>
> I see two more issues we need to decide on...
>
> 1) The JDBC spec requires Statement.executeBatch() to throw a
> BatchUpdateException if any of the statements does not return an
> update count (e.g. is a SELECT). How can we implement this? Do
> we need to parse the statements in the JDBC driver to detect
> SELECT's? It is a matter of interpretation, but it seems OK to
> me to just ignore this and return -2/-3 for SELECT's as well. In
> fact, perhaps we should allow SELECT's for function calls!?
>
> 2) The reimplementation may cause the driver to send very long
> statements to the backend. I heard something about an 8K limit.
> In what version of the backend was this limitation removed? I
> guess we should implement the new algorithm conditionally, so
> we'll only send multi-statement query strings to a backend that
> has no statement length limitation.
>
> Regards,
> René Pijlman <rene@lab.applinet.nl>
>



Re: Re: Proposal to fix Statement.executeBatch()

From
Bruce Momjian
Date:
Barry, I just applied this patch and a few others.  Let me know if they
are OK.


> Rene,
>
> This writeup looks good.  As for your two followup questions:
>
> 1)  I agree with your opinion on the select issue.  Since selects are
> the only way to call functions and functions could themselves be doing
> inserts/updates, to not allow selects is a bad idea IMHO.
>
> 2)  The 8K limitation was removed from the database in 7.0 and from the
> JDBC driver in 7.0.2.  Therefore I don't think we should code for the
> case of a 7.2 JDBC driver needing to support a 6.5 database.
>
> thanks,
> --Barry
>
>
> Rene Pijlman wrote:
> > On Mon, 27 Aug 2001 22:57:13 -0700, Barry Lind wrote:
> >
> >>I do not know what the server does if you have autocommit enabled
> >>and you issue multiple statements in one try.
> >>
> >
> > As you know, Peter Eisentraut said on hackers that all
> > statements in a semicolon-separated query string are processed
> > as one single transaction. So, if in "S1;S2;S3" S2 fails, both
> > S1, S2 and S3 are rolled back and in effect they have all
> > failed.
> >
> > I think this means we can implement your proposal. I've removed
> > the paragraph saying that we can't and I've added the following
> > to http://lab.applinet.nl/postgresql-jdbc/#Batch
> >
> > -+-+-
> >
> > The current implementation of Statement.executeBatch() in the
> > JDBC driver does not provide any performance improvement
> > compared to processing statements individually. This is because
> > the driver executes the statements one-by-one when
> > executeBatch() is called, using one round trip per statement. We
> > intend to reimplement executeBatch() in the following way.
> >
> > Statement.executeBatch() will send all statements in a single
> > semicolon separated query string, with only one round trip to
> > the backend. This will provide a performance improvement, as
> > intended by the JDBC specification.
> >
> > The updateCounts array will be set as described below. Note that
> > the JDBC spec defines the meaning of the following special
> > values:
> >
> > -2 the statement was executed successfully but the number of
> > affected rows is unknown
> > -3 the statement failed to execute successfully
> >
> > If all statements succeed, executeBatch() returns an
> > updateCounts array with a row count for each statement in the
> > batch, however the value for all but the last statement will be
> > -2. The value for the last statement will be a proper update
> > count.
> > If a statement fails, executeBatch() throws a
> > BatchUpdateException containing an updateCounts array with a row
> > count of -3 for each statement in the batch.
> >
> > Note that the behaviour will be the same when autocommit is
> > enabled and when it is disabled. Even with autocommit enabled,
> > the backend will commit or rollback all statements in the
> > semicolon-separated query string as a unit.
> >
> > The new implementation of executeBatch() will cause a change in
> > behaviour of the driver: the driver will no longer return update
> > counts for all statements in a batch like it currently does, it
> > will return -2 ("unknown") for most statements. However, this
> > behaviour is allowed by the JDBC spec and applications should be
> > prepared to handle it.
> >
> > -+-+-
> >
> > I see two more issues we need to decide on...
> >
> > 1) The JDBC spec requires Statement.executeBatch() to throw a
> > BatchUpdateException if any of the statements does not return an
> > update count (e.g. is a SELECT). How can we implement this? Do
> > we need to parse the statements in the JDBC driver to detect
> > SELECT's? It is a matter of interpretation, but it seems OK to
> > me to just ignore this and return -2/-3 for SELECT's as well. In
> > fact, perhaps we should allow SELECT's for function calls!?
> >
> > 2) The reimplementation may cause the driver to send very long
> > statements to the backend. I heard something about an 8K limit.
> > In what version of the backend was this limitation removed? I
> > guess we should implement the new algorithm conditionally, so
> > we'll only send multi-statement query strings to a backend that
> > has no statement length limitation.
> >
> > Regards,
> > Ren? Pijlman <rene@lab.applinet.nl>
> >
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026