Thread: JDBC driver - is "getGeneratedKeys()" guaranteed to return the ids in the same order a batch insert was made?

Hi,

Using JDBC, I batch insert multiple rows ("executeBatch()"). I then use 'getGeneratedKeys("id")' to get the generated ids ("id" is a "SERIAL PRIMARY KEY" column).

My question: does the PostgreSQL JDBC driver guarantee that the generated ids are returned in the same order in which the rows were added with "addBatch()"?


The best "answer" to that question I have found is https://stackoverflow.com/a/16119489/843699 , but it is not 100% clear.

Would it be possible to have an official answer on this?

Thanks in advance!




On Sun, 6 Dec 2020 at 15:52, electrotype <electrotype@gmail.com> wrote:
> My question: does the PostgreSQL JDBC driver guarantee that the generated ids are returned in the same order in which the rows were added with "addBatch()"?

I can't see how they could possibly be out of order. 


Dave Cramer
www.postgres.rocks 


Thanks, that's what I think too. But, to be honest, I'd really like to see this written in some documentation! In some cases, this small detail can be quite important.

On Wed, 9 Dec 2020 at 03:15, electrotype <electrotype@gmail.com> wrote:
> In some cases, this small detail can be quite important.

So I'm curious. Why does order matter?

Dave Cramer
www.postgres.rocks


When you have to save multiple new entities with subentities.

You first save all the parent entities in a single SQL batch insert, you get the generated ids, then insert all the subentities in another single SQL batch insert. To know which "parent id" to use for a given subentity of the second query, you need a way to associate a generated id with the correct parent entity. The order of the parents in their batch, and the order of the generated ids, is the only straightforward way.

I know all this could be made into a single SQL query, without having to associate the generated ids to the parents manually. But sometimes you have to fight really hard against your framework or JDBC itself to write such a more complex query, whereas two batch inserts are very natural.
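
To make the pattern concrete, here is a minimal Java sketch of the first step: batch-inserting the parents and pairing each one with a generated id by position. The table "parent", its "name" column, and the class and method names are hypothetical. Note that in standard JDBC the key column is named when the statement is prepared, and "getGeneratedKeys()" itself takes no argument. The pairing step only works if the ids come back in "addBatch()" order, which is the very assumption under discussion.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PositionalKeyPairing {

    // Inserts one parent row per batch item and returns the generated ids in
    // the order the driver reports them. "parent" and "name" are hypothetical.
    static List<Long> insertParents(Connection conn, List<String> names) throws SQLException {
        String sql = "INSERT INTO parent (name) VALUES (?)";
        // The key column is requested here; getGeneratedKeys() takes no argument.
        try (PreparedStatement ps = conn.prepareStatement(sql, new String[] {"id"})) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch(); // each batch item inserts exactly one row
            }
            ps.executeBatch();
            List<Long> ids = new ArrayList<>();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                while (keys.next()) {
                    ids.add(keys.getLong(1));
                }
            }
            return ids;
        }
    }

    // Pairs the i-th parent with the i-th id.
    // ASSUMPTION: ids arrive in addBatch() order, which is not documented.
    static List<Map.Entry<String, Long>> pairByPosition(List<String> names, List<Long> ids) {
        if (names.size() != ids.size()) {
            throw new IllegalStateException(
                    "expected " + names.size() + " ids, got " + ids.size());
        }
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            pairs.add(Map.entry(names.get(i), ids.get(i)));
        }
        return pairs;
    }
}
```

The size check catches a missing or extra key, but it cannot detect ids that come back in a different order.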



On Wed, 9 Dec 2020 at 10:21, electrotype <electrotype@gmail.com> wrote:
> To know which "parent id" to use for a given subentity of the second query, you need a way to associate a generated id with the correct parent entity. The order of the parents in their batch, and the order of the generated ids, is the only straightforward way.



Fair enough, however the spec does not say anything about the order. In fact it doesn't even say which keys will be returned. 

I don't think we can make any guarantees here.

Dave

On Wed, Dec 9, 2020 at 8:20 AM electrotype <electrotype@gmail.com> wrote:
> But sometimes you have to fight really hard against your framework or JDBC itself to write such a more complex query, whereas two batch inserts are very natural.

Agreed.

However, this isn't really the purview of JDBC. I doubt the driver does anything that would make the order differ from what it receives: the batch items are sent, and their results processed, sequentially.

The main question is whether any batch item inserts multiple records itself, i.e., whether RETURNING * produces multiple rows. Whatever order RETURNING * produces is what the driver will capture, but the driver isn't responsible for guaranteeing that the order of multiple inserted records going in matches what comes out. PostgreSQL needs to make that claim, and I don't see where it does (I've sent an email to ask whether adding such a claim to the documentation is proper). Done manually, one can always do "WITH insert returning SELECT ORDER BY", but it doesn't seem workable for the driver to do that when it adds the RETURNING clause, which I presume is what is in scope here.

David J.
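
As a rough illustration of the manual "WITH insert returning SELECT ORDER BY" approach mentioned above, the following Java sketch wraps a multi-row INSERT in a CTE and reads the result through an outer SELECT with an explicit ORDER BY. The table "parent", its columns, and the class and method names are hypothetical. The sketch also returns the "name" column so each id can be matched to its row by value instead of by position, which assumes the names are unique within one call.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderedReturningSketch {

    // Builds one multi-row INSERT wrapped in a CTE so that the outer SELECT
    // can impose an explicit ORDER BY on the returned rows.
    static String buildInsertSql(int rowCount) {
        if (rowCount < 1) {
            throw new IllegalArgumentException("rowCount must be >= 1");
        }
        String placeholders = String.join(", ", Collections.nCopies(rowCount, "(?)"));
        return "WITH ins AS (INSERT INTO parent (name) VALUES " + placeholders
                + " RETURNING id, name) SELECT id, name FROM ins ORDER BY id";
    }

    // Runs the statement and maps each name to its id by value, not position.
    // ASSUMPTION: names are unique within this call.
    static Map<String, Long> insertParents(Connection conn, List<String> names)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(buildInsertSql(names.size()))) {
            for (int i = 0; i < names.size(); i++) {
                ps.setString(i + 1, names.get(i)); // JDBC parameters are 1-based
            }
            Map<String, Long> idByName = new LinkedHashMap<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    idByName.put(rs.getString("name"), rs.getLong("id"));
                }
            }
            return idByName;
        }
    }
}
```

Ordering by id makes the result order deterministic, but it still says nothing about which input row received which id; that is why the sketch matches on the returned "name" value.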

Thank you, it's appreciated! I'm sure this clarification would help other developers too.

On Wed, Dec 9, 2020 at 1:31 PM electrotype <electrotype@gmail.com> wrote:
> Thank you, it's appreciated! I'm sure this clarification would help other developers too.

My take is that there is presently no guarantee and, with the current efforts to add parallelism, it is quite probable that observing such out-of-order results is simply a matter of time. When batching, it seems best to make each batch item a single-row insert in order to avoid this problem.

David J.
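
One way to follow that advice without relying on position at all is sketched below in Java: each batch item inserts a single row, and an identifying column is requested alongside the id so the keys can be matched by value. This is only a sketch under assumptions. The table "parent", its columns, and the class and method names are hypothetical, and it assumes the driver returns every column named in "prepareStatement(sql, columnNames)", including the non-generated "name" column, which should be verified against the driver in use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyedBatchInsert {

    // Batch-inserts one row per item and maps each name to its generated id.
    // ASSUMPTION: the driver returns all columns named below, and the names
    // are unique within the batch.
    static Map<String, Long> insertParents(Connection conn, List<String> names)
            throws SQLException {
        String sql = "INSERT INTO parent (name) VALUES (?)";
        try (PreparedStatement ps =
                conn.prepareStatement(sql, new String[] {"id", "name"})) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch(); // single-row insert per batch item
            }
            ps.executeBatch();
            Map<String, Long> idByName = new HashMap<>();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                while (keys.next()) {
                    idByName.put(keys.getString("name"), keys.getLong("id"));
                }
            }
            return idByName;
        }
    }

    // Looks up the parent id to use for a subentity, failing loudly when the
    // parent has no generated id instead of silently using a wrong one.
    static long parentIdFor(Map<String, Long> idByName, String parentName) {
        Long id = idByName.get(parentName);
        if (id == null) {
            throw new IllegalStateException("no generated id for parent " + parentName);
        }
        return id;
    }
}
```

The second batch insert for the subentities would then call "parentIdFor" for each row, so the result no longer depends on the order in which the keys were returned.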



On Thu, 10 Dec 2020 at 19:37, David G. Johnston <david.g.johnston@gmail.com> wrote:
> My take is that there is presently no guarantee and, with the current efforts to add parallelism, it is quite probable that observing such out-of-order results is simply a matter of time. When batching, it seems best to make each batch item a single-row insert in order to avoid this problem.

I'd have to agree.

Dave Cramer
www.postgres.rocks 

I wish that was not the conclusion, but at least it is clear!

Thanks for the help to both of you.