Thread: Transaction atomicity
Hi all, I have a question about a simple code that should automatically insert a record in a table. The primary key for this record is automatically generated during the INSERT statement, as the max value plus one. The (simplified, since the real number of fields is 5) code is: final String query = "INTO table (docId,seqNr) " + "VALUES (?, (SELECT 1 + coalesce (max(seqNr), 0) "+ "FROM table " + "WHERE seqNr BETWEEN 0 AND (9223372036854775807-1) ))"; Connection conn=daof.getConnection(); try { conn.setAutoCommit(true); PrepararedStatemen st = conn.prepareStatement(query); st.setString(1,'myDocId'); st.executeUpdate(); conn.close(); } catch (SQLException e) { try { conn.close(); } catch (SQLException ee) {} throw e; } I then create Runtime.getRuntime().availableProcessors() threads that create a DataSource and start executing that code in parallel. What happens is that I get many DUPLICATE KEY errors. I thought that in AutoCommit all insert are supposed to be serialized, so they should create a seqNr different since MAX(seqNr) is different at each call. Is my reasoning correct? Or is it a different problem, maybe with jdbc or postgresql? Thanks, Giuseppe
Giuseppe Couple of things. 1) What you are observing is quite normal for postgresql. It use MVCC so it's quite possible for every connection to see the same value for max (seNR). which leads us to 2 2) Don't use max() for this. If you have to use max then you have to lock the record before doing the insert which will slow everyone down. Postgresql provides you with sequences for exactly this purpose. Use nextval('sequence_name') to increment it and currval ('sequence_name') to get the value that your connection just used. I strongly suggest you read and understand the section on MVCC in the postgresql manual and sequences. Dave On 7-Mar-07, at 6:59 AM, Giuseppe Sacco wrote: > Hi all, > I have a question about a simple code that should automatically > insert a > record in a table. The primary key for this record is automatically > generated during the INSERT statement, as the max value plus one. > > The (simplified, since the real number of fields is 5) code is: > > final String query = "INTO table (docId,seqNr) " + > "VALUES (?, (SELECT 1 + coalesce (max(seqNr), 0) "+ > "FROM table " + > "WHERE seqNr BETWEEN 0 AND (9223372036854775807-1) ))"; > Connection conn=daof.getConnection(); > try { > conn.setAutoCommit(true); > PrepararedStatemen st = conn.prepareStatement(query); > st.setString(1,'myDocId'); > st.executeUpdate(); > conn.close(); > } > catch (SQLException e) { > try { conn.close(); } catch (SQLException ee) {} > throw e; > } > > I then create Runtime.getRuntime().availableProcessors() threads that > create a DataSource and start executing that code in parallel. What > happens is that I get many DUPLICATE KEY errors. I thought that in > AutoCommit all insert are supposed to be serialized, so they should > create a seqNr different since MAX(seqNr) is different at each call. > > Is my reasoning correct? Or is it a different problem, maybe with jdbc > or postgresql? > > Thanks, > Giuseppe > > ---------------------------(end of > broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org >
Hi Dave, Il giorno mer, 07/03/2007 alle 07.15 -0500, Dave Cramer ha scritto: > 1) What you are observing is quite normal for postgresql. It use MVCC > so it's quite possible for every connection to see the same value for > max (seNR). > which leads us to 2 Thank you very much. I just read the documentation about MVCC and I understand why my code didn't work. > 2) Don't use max() for this. If you have to use max then you have to > lock the record before doing the insert which will slow everyone > down. Postgresql provides you with sequences for exactly this > purpose. Use nextval('sequence_name') to increment it and currval > ('sequence_name') to get the value that your connection just used. Thanks for this suggestion. I have been evaluating the use of sequences in my application, but I was looking for a more portable code (this code have to run on postgresq 8.1, mysql 5, oracle 9.2 and sql-server 2005). As you may know postgresql nextval syntax isn't as required by the standard; and other vendors use different syntax too. Moreover it would make my code really complex since I would need a new sequence for any connected user. So, before starting using sequences, I will try to loop my original INSERT statement in order to see how long is the average looping. Probably if this is less than 4 iterations, then I will go this way. Again, thank you for your help, Giuseppe
> I have a question about a simple code that should automatically insert a > record in a table. The primary key for this record is automatically > generated during the INSERT statement, as the max value plus one. > > The (simplified, since the real number of fields is 5) code is: > > final String query = "INTO table (docId,seqNr) " + > "VALUES (?, (SELECT 1 + coalesce (max(seqNr), 0) "+ > "FROM table " + > "WHERE seqNr BETWEEN 0 AND (9223372036854775807-1) ))"; > Connection conn=daof.getConnection(); > try { > conn.setAutoCommit(true); > PrepararedStatemen st = conn.prepareStatement(query); > st.setString(1,'myDocId'); > st.executeUpdate(); > conn.close(); > } > catch (SQLException e) { > try { conn.close(); } catch (SQLException ee) {} > throw e; > } > > I then create Runtime.getRuntime().availableProcessors() threads that > create a DataSource and start executing that code in parallel. What > happens is that I get many DUPLICATE KEY errors. I thought that in > AutoCommit all insert are supposed to be serialized, so they should > create a seqNr different since MAX(seqNr) is different at each call. > > Is my reasoning correct? Or is it a different problem, maybe with jdbc > or postgresql? Does the behaviour change if you set the prepare threshold of the statement to 0? Yours, Laurenz Albe
Giuseppe Sacco wrote: > Thanks for this suggestion. I have been evaluating the use of sequences > in my application, but I was looking for a more portable code (this code > have to run on postgresq 8.1, mysql 5, oracle 9.2 and sql-server 2005). > As you may know postgresql nextval syntax isn't as required by the > standard; and other vendors use different syntax too. Moreover it would > make my code really complex since I would need a new sequence for any > connected user. A more portable way to use sequences is to define a default expression on the column when you create the table, like this: CREATE TABLE foo ( key int DEFAULT (nextval('foo_seq')) PRIMARY KEY data ... ) Now if you don't give key a value in your insert statement, it'll be assigned the next value from the sequence automatically. For example: INSERT INTO foo (data, ...) VALUES ('bar', ...); The create table syntax isn't portable, AFAIK, but there's a similar construct available on all common DBMSs. You could use the same SQL in your application to insert rows, only the DDL to create tables would be different on different DBMSs. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi Laurenz, Il giorno mer, 07/03/2007 alle 16.11 +0100, Albe Laurenz ha scritto: [...] > Does the behaviour change if you set the prepare threshold of > the statement to 0? Actually, I get a new pooled connection every time I need to start this statement, so I create a new statement that I use just once. I think in this case I cannot use the server side statement. In any case, based on what I understood from MVCC documentation, a server side prepared statement wound not change the behaviour. Thanks, Giuseppe
Hi Heikki, Il giorno mer, 07/03/2007 alle 15.50 +0000, Heikki Linnakangas ha scritto: > Giuseppe Sacco wrote: > > Thanks for this suggestion. I have been evaluating the use of sequences > > in my application, but I was looking for a more portable code (this code > > have to run on postgresq 8.1, mysql 5, oracle 9.2 and sql-server 2005). > > As you may know postgresql nextval syntax isn't as required by the > > standard; and other vendors use different syntax too. Moreover it would > > make my code really complex since I would need a new sequence for any > > connected user. > > A more portable way to use sequences is to define a default expression > on the column when you create the table, like this: > > CREATE TABLE foo ( > key int DEFAULT (nextval('foo_seq')) PRIMARY KEY > data ... > ) [...] You are right, but I need different sequences for every user, i.e., if two users insert on the same table then I need a way to use different sequence. The reason of this is that I have to split my application into a few different postgresql instances based in different offices. Every night all instances synchronise their data (this is and INSERT only table), so I need a different table sequence in every office (or group of users or single user). Using sequences I would probably need to create all these sequences, then use the right sequence to get the nextval and then call my INSERT statement with the right value. This imply the explicitly use of a sequence for every insert. This is not a big problem, I just need to have a special case for every dbms. Thanks again, Giuseppe
On 3/7/07 9:06 AM, "Giuseppe Sacco" <giuseppe@eppesuigoccas.homedns.org>
wrote:
> You are right, but I need different sequences for every user, i.e., if
> two users insert on the same table then I need a way to use different
> sequence. The reason of this is that I have to split my application into
> a few different postgresql instances based in different offices. Every
> night all instances synchronise their data (this is and INSERT only
> table), so I need a different table sequence in every office (or group
> of users or single user).
Using select(max(id)) won't work in this case, either (if I'm understanding
your setup correctly).
So each office has its own postgresql instance. Then every night you copy
the data from all offices to all other offices. Therefore, that whole day,
inserts were made in each office instance with the same IDs, which will
cause duplicate key exceptions.
Why not have a compound key on this table, with an ID generated by a
sequence (one sequence, named the same, for each instance of PostgreSQL for
each office), and an Office ID that is static for each instance? Then the
merge/sync would go through without a hitch.
On 3/7/07 9:06 AM, "Giuseppe Sacco" <giuseppe@eppesuigoccas.homedns.org> wrote: > You are right, but I need different sequences for every user, i.e., if > two users insert on the same table then I need a way to use different > sequence. The reason of this is that I have to split my application into > a few different postgresql instances based in different offices. Every > night all instances synchronise their data (this is and INSERT only > table), so I need a different table sequence in every office (or group > of users or single user). Using select(max(id)) won't work in this case, either (if I'm understanding your setup correctly). So each office has its own postgresql instance. Then every night you copy the data from all offices to all other offices. Therefore, that whole day, inserts were made in each office instance with the same IDs, which will cause duplicate key exceptions. Does each user in each office have their own instance? Why not have a compound key on this table, with an ID generated by a sequence (one sequence, named the same, for each instance of PostgreSQL for each office), and an Office ID that is static for each instance? Then the merge/sync would go through without a hitch. -- Jeff Hubbach
Il giorno mer, 07/03/2007 alle 08.12 -0800, Jeff Hubbach ha scritto: > On 3/7/07 9:06 AM, "Giuseppe Sacco" > <giuseppe@eppesuigoccas.homedns.org> > wrote: > > > You are right, but I need different sequences for every user, i.e., > if > > two users insert on the same table then I need a way to use > different > > sequence. The reason of this is that I have to split my application > into > > a few different postgresql instances based in different offices. > Every > > night all instances synchronise their data (this is and INSERT only > > table), so I need a different table sequence in every office (or > group > > of users or single user). > > Using select(max(id)) won't work in this case, either (if I'm > understanding > your setup correctly). > It works since I assigned ranges to each office. The query I wrote in my original post was: final String query = "INTO table (docId,seqNr) " + "VALUES (?, (SELECT 1 + coalesce (max(seqNr), 0) "+ "FROM table " + "WHERE seqNr BETWEEN 0 AND (9223372036854775807-1) ))"; as you may see, I look for a MAX in a specific range. Every office has a different range. > > Why not have a compound key on this table, with an ID generated by a > sequence (one sequence, named the same, for each instance of > PostgreSQL for > each office), and an Office ID that is static for each instance? Then > the > merge/sync would go through without a hitch. You are right, this is a second option, but we cannot adopt it since we have a lot of table that use foreign keys against this one. Adding one field would require a change in every table in order to complete the foreign key constraint. Thanks for you hint, Giuseppe
Jeff Hubbach wrote: > Why not have a compound key on this table, with an ID generated by a > sequence (one sequence, named the same, for each instance of PostgreSQL for > each office), and an Office ID that is static for each instance? Then the > merge/sync would go through without a hitch. That's what I was thinking. If you don't want to have a two-field key, for example because you can't change the schema you already have, you could still divide a range of ids for each office when you create the sequence: CREATE SEQUENCE fooseq MINVALUE 10000000 MAXVALUE 19999999 NO CYCLE Just use a different range for each office. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi Heikki, Il giorno mer, 07/03/2007 alle 16.36 +0000, Heikki Linnakangas ha scritto: [...] > If you don't want to have a two-field key, for example because you can't > change the schema you already have, you could still divide a range of > ids for each office when you create the sequence: > > CREATE SEQUENCE fooseq MINVALUE 10000000 MAXVALUE 19999999 NO CYCLE This is the same solution I was thinking of, and probably the best one. I checked that currently the application only loops an average of 4 times, so I don't have a real urgency of switching to sequences. I will implement this code in two weeks from now, since I should have enough time. Thanks again, Giuseppe P.S. Just a note, I will not specify "NO CYCLE" since it seems to be the default behaviour and since it seems oracle uses "NOCYCLE" in one word.
Il giorno mer, 07/03/2007 alle 18.25 +0100, Giuseppe Sacco ha scritto: > Hi Heikki, [...] > > CREATE SEQUENCE fooseq MINVALUE 10000000 MAXVALUE 19999999 NO CYCLE > > This is the same solution I was thinking of, and probably the best one. > I checked that currently the application only loops an average of 4 > times, so I don't have a real urgency of switching to sequences. I will > implement this code in two weeks from now, since I should have enough > time. [a bit off topic] Just for the records: sql-server doesn't accept the max() solution, so I will have to implement the sequence way very quickly. The error I get is: errorCode=1046, sqlState= S1000 "Subqueries are not allowed in this context. Only scalar expressions are allowed" :-) Again, thanks to all, Giuseppe