Thread: Transaction atomicity

Transaction atomicity

From

Giuseppe Sacco

Date:

07 March 2007, 07:55:30

Hi all,
I have a question about a simple code that should automatically insert a
record in a table. The primary key for this record is automatically
generated during the INSERT statement, as the max value plus one.

The (simplified, since the real number of fields is 5) code is:

final String query = "INTO table (docId,seqNr) " +
        "VALUES (?, (SELECT 1 + coalesce (max(seqNr), 0) "+
        "FROM table " +
        "WHERE seqNr BETWEEN 0 AND (9223372036854775807-1) ))";
Connection conn=daof.getConnection();
try {
    conn.setAutoCommit(true);
    PrepararedStatemen st = conn.prepareStatement(query);
    st.setString(1,'myDocId');
    st.executeUpdate();
    conn.close();
}
catch (SQLException e) {
    try { conn.close(); } catch (SQLException ee) {}
    throw e;
}

I then create Runtime.getRuntime().availableProcessors() threads that
create a DataSource and start executing that code in parallel. What
happens is that I get many DUPLICATE KEY errors. I thought that in
AutoCommit all insert are supposed to be serialized, so they should
create a seqNr different since MAX(seqNr) is different at each call.

Is my reasoning correct? Or is it a different problem, maybe with jdbc
or postgresql?

Thanks,
Giuseppe

Re: Transaction atomicity

From

Dave Cramer

Date:

07 March 2007, 08:15:14

Giuseppe

Couple of things.

1) What you are observing is quite normal for postgresql. It use MVCC
so it's quite possible for every connection to see the same value for
max (seNR).
which leads us to 2

2) Don't use max() for this. If you have to use max then you have to
lock the record before doing the insert which will slow everyone
down. Postgresql provides you with sequences for exactly this
purpose. Use nextval('sequence_name') to increment it and currval
('sequence_name') to get the value that your connection just used.

I strongly suggest you read and understand the section on MVCC in the
postgresql manual and sequences.

Dave
On 7-Mar-07, at 6:59 AM, Giuseppe Sacco wrote:

> Hi all,
> I have a question about a simple code that should automatically
> insert a
> record in a table. The primary key for this record is automatically
> generated during the INSERT statement, as the max value plus one.
>
> The (simplified, since the real number of fields is 5) code is:
>
> final String query = "INTO table (docId,seqNr) " +
>         "VALUES (?, (SELECT 1 + coalesce (max(seqNr), 0) "+
>         "FROM table " +
>         "WHERE seqNr BETWEEN 0 AND (9223372036854775807-1) ))";
> Connection conn=daof.getConnection();
> try {
>     conn.setAutoCommit(true);
>     PrepararedStatemen st = conn.prepareStatement(query);
>     st.setString(1,'myDocId');
>     st.executeUpdate();
>     conn.close();
> }
> catch (SQLException e) {
>     try { conn.close(); } catch (SQLException ee) {}
>     throw e;
> }
>
> I then create Runtime.getRuntime().availableProcessors() threads that
> create a DataSource and start executing that code in parallel. What
> happens is that I get many DUPLICATE KEY errors. I thought that in
> AutoCommit all insert are supposed to be serialized, so they should
> create a seqNr different since MAX(seqNr) is different at each call.
>
> Is my reasoning correct? Or is it a different problem, maybe with jdbc
> or postgresql?
>
> Thanks,
> Giuseppe
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

Re: Transaction atomicity

From

Giuseppe Sacco

Date:

07 March 2007, 11:05:23

Hi Dave,

Il giorno mer, 07/03/2007 alle 07.15 -0500, Dave Cramer ha scritto:
> 1) What you are observing is quite normal for postgresql. It use MVCC
> so it's quite possible for every connection to see the same value for
> max (seNR).
> which leads us to 2

Thank you very much. I just read the documentation about MVCC and I
understand why my code didn't work.

> 2) Don't use max() for this. If you have to use max then you have to
> lock the record before doing the insert which will slow everyone
> down. Postgresql provides you with sequences for exactly this
> purpose. Use nextval('sequence_name') to increment it and currval
> ('sequence_name') to get the value that your connection just used.

Thanks for this suggestion. I have been evaluating the use of sequences
in my application, but I was looking for a more portable code (this code
have to run on postgresq 8.1, mysql 5, oracle 9.2 and sql-server 2005).
As you may know postgresql nextval syntax isn't as required by the
standard; and other vendors use different syntax too. Moreover it would
make my code really complex since I would need a new sequence for any
connected user.

So, before starting using sequences, I will try to loop my original
INSERT statement in order to see how long is the average looping.
Probably if this is less than 4 iterations, then I will go this way.

Again, thank you for your help,
Giuseppe

Re: Transaction atomicity

From

"Albe Laurenz"

Date:

07 March 2007, 11:13:05

> I have a question about a simple code that should automatically insert
a
> record in a table. The primary key for this record is automatically
> generated during the INSERT statement, as the max value plus one.
>
> The (simplified, since the real number of fields is 5) code is:
>
> final String query = "INTO table (docId,seqNr) " +
>         "VALUES (?, (SELECT 1 + coalesce (max(seqNr), 0) "+
>         "FROM table " +
>         "WHERE seqNr BETWEEN 0 AND (9223372036854775807-1) ))";
> Connection conn=daof.getConnection();
> try {
>     conn.setAutoCommit(true);
>     PrepararedStatemen st = conn.prepareStatement(query);
>     st.setString(1,'myDocId');
>     st.executeUpdate();
>     conn.close();
> }
> catch (SQLException e) {
>     try { conn.close(); } catch (SQLException ee) {}
>     throw e;
> }
>
> I then create Runtime.getRuntime().availableProcessors() threads that
> create a DataSource and start executing that code in parallel. What
> happens is that I get many DUPLICATE KEY errors. I thought that in
> AutoCommit all insert are supposed to be serialized, so they should
> create a seqNr different since MAX(seqNr) is different at each call.
>
> Is my reasoning correct? Or is it a different problem, maybe with jdbc
> or postgresql?

Does the behaviour change if you set the prepare threshold of
the statement to 0?

Yours,
Laurenz Albe

Re: Transaction atomicity

From

Heikki Linnakangas

Date:

07 March 2007, 11:50:32

Giuseppe Sacco wrote:
> Thanks for this suggestion. I have been evaluating the use of sequences
> in my application, but I was looking for a more portable code (this code
> have to run on postgresq 8.1, mysql 5, oracle 9.2 and sql-server 2005).
> As you may know postgresql nextval syntax isn't as required by the
> standard; and other vendors use different syntax too. Moreover it would
> make my code really complex since I would need a new sequence for any
> connected user.

A more portable way to use sequences is to define a default expression
on the column when you create the table, like this:

CREATE TABLE foo (
   key int DEFAULT (nextval('foo_seq')) PRIMARY KEY
   data ...
)

Now if you don't give key a value in your insert statement, it'll be
assigned the next value from the sequence automatically. For example:

INSERT INTO foo (data, ...) VALUES ('bar', ...);

The create table syntax isn't portable, AFAIK, but there's a similar
construct available on all common DBMSs. You could use the same SQL in
your application to insert rows, only the DDL to create tables would be
different on different DBMSs.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Transaction atomicity

From

Giuseppe Sacco

Date:

07 March 2007, 11:52:22

Hi Laurenz,

Il giorno mer, 07/03/2007 alle 16.11 +0100, Albe Laurenz ha scritto:
[...]
> Does the behaviour change if you set the prepare threshold of
> the statement to 0?

Actually, I get a new pooled connection every time I need to start this
statement, so I create a new statement that I use just once. I think in
this case I cannot use the server side statement. In any case, based on
what I understood from MVCC documentation, a server side prepared
statement wound not change the behaviour.

Thanks,
Giuseppe

Re: Transaction atomicity

From

Giuseppe Sacco

Date:

07 March 2007, 12:02:22

Hi Heikki,

Il giorno mer, 07/03/2007 alle 15.50 +0000, Heikki Linnakangas ha
scritto:
> Giuseppe Sacco wrote:
> > Thanks for this suggestion. I have been evaluating the use of sequences
> > in my application, but I was looking for a more portable code (this code
> > have to run on postgresq 8.1, mysql 5, oracle 9.2 and sql-server 2005).
> > As you may know postgresql nextval syntax isn't as required by the
> > standard; and other vendors use different syntax too. Moreover it would
> > make my code really complex since I would need a new sequence for any
> > connected user.
>
> A more portable way to use sequences is to define a default expression
> on the column when you create the table, like this:
>
> CREATE TABLE foo (
>    key int DEFAULT (nextval('foo_seq')) PRIMARY KEY
>    data ...
> )
[...]

You are right, but I need different sequences for every user, i.e., if
two users insert on the same table then I need a way to use different
sequence. The reason of this is that I have to split my application into
a few different postgresql instances based in different offices. Every
night all instances synchronise their data (this is and INSERT only
table), so I need a different table sequence in every office (or group
of users or single user).

Using sequences I would probably need to create all these sequences,
then use the right sequence to get the nextval and then call my INSERT
statement with the right value. This imply the explicitly use of a
sequence for every insert. This is not a big problem, I just need to
have a special case for every dbms.

Thanks again,
Giuseppe

Re: Transaction atomicity

From

"Jeff Hubbach"

Date:

07 March 2007, 12:25:07

On 3/7/07 9:06 AM, "Giuseppe Sacco" <giuseppe@eppesuigoccas.homedns.org>
wrote:

> You are right, but I need different sequences for every user, i.e., if
> two users insert on the same table then I need a way to use different
> sequence. The reason of this is that I have to split my application into
> a few different postgresql instances based in different offices. Every
> night all instances synchronise their data (this is and INSERT only
> table), so I need a different table sequence in every office (or group
> of users or single user).

Using select(max(id)) won't work in this case, either (if I'm understanding
your setup correctly).

So each office has its own postgresql instance. Then every night you copy
the data from all offices to all other offices. Therefore, that whole day,
inserts were made in each office instance with the same IDs, which will
cause duplicate key exceptions.

Why not have a compound key on this table, with an ID generated by a
sequence (one sequence, named the same, for each instance of PostgreSQL for
each office), and an Office ID that is static for each instance? Then the
merge/sync would go through without a hitch.

Re: Transaction atomicity

From

Jeff Hubbach

Date:

07 March 2007, 12:26:07

On 3/7/07 9:06 AM, "Giuseppe Sacco" <giuseppe@eppesuigoccas.homedns.org>
wrote:
> You are right, but I need different sequences for every user, i.e., if
> two users insert on the same table then I need a way to use different
> sequence. The reason of this is that I have to split my application into
> a few different postgresql instances based in different offices. Every
> night all instances synchronise their data (this is and INSERT only
> table), so I need a different table sequence in every office (or group
> of users or single user).

Using select(max(id)) won't work in this case, either (if I'm understanding
your setup correctly).

So each office has its own postgresql instance. Then every night you copy
the data from all offices to all other offices. Therefore, that whole day,
inserts were made in each office instance with the same IDs, which will
cause duplicate key exceptions. Does each user in each office have their own
instance?

Why not have a compound key on this table, with an ID generated by a
sequence (one sequence, named the same, for each instance of PostgreSQL for
each office), and an Office ID that is static for each instance? Then the
merge/sync would go through without a hitch.

-- Jeff Hubbach

Re: Transaction atomicity

From

Giuseppe Sacco

Date:

07 March 2007, 12:35:46

Il giorno mer, 07/03/2007 alle 08.12 -0800, Jeff Hubbach ha scritto:
> On 3/7/07 9:06 AM, "Giuseppe Sacco"
> <giuseppe@eppesuigoccas.homedns.org>
> wrote:
>
> > You are right, but I need different sequences for every user, i.e.,
> if
> > two users insert on the same table then I need a way to use
> different
> > sequence. The reason of this is that I have to split my application
> into
> > a few different postgresql instances based in different offices.
> Every
> > night all instances synchronise their data (this is and INSERT only
> > table), so I need a different table sequence in every office (or
> group
> > of users or single user).
>
> Using select(max(id)) won't work in this case, either (if I'm
> understanding
> your setup correctly).
>
It works since I assigned ranges to each office. The query I wrote in my
original post was:

final String query = "INTO table (docId,seqNr) " +
                "VALUES (?, (SELECT 1 + coalesce (max(seqNr), 0) "+
                "FROM table " +
                "WHERE seqNr BETWEEN 0 AND (9223372036854775807-1) ))";

as you may see, I look for a MAX in a specific range. Every office has a
different range.
>
> Why not have a compound key on this table, with an ID generated by a
> sequence (one sequence, named the same, for each instance of
> PostgreSQL for
> each office), and an Office ID that is static for each instance? Then
> the
> merge/sync would go through without a hitch.

You are right, this is a second option, but we cannot adopt it since we
have a lot of table that use foreign keys against this one. Adding one
field would require a change in every table in order to complete the
foreign key constraint.

Thanks for you hint,
Giuseppe

Re: Transaction atomicity

From

Heikki Linnakangas

Date:

07 March 2007, 12:37:03

Jeff Hubbach wrote:
> Why not have a compound key on this table, with an ID generated by a
> sequence (one sequence, named the same, for each instance of PostgreSQL for
> each office), and an Office ID that is static for each instance? Then the
> merge/sync would go through without a hitch.

That's what I was thinking.

If you don't want to have a two-field key, for example because you can't
change the schema you already have, you could still divide a range of
ids for each office when you create the sequence:

CREATE SEQUENCE fooseq MINVALUE 10000000 MAXVALUE 19999999 NO CYCLE

Just use a different range for each office.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Transaction atomicity

From

Giuseppe Sacco

Date:

07 March 2007, 13:21:28

Hi Heikki,

Il giorno mer, 07/03/2007 alle 16.36 +0000, Heikki Linnakangas ha
scritto:
[...]
> If you don't want to have a two-field key, for example because you can't
> change the schema you already have, you could still divide a range of
> ids for each office when you create the sequence:
>
> CREATE SEQUENCE fooseq MINVALUE 10000000 MAXVALUE 19999999 NO CYCLE

This is the same solution I was thinking of, and probably the best one.
I checked that currently the application only loops an average of 4
times, so I don't have a real urgency of switching to sequences. I will
implement this code in two weeks from now, since I should have enough
time.

Thanks again,
Giuseppe

P.S. Just a note, I will not specify "NO CYCLE" since it seems to be the
default behaviour and since it seems oracle uses "NOCYCLE" in one word.

[OT] Re: Transaction atomicity

From

Giuseppe Sacco

Date:

09 March 2007, 06:03:47

Il giorno mer, 07/03/2007 alle 18.25 +0100, Giuseppe Sacco ha scritto:
> Hi Heikki,
[...]
> > CREATE SEQUENCE fooseq MINVALUE 10000000 MAXVALUE 19999999 NO CYCLE
>
> This is the same solution I was thinking of, and probably the best one.
> I checked that currently the application only loops an average of 4
> times, so I don't have a real urgency of switching to sequences. I will
> implement this code in two weeks from now, since I should have enough
> time.

[a bit off topic]
Just for the records: sql-server doesn't accept the max() solution, so I
will have to implement the sequence way very quickly. The error I get
is: errorCode=1046, sqlState= S1000 "Subqueries are not allowed in this
context. Only scalar expressions are allowed" :-)

Again, thanks to all,
Giuseppe