Thread: Re: atomic commit;begin for long running transactions , in combination with savepoint.

Re: atomic commit;begin for long running transactions , in combination with savepoint.

From

Syan Tan

Date:

15 October 2007, 07:59:54

On Sun Oct 14  7:41 , "Trevor Talbot"  sent:

>On 10/14/07, Syan Tan  wrote:
>> I meant commit and continue current transaction. The transaction is opened
>> on the user application caching composite data from many tables regarding
>> a root object. Because almost all applications cache data, there is apparently
>> a pattern "optimistic offline lock" where orb middleware basically adds
>> a version field to rows , because transactions are usually begun just
>> when the user has done a modification to a displayed value, and now
>> wants to change an old cached value which he believes is the current value.
>
>Well, transactional semantics won't help you here.
>
>In order to detect a change occurred, what you want is a SERIALIZABLE
>transaction: you want the update to fail if the row it matches is no
>longer the same version as your snapshot.  However, in order to read
>the new value to decide if you want to update it anyway, you need to
>leave your current snapshot.  As soon as you do that, ALL previously
>read values lose the update checks that snapshot provided you.

you read the old value at the beginning of the transaction, and
you don't re-read it , assuming it is infrequently changed, so
if someone updates it concurrently, when you try to write, then
you detect the conflict, and rollback to the savepoint.

You DONT want a serializable transaction, because then you can't read
the other committed value after rolling back to the savepoint.

>If you use a READ COMMITTED transaction, there is no check, since the
>update can see the new value itself.
>
>Nested transactions or the ability to commit some data without leaving
>the current transaction won't get you want you want either, since
>you're really looking for per-row behavior, not per-transaction.
>
>> The middleware starts a transaction, and reads the version number, and
>> if it has been incremented since the initial transaction that read
>> the value and the version number, it then informs the user that
>> a new old value exists, and whether he wants to overwrite it.
>
>A way to do this using PostgreSQL's own row version data came up
>recently on this list.  Have a look at this post and the one following
>it:
>http://archives.postgresql.org/pgsql-general/2007-10/msg00503.php

this is the same as using your own version ids and incrementing them
within the application, which would leave the xmin within postgresql's
domain , and would also mean the application's sql is not tied
to postgresql.

The question was if there was a commit and continue transaction command,
would it work

e.g.

begin;
select x from t1 where id=2;
(store in variable x0 , display to user)
...(much later)
(user changes stored x, at client, now x1)
savepoint a;

answ = n;

do:
try:
    update t1 set x=x1 where id = 2;
    commit-and-continue;
catch:
     rollback to savepoint a;
     select x from t1 where id=2 ( store as x2)
     if (x2 <> x0) notify user "x has changed from x0 to x2, continue to write x1?"
     input answ;

while answ ==y;

In the first pass of the loop, the transaction hasn't read x a second
time so the transaction state for x is at x0,
if x has been changed by another transaction's commit, then the catch will
execute , and here x is selected again , and now the transaction state
is that x is at x2. if the user elects to loop the while loop,
then the second update will get through because the second select
has advanced the version id for x, so now it can be changed.

If the transaction was serializable , and another transaction has committed,
this would never work, because a commit would nullify the other
transactions write without this transaction ever having seen the other
transactions write, so this transaction would always be forced to rollback.

Re: atomic commit;begin for long running transactions , in combination with savepoint.

From

"Trevor Talbot"

Date:

15 October 2007, 12:29:04

On 10/15/07, Syan Tan <kittylitter@people.net.au> wrote:

> >In order to detect a change occurred, what you want is a SERIALIZABLE
> >transaction: you want the update to fail if the row it matches is no
> >longer the same version as your snapshot.  However, in order to read
> >the new value to decide if you want to update it anyway, you need to
> >leave your current snapshot.  As soon as you do that, ALL previously
> >read values lose the update checks that snapshot provided you.
>
> you read the old value at the beginning of the transaction, and
> you don't re-read it , assuming it is infrequently changed, so
> if someone updates it concurrently, when you try to write, then
> you detect the conflict, and rollback to the savepoint.

Transactions don't operate based on what you've read.  "UPDATE ...
WHERE ..." finds the row(s) to update using the WHERE clause right
now, not based on any previously-read values.  It does not know what
you've read before.

The only difference is in the data you _can_ read.  For the
SERIALIZABLE isolation level, that data was decided at the beginning
of the transaction.  A row that was updated by another transaction
will make the version that you can see effectively "read only", so
when the UPDATE tries to change it, there will be a transactional
conflict due to the isolation level.

> You DONT want a serializable transaction, because then you can't read
> the other committed value after rolling back to the savepoint.

Correct.  But the READ COMMITTED isolation level does not limit what
data you can see at the beginning of the transaction, so an UPDATE
will always find the latest version of a row.  There is no conflict as
far as the transaction is concerned.

> >A way to do this using PostgreSQL's own row version data came up
> >recently on this list.  Have a look at this post and the one following
> >it:
> >http://archives.postgresql.org/pgsql-general/2007-10/msg00503.php
>
> this is the same as using your own version ids and incrementing them
> within the application, which would leave the xmin within postgresql's
> domain , and would also mean the application's sql is not tied
> to postgresql.

Yes.  You were asking for a feature in PostgreSQL that doesn't match
standard transaction semantics though, so I figured you wouldn't mind
a PostgreSQL-specific option :)

Assuming READ COMMITTED isolation level:

> begin;
> select x from t1 where id=2;
> (store in variable x0 , display to user)
> ...(much later)
> (user changes stored x, at client, now x1)
> savepoint a;
>
> answ = n;
>
> do:
> try:
>     update t1 set x=x1 where id = 2;
>     commit-and-continue;
> catch:
>      rollback to savepoint a;
>      select x from t1 where id=2 ( store as x2)
>      if (x2 <> x0) notify user "x has changed from x0 to x2, continue to write x1?"
>      input answ;
>
> while answ ==y;
>
>
> In the first pass of the loop, the transaction hasn't read x a second
> time so the transaction state for x is at x0,

Transaction state is not based on what you've read; it doesn't know.

> if x has been changed by another transaction's commit, then the catch will
> execute ,

The UPDATE will find the latest version of the row.  The change made
by the other transaction is not a problem at this isolation level (you
can see it), so the UPDATE will simply proceed and change it anyway.
The catch block will never execute.

> If the transaction was serializable , and another transaction has committed,
> this would never work, because a commit would nullify the other
> transactions write without this transaction ever having seen the other
> transactions write, so this transaction would always be forced to rollback.

Exactly.  But without SERIALIZABLE isolation, any operations you
perform always see the other transaction's write, so there is never
any conflict.

Also keep in mind that MVCC is not the only way to implement
transactions; pure locking is more common in other databases.  In the
locking model, most transactions prevent others from writing until
after they are finished.  Rows simply can't have different versions
(and of course concurrent performance is awful).