Thread: Re: atomic commit; begin for long running transactions, in combination with savepoint.
Re: atomic commit; begin for long running transactions, in combination with savepoint.
From:
Syan Tan
Date:
my understanding was it used more resources than read committed because it keeps track of the version IDs of rows selected so far in a transaction ("transaction-level consistency"), so it has the potential to do the xmin co-selecting and checking if it were a transaction isolation level in Postgres. Google found my reference, and the reference mentioned it was different from serializable.

On Mon Oct 15 9:09, "Trevor Talbot" sent:

>On 10/15/07, Syan Tan wrote:
>
>> >Also keep in mind that MVCC is not the only way to implement
>> >transactions; pure locking is more common in other databases. In the
>> >locking model, most transactions prevent others from writing until
>> >after they are finished. Rows simply can't have different versions
>> >(and of course concurrent performance is awful).
>>
>> what about postgresql doing something like snapshot isolation level as per
>> the enemy M$ ?
>
>SQL Server is normally a pure locking database; from what I can tell,
>its snapshot isolation level adds a limited form of MVCC above that,
>making its concurrent behavior closer to PostgreSQL's:
>http://msdn2.microsoft.com/en-us/library/ms345124(d=printer).aspx
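For reference, SQL Server's snapshot isolation is enabled per database and then opted into per transaction. A minimal T-SQL sketch (the database and table names are hypothetical stand-ins):

```sql
-- Enable snapshot isolation for a database (hypothetical name "mydb").
ALTER DATABASE mydb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- A session then opts in per transaction:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT * FROM accounts;  -- reads see a snapshot taken as of transaction start
COMMIT;
```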
Re: atomic commit; begin for long running transactions, in combination with savepoint.
From:
"Trevor Talbot"
Date:
On 10/17/07, Syan Tan <kittylitter@people.net.au> wrote:

> my understanding was it used more resources than read committed
> because it keeps track of the version id of rows selected
> so far in a transaction, "transaction-level consistency",

Transaction-level consistency covers all the data in aggregate; it doesn't track which individual rows you may have read. They are simply available or not in transaction-sized sets.

MVCC uses more resources than pure locking in general, since it needs to apply versions to rows and store previous versions somewhere for as long as they're needed. PostgreSQL always uses the necessary resources, since it's built on MVCC. SQL Server is natively pure locking with MVCC as a limited option, so the DBA can make a tradeoff between concurrency and resources when appropriate.

> google found my reference, and the reference mentioned it was
> different from serializable.

SQL Server's SNAPSHOT and SERIALIZABLE both implement SQL-standard SERIALIZABLE-level isolation, in that they guarantee reads are repeatable without phantoms. There are subtle differences in the other guarantees they make, though, due to the implementation.

SQL Server has had SERIALIZABLE as a pure locking implementation for a long time, so it can't replace it without potentially breaking applications that depend on the locking behavior. Since the MVCC behavior is slightly different, it had to be added under a new name.

There's a good example in the PostgreSQL docs that illustrates the difference. Consider this table:

 class | value
-------+-------
     1 |    10
     1 |    20
     2 |   100
     2 |   200

Transactions A and B both begin at the same time, using MVCC rules for SERIALIZABLE-level isolation.
Transaction A executes:

SELECT SUM(value) FROM table WHERE class = 1;

and gets 30 as the result, which it then inserts with the other class:

INSERT INTO table VALUES (2, 30);

Transaction B executes:

SELECT SUM(value) FROM table WHERE class = 2;

and gets 300 as the result, which it also inserts with the other class:

INSERT INTO table VALUES (1, 300);

When both transactions commit, the table will look like this:

 class | value
-------+-------
     1 |    10
     1 |    20
     2 |   100
     2 |   200
     2 |    30
     1 |   300

Under MVCC rules this is perfectly fine. Both transactions got a frozen snapshot of the original 4 rows in the table. Neither transaction modified those rows, so there was no conflict. MVCC guarantees each transaction will not see new rows inserted by other transactions, so there was no conflict there either.

But if you wanted SERIALIZABLE to mean "the same as if the transactions were executed one after the other", that result is clearly wrong. If transaction A executed first, transaction B would have gotten 330 as its answer, since it would have seen the row with value 30 that A inserted. Similarly, if the transactions went in the other order, A would have gotten a different sum.

If you repeat that situation using pure locking rules, one of the transactions fails, forcing it to be tried again later in proper order. Most of the time this isn't necessary, because your data doesn't have interesting interdependencies like that, but sometimes it is the behavior you want.

SQL Server's SNAPSHOT isolation mode is identical to PostgreSQL's SERIALIZABLE mode: they both execute the above scenario using MVCC rules. SQL Server's SERIALIZABLE isolation mode executes the above using pure locking rules, something PostgreSQL doesn't support. (PostgreSQL does expose locking primitives so applications can get that behavior anyway, but it's not automatic and not exposed as a transaction isolation level.)
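The scenario above can be sketched as two interleaved psql sessions (the table name `mytable` is a stand-in for the docs' example; assuming the MVCC snapshot rules described above):

```sql
-- Session A                              -- Session B
BEGIN ISOLATION LEVEL SERIALIZABLE;
                                          BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(value) FROM mytable
  WHERE class = 1;   -- returns 30
                                          SELECT SUM(value) FROM mytable
                                            WHERE class = 2;   -- returns 300
INSERT INTO mytable VALUES (2, 30);
                                          INSERT INTO mytable VALUES (1, 300);
COMMIT;
                                          COMMIT;  -- both commits succeed under MVCC rules
```

One way to force the serial behavior with PostgreSQL's locking primitives is a self-conflicting table lock at the start of each transaction, e.g. `LOCK TABLE mytable IN SHARE ROW EXCLUSIVE MODE;`, which makes whichever transaction arrives second wait until the first commits.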
SQL Server also has an option to use snapshots under the READ COMMITTED isolation level, to allow more concurrency at the expense of storing row versions in the background. It still doesn't track what you actually read; it just freezes a copy of the data while a single query is being executed, instead of holding locks during that time. Locks would prevent other transactions from modifying the same rows, which could be a problem if the query takes a while to run. PostgreSQL always takes snapshots.

For both databases, an UPDATE always finds the latest version of a row.

I spent some time testing this with SQL Server Express, so as far as I know the above is accurate.
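The READ COMMITTED snapshot option mentioned above is also a database-level setting in SQL Server; a minimal sketch (database name hypothetical):

```sql
-- With this on, plain READ COMMITTED statements read from row versions
-- instead of taking shared locks (hypothetical database name "mydb").
ALTER DATABASE mydb SET READ_COMMITTED_SNAPSHOT ON;
```

Unlike ALLOW_SNAPSHOT_ISOLATION, this changes the behavior of the default isolation level for every session, so no per-transaction opt-in is needed.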