Re: atomic commit;begin for long running transactions , in combination with savepoint. - Mailing list pgsql-general

From Trevor Talbot
Subject Re: atomic commit;begin for long running transactions , in combination with savepoint.
Date
Msg-id 90bce5730710171333p1a400a5cp290594eec526857a@mail.gmail.com
Whole thread Raw
In response to Re: atomic commit;begin for long running transactions , in combination with savepoint.  (Syan Tan <kittylitter@people.net.au>)
List pgsql-general
On 10/17/07, Syan Tan <kittylitter@people.net.au> wrote:

> my understanding was it used more resources than read committed
> because it keeps track of the version id of rows selected
> so far in a transaction, "transaction-level consistency",

Transaction-level consistency covers all the data in aggregate, it
doesn't track which individual rows you may have read.  They are
simply available or not in transaction-sized sets.

MVCC uses more resources than pure locking in general, since it needs
to apply versions to rows and store previous versions somewhere for as
long as they're needed.  PostgreSQL always uses the necessary
resources, since it's built on MVCC.  SQL Server is natively pure
locking with MVCC as a limited option, so the DBA can make a tradeoff
between concurrency and resources when appropriate.

> google found my reference, and the reference mentioned it was
> different from serializable.

SQL Server's SNAPSHOT and SERIALIZABLE both implement SQL standard
SERIALIZABLE-level isolation, in that they guarantee reads are
repeatable without phantoms.  There are subtle differences in other
guarantees they make, though, due to the implementation.  SQL Server
has had SERIALIZABLE as a pure locking implementation for a long time,
so it can't replace it without potentially breaking applications that
depend on the locking behavior.  Since the MVCC behavior is slightly
different, it had to be added under a new name.

There's a good example in the PostgreSQL docs that illustrates the
difference.  Consider this table:

     class | value
    -------+-------
         1 |    10
         1 |    20
         2 |   100
         2 |   200

Transactions A and B both begin at the same time, using MVCC rules for
SERIALIZABLE-level isolation.

Transaction A executes
    SELECT SUM(value) FROM table WHERE class = 1;
and gets 30 as the result, which it then inserts with the other class:
    INSERT INTO table VALUES (2, 30);

Transaction B executes
    SELECT SUM(value) FROM table WHERE class = 2;
and gets 300 as the result, which it also inserts with the other class:
    INSERT INTO table VALUES (1, 300);

When both transactions commit, the table will look like this:

     class | value
    -------+-------
         1 |    10
         1 |    20
         2 |   100
         2 |   200
         2 |    30
         1 |   300

Under MVCC rules this is perfectly fine.  Both transactions got a
frozen snapshot of the original 4 rows in the table.  Neither
transaction modified those rows, so there was no conflict.  MVCC
guarantees each transaction will not see new rows inserted by other
transactions, so there was no conflict there either.

But if you wanted SERIALIZABLE to mean "the same as if the
transactions were executed one after the other", that result is
clearly wrong.  If transaction A executed first, transaction B would
have gotten 330 as its answer, since it would have seen the row with
value 30 that A inserted.  Similar if the transactions went in the
other order.

If you repeat that situation using pure locking rules, one of the
transactions fails, forcing it to be tried again later in proper
order.  Most of the time this isn't necessary, because your data
doesn't have interesting interdependencies like that, but sometimes it
is the behavior you want.

SQL Server's SNAPSHOT isolation mode is identical to PostgreSQL's
SERIALIZABLE mode: they both execute the above scenario using MVCC
rules.

SQL Server's SERIALIZABLE isolation mode executes the above using pure
locking rules, something PostgreSQL doesn't support.  (PostgreSQL does
expose locking primitives so applications can get that behavior
anyway, but it's not automatic and supported as a transaction
isolation level.)

SQL Server also has an option to use snapshots under the READ
COMMITTED isolation level, to allow more concurrency at the expense of
storing row versions in the background.  It still doesn't track what
you actually read, just freezes a copy of the data while a single
query is being executed, instead of holding locks during that time.
Locks would prevent other transactions from modifying the same rows,
which could be a problem if the query takes a while to run.
PostgreSQL always takes snapshots.  For both databases, an UPDATE
always finds the latest version of a row.

I spent some time testing this with SQL Server Express, so as far as I
know the above is accurate.

pgsql-general by date:

Previous
From: Lothar Behrens
Date:
Subject: conditional alter table add ?
Next
From: "Ken Johansson"
Date:
Subject: downloading and installing postgreSQL