Thread: Re: atomic commit; begin for long running transactions, in combination with savepoint.
Re: atomic commit; begin for long running transactions, in combination with savepoint.
From:
Syan Tan
Date:
my understanding was it used more resources than read committed because it keeps track of the version IDs of rows selected so far in a transaction ("transaction-level consistency"), so it has the potential to do the xmin co-selecting and checking if it were a transaction isolation level in Postgres. Google found my reference, and the reference mentioned it was different from serializable.

On Mon Oct 15 9:09, "Trevor Talbot" sent:

>On 10/15/07, Syan Tan wrote:
>
>> >Also keep in mind that MVCC is not the only way to implement
>> >transactions; pure locking is more common in other databases. In the
>> >locking model, most transactions prevent others from writing until
>> >after they are finished. Rows simply can't have different versions
>> >(and of course concurrent performance is awful).
>>
>> what about postgresql doing something like snapshot isolation level as per
>> the enemy M$ ?
>
>SQL Server is normally a pure locking database; from what I can tell,
>its snapshot isolation level adds a limited form of MVCC above that,
>making its concurrent behavior closer to PostgreSQL's:
>http://msdn2.microsoft.com/en-us/library/ms345124(d=printer).aspx
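For reference, SQL Server's snapshot isolation is enabled per database and then opted into per transaction. A minimal T-SQL sketch (the database and table names are hypothetical stand-ins):

```sql
-- Enable snapshot isolation for a database (hypothetical name "mydb").
ALTER DATABASE mydb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- A session then opts in per transaction:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT * FROM accounts;  -- reads see a snapshot taken as of transaction start
COMMIT;
```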
Re: atomic commit; begin for long running transactions, in combination with savepoint.
From:
"Trevor Talbot"
Date:
On 10/17/07, Syan Tan <kittylitter@people.net.au> wrote:

> my understanding was it used more resources than read committed
> because it keeps track of the version id of rows selected
> so far in a transaction, "transaction-level consistency",

Transaction-level consistency covers all the data in aggregate; it doesn't track which individual rows you may have read. They are simply available or not in transaction-sized sets.

MVCC uses more resources than pure locking in general, since it needs to apply versions to rows and store previous versions somewhere for as long as they're needed. PostgreSQL always uses the necessary resources, since it's built on MVCC. SQL Server is natively pure locking with MVCC as a limited option, so the DBA can make a tradeoff between concurrency and resources when appropriate.

> google found my reference, and the reference mentioned it was
> different from serializable.

SQL Server's SNAPSHOT and SERIALIZABLE both implement SQL-standard SERIALIZABLE-level isolation, in that they guarantee reads are repeatable without phantoms. There are subtle differences in the other guarantees they make, though, due to the implementation.

SQL Server has had SERIALIZABLE as a pure locking implementation for a long time, so it can't replace it without potentially breaking applications that depend on the locking behavior. Since the MVCC behavior is slightly different, it had to be added under a new name.

There's a good example in the PostgreSQL docs that illustrates the difference. Consider this table:

 class | value
-------+-------
     1 |    10
     1 |    20
     2 |   100
     2 |   200

Transactions A and B both begin at the same time, using MVCC rules for SERIALIZABLE-level isolation.
Transaction A executes:

SELECT SUM(value) FROM table WHERE class = 1;

and gets 30 as the result, which it then inserts with the other class:

INSERT INTO table VALUES (2, 30);

Transaction B executes:

SELECT SUM(value) FROM table WHERE class = 2;

and gets 300 as the result, which it also inserts with the other class:

INSERT INTO table VALUES (1, 300);

When both transactions commit, the table will look like this:

 class | value
-------+-------
     1 |    10
     1 |    20
     2 |   100
     2 |   200
     2 |    30
     1 |   300

Under MVCC rules this is perfectly fine. Both transactions got a frozen snapshot of the original 4 rows in the table. Neither transaction modified those rows, so there was no conflict. MVCC guarantees each transaction will not see new rows inserted by other transactions, so there was no conflict there either.

But if you wanted SERIALIZABLE to mean "the same as if the transactions were executed one after the other", that result is clearly wrong. If transaction A executed first, transaction B would have gotten 330 as its answer, since it would have seen the row with value 30 that A inserted. Similarly, if the transactions went in the other order, A would have gotten a different sum.

If you repeat that situation using pure locking rules, one of the transactions fails, forcing it to be tried again later in proper order. Most of the time this isn't necessary, because your data doesn't have interesting interdependencies like that, but sometimes it is the behavior you want.

SQL Server's SNAPSHOT isolation mode is identical to PostgreSQL's SERIALIZABLE mode: they both execute the above scenario using MVCC rules. SQL Server's SERIALIZABLE isolation mode executes the above using pure locking rules, something PostgreSQL doesn't support. (PostgreSQL does expose locking primitives so applications can get that behavior anyway, but it's not automatic and not exposed as a transaction isolation level.)
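The scenario above can be sketched as two interleaved psql sessions (the table name `mytable` is a stand-in for the docs' example; assuming the MVCC snapshot rules described above):

```sql
-- Session A                              -- Session B
BEGIN ISOLATION LEVEL SERIALIZABLE;
                                          BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(value) FROM mytable
  WHERE class = 1;   -- returns 30
                                          SELECT SUM(value) FROM mytable
                                            WHERE class = 2;   -- returns 300
INSERT INTO mytable VALUES (2, 30);
                                          INSERT INTO mytable VALUES (1, 300);
COMMIT;
                                          COMMIT;  -- both commits succeed under MVCC rules
```

One way to force the serial behavior with PostgreSQL's locking primitives is a self-conflicting table lock at the start of each transaction, e.g. `LOCK TABLE mytable IN SHARE ROW EXCLUSIVE MODE;`, which makes whichever transaction arrives second wait until the first commits.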
SQL Server also has an option to use snapshots under the READ COMMITTED isolation level, to allow more concurrency at the expense of storing row versions in the background. It still doesn't track what you actually read; it just freezes a copy of the data while a single query is being executed, instead of holding locks during that time. Locks would prevent other transactions from modifying the same rows, which could be a problem if the query takes a while to run. PostgreSQL always takes snapshots.

For both databases, an UPDATE always finds the latest version of a row.

I spent some time testing this with SQL Server Express, so as far as I know the above is accurate.
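The READ COMMITTED snapshot option mentioned above is also a database-level setting in SQL Server; a minimal sketch (database name hypothetical):

```sql
-- With this on, plain READ COMMITTED statements read from row versions
-- instead of taking shared locks (hypothetical database name "mydb").
ALTER DATABASE mydb SET READ_COMMITTED_SNAPSHOT ON;
```

Unlike ALLOW_SNAPSHOT_ISOLATION, this changes the behavior of the default isolation level for every session, so no per-transaction opt-in is needed.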