Re: Suggestion for optimization - Mailing list pgsql-hackers

From Jon Grov
Subject Re: Suggestion for optimization
Date
Msg-id jon-x9zofgxu6ku.fsf@einn.ifi.uio.no
Whole thread Raw
In response to Re: Suggestion for optimization  ("Dann Corbit" <DCorbit@connx.com>)
List pgsql-hackers
(Sorry that my previous post did not reach the pgsql-hackers list, I
sent it from the wrong address and was thus not considered a
subscriber)

"Dann Corbit" <DCorbit@connx.com> writes:

> But I am a bit puzzled.  How can a serializable transaction be
> performed in a MVCC system?  I realize the Oracle does it, and also
> Postgresql, but I can't picture how that would work.

In short, snapshot isolation implies this (assuming all transactions
are assigned a monotonously growing timestamp, and that all updates
create a new, unique version of the object):

- A transactions reads the most recent version of an object that is written by a transaction who were committed at it's
beginning.

- Two concurrent (i.e. at some point in time, they're both active) transactions need to have disjunct write sets.

This is probably best illustrated through an example:

Let x and y be columns in a table a. Initially, x = 20 and y = 5 in
the row where k = 1 (k is the primary key).

- Let T_1 be transaction as follows:
 BEGIN; SELECT x FROM a WHERE k = 1; SELECT y FROM a WHERE k = 1; END;

- Let T_2 be another transaction:
 BEGIN; UPDATE a SET x = 10 WHERE k = 1; UPDATE a SET y = 10 WHERE k = 1; END;

Then, if we have the following execution:

10 BEGIN; /* T_2 */

20 BEGIN; /* T_1 */

30 SELECT x FROM a WHERE k = 1; /* T_1 */

40 UPDATE a SET x = 10 WHERE k = 1; /* T_2 */

50 UPDATE y SET a = 10 WHERE k = 1; /* T_2 */

60 SELECT y FROM a WHERE k = 1; /* T_1 */

70 END; /* T_2 */

80 END; /* T_1 */

Clearly, this would not be serializable in a non-versioned
database. But under MVCC and snapshot isolation, T_1 reads from a
snapshot, i.e. it would not (under serialable isolation level, at
least) be allowed to read anything T_2 have written. The requirement
is that a transaction T reads values written by transactions committed
before T's beginning.

So T_1 would find that x = 20 and y = 5, just as it would if T_1 and
T_2 were executed serially with T_1 before T_2.

With two (or more) updating transactions, things get more complicated.

Assume we have the following to transactions:

T_1:
 BEGIN; UPDATE x SET x = x + 10; END;

T_2:
 BEGIN; UPDATE x SET x = x - 5; END;

and the following execution (assuming x = 20 before it starts):

10 BEGIN; /* T_1 */
20 BEGIN; /* T_2 */

30 UPDATE x SET x = x + 10; /* T_1 */

40 END; /* T_1 */

50 UPDATE x SET x = x - 5; /* T_2 */

60 END; /* T_2 *

Since a transaction is only allowed to read values written by
transactions committed before it's beginning, both T_1 and T_2 will
read x = 20. A transaction started after just this execution would
then read T_2's newly written value of x, that is 15, and line 40
would become a lost update. To avoid this, PostgreSQL offers two
solutions:

- Read committed isolation, where a statement on the form UPDATE <table> SET <column> = <column> + <value> is
considereda special case and T_2 is allowed to read T_1's value.
 

- Serializable isolation, where T_2 would have to be aborted.

If line 40 and 50 were swapped, T_2 would wait to see what happens to
T_1. If it's aborted, it can safely read x = 20 regardless of the
isolation level, if it's committed, the result would again depend on
the selected isolation level.

Hopefully, this illustrates the basic concepts. An interesting article
concerning the subtleties of this subject was posted by Tom Lane a
couple of days ago:


http://groups.google.com/groups?q=postgresql+regression&hl=en&ie=utf-8&oe=utf-8&scoring=d&selm=11556.1017860660%40sss.pgh.pa.us&rnum=4

In addition, this seems to be the "canonical paper" on snapshot
isolation:

http://citeseer.nj.nec.com/berenson95critique.html


--
Jon Grov, Linpro as



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: PQescapeBytea is not multibyte aware
Next
From: Peter Eisentraut
Date:
Subject: Debugging symbols by default