Thread: SSI predicate locking on heap -- tuple or row?

SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

21 May 2011, 17:09:23

As background, for most of SSI development I had a TODO comment in the
source which was basically around the question of whether a predicate
lock on a visible heap tuple should propagate to later versions of the
row as long as the original lock was material. In early February Heikki
(quite reasonably) insisted that I resolve that and either add code to
do so or a comment about why it wasn't necessary. After looking at it
for many hours without getting to a logical proof, I thought I had a
proof by example:
http://archives.postgresql.org/pgsql-hackers/2011-02/msg00325.php
We added code for this, but it has been problematic; probably over half
the SSI bugs which have been found since SSI went into the code tree
have been related to this, and Robert Haas has just found a couple more.In discussing how to address this with Jeff
Davis(over beers at

PGCon), he provided good advice about how to properly fix these issues,
but posed some very good questions about whether it was really
necessary. Rather than fix this aspect of the code, we might be able to
eliminate it, which would reduce the code size and some overhead. Since
this code is more fragile than the rest of SSI, this is particularly
appealing -- my favorite way to deal with fragile code is to remove it.

I went back to the example which persuaded me and took another look. On
review I see that this didn't prove the point because there was a
dangerous structure with T1 as a pivot which should have caused SSI to
break the cycle. Since it didn't, either I got careless in my testing
methodology (which I will recheck when I get back to Wisconsin) or there
was a bug -- but either way, this example can't prove that the predicate
locks need to follow the row to new versions.
I haven't been able to come up with an example where it actually *is*
needed, but failure to come up with an example where it is needed
doesn't prove that it isn't needed. Here's my attempt at a logical
proof of whether it is needed. It would be great if anyone with a grasp
of SSI concepts could confirm its validity or shoot it down. (Hopefully
Friday's presentation expanded the number of those who can do so.)
The issue can be rephrased to this: If transaction T1 reads a row (thus
acquiring a predicate lock on it) and a second transaction T2 updates
that row, must a third transaction T3 which updates the new version of
the row have a rw-conflict in from T1?
Now, the first thing which jumps out is the timing requirements -- T3
must overlap T1 or there is no conflict (and therefore no need to
replicate the predicate locks), but T3 must *not* overlap T2 or there
would be ww-conflict and one of them would roll back. If T2 committed
before T1 acquired its snapshot, T1 would have gotten its predicate lock
on the second version of the row, so for a situation where this could
possibly matter, the following actual timings must exist (read "starts"
as meaning "acquires a snapshot"):
T1 and T2 start in either order.
T1 reads the row and T2 updates the row in either order.
T2 commits.
T3 starts.
T3 updates the row.
So far, using broken lines for rw-conflicts and solid for
wr-dependencies, we have this for apparent order of execution:
T1 - - -> T2 -----> T3
Not on the slides for the presentation, but briefly mentioned, is that
Alan Fekete and others have proven that serialization anomalies can only
occur if the transaction which appears on the rw-conflict *out* side of
the pivot (the transaction which appears to have executed third)
actually commits first. T2 committed before T1, so there can't be a
cycle involving a rw-conflict *in* to T1 because that would complete the
dangerous structure -- unless it commits before T2 or is read only and
gets its snapshot before the commit of T2 (another special case which we
left out of the presentation in the interest of time).
Since T3 must get its snapshot after the commit of T2, if it develops a
rw-conflict with T1 there will be an SSI serialization failure without
needing the lock propagation. So now the question breaks down to
whether we can have other transactions in the mix which, given the
above, cause T3 to appear to have executed before T1.
Since T3 cannot have committed before T1 (given what we know about the
actual timings required), that has to involve a rw-conflict somewhere in
the graph. Since a wr-dependency is caused by the writer actually
committing before its work is read by another transaction, causing
apparent order of execution at that point to match actual serial order
of execution, such transactions would be benign in the transaction graph
-- they might exist in a problem graph but would not help to cause a
later-starting transaction to appear to have executed first. We can
look just a rw-conflicts to cause the apparent order of execution to
loop back in time.
Since the time-arrow for rw-conflict points in the direction of the
write, we can ignore read only transactions. So to complete the cycle
back to T1, the transaction T0 with the rw-conflict to T1 must have
committed before T2 committed. This means that it can't overlap with
T3, because it started after the commit of T2.
So the question is now whether T3 can have a rw-conflict (or chain of
conflicts) to T0.The timings required to make it necessary to replicate predicate locks
to later versions of the row now are:
T0, T1, and T2 start in any order.
T1 reads the row and T2 updates the row in either order; also T1 updates
some other data which conflicts with a T0 read, in any order.
T0 commits.
T2 commits.
T3 starts.
T3 updates the row.

Apparent order of execution is now:
T0 - - -> T1 - - -> T2 -----> T3
We need at least one more transaction to complete the cycle here. For
only one to do it, it must overlap both T0 and T3, it must do some read
which conflicts with a write of T0 and some write which conflicts with a
read of T3.

Listing the timings at this point would be pretty chaotic -- T4 starts
and develops a rw-conflict with T0 anytime before T0 commits, and
develops a rw-conflict with T3 at any point after T3 starts.
Apparent order of execution is now:
T0 - - -> T1 - - -> T2 -----> T3 - - -> T4 - - -> T0
But wait -- T4 is now a pivot with T0 committing first and T3 is not
read only. That part of the cycle is a dangerous structure and one of
those transactions will be rolled back.
So far we have now proven that you can't cause a problem with only five
transactions.
The question is now whether T4 can be replaced with multiple
transactions forming a rw-conflict without one of them committing at a
time which would create a dangerous structure and break the cycle.
[Anyone who has followed along this far has earned my undying
gratitude.]
Since the chain of transactions has to overlap T0 and T3, I don't see
how that can happen. We established that T0 has to commit before T3 can
start, so the chain will ultimately have to get to that T0 commit.
I don't want to start ripping out the apparently useless code without
someone checking my logic here. One big gaff in this area is quite
enough for me. :-/ Anyone?
-Kevin

Re: SSI predicate locking on heap -- tuple or row?

From

Pavan Deolasee

Date:

21 May 2011, 17:45:47

On Sat, May 21, 2011 at 4:09 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

It would be great if anyone with a grasp
of SSI concepts could confirm its validity or shoot it down. (Hopefully
Friday's presentation expanded the number of those who can do so.)

As a first step, it would be great if you can upload the slides on the conference website. To expect that the attendees would have understood the nitty-gritties of SSI just listening to the presentation is so unhuman :-)

Thanks,

Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

21 May 2011, 18:07:08

Pavan Deolasee  wrote:
> As a first step, it would be great if you can upload the slides on
> the conference website. To expect that the attendees would have
> understood the nitty-gritties of SSI just listening to the
> presentation is so unhuman :-)
I'm sure Dan will be doing that soon; meanwhile, maybe this page
will help:
http://wiki.postgresql.org/wiki/Serializable
-Kevin

Re: SSI predicate locking on heap -- tuple or row?

From

Dan Ports

Date:

21 May 2011, 21:10:07

On Sat, May 21, 2011 at 04:45:15PM -0400, Pavan Deolasee wrote:
> As a first step, it would be great if you can upload the slides on the
> conference website. To expect that the attendees would have understood the
> nitty-gritties of SSI just listening to the presentation is so unhuman :-)

I just posted them at
http://drkp.net/drkp/papers/ssi-pgcon11-slides.pdf

...and they'll be linked from the Serializable wiki page as soon as I
remember how to edit it. :-)

Dan

-- 
Dan R. K. Ports              MIT CSAIL                http://drkp.net/

Re: SSI predicate locking on heap -- tuple or row?

From

Robert Haas

Date:

21 May 2011, 21:36:26

On Sat, May 21, 2011 at 4:09 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> [Anyone who has followed along this far has earned my undying
> gratitude.]
>
> Since the chain of transactions has to overlap T0 and T3, I don't see
> how that can happen.  We established that T0 has to commit before T3 can
> start, so the chain will ultimately have to get to that T0 commit.
>
> I don't want to start ripping out the apparently useless code without
> someone checking my logic here.  One big gaff in this area is quite
> enough for me.  :-/  Anyone?

How is an UPDATE different from a DELETE and an INSERT?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

21 May 2011, 21:50:32

Robert Haas  wrote:
> How is an UPDATE different from a DELETE and an INSERT?
That's the question Jeff asked which caused us to revisit this.
There are two answers -- conceptual and implementation.
Conceptually, an updated row is the same row, and a row inserted after a
delete is a new row.  Note that READ COMMITTED doesn't treat them the
same on a write conflict.  To give a practical example, police
departments in Wisconsin typically reassign a badge number to a new
officer after an existing officer leaves.  Updating an officer record
keyed by badge number (say, with a new address or a name change) would
be qualitatively different from deleting an old officer record and
inserting a new one for a different person now getting the badge number.(OK, so this is somewhat artificial, because
theytrack who had the
 
number in what temporal ranges, but you get the idea.)
In the implementation the only difference between an UPDATE in a table
and a DELETE and INSERT in the same table in the same transaction
(besides concurrency handling) is the ctid linkage, at least as far as I
can see.
So I think that you can't just treat the two things the same in SSI just
because the PostgreSQL implementation details make them similar; but I
think that you can treat the two things the same for the reasons I
worked out at the start of the thread.
-Kevin

Re: SSI predicate locking on heap -- tuple or row?

From

Robert Haas

Date:

21 May 2011, 22:58:12

On Sat, May 21, 2011 at 8:50 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas  wrote:
>> How is an UPDATE different from a DELETE and an INSERT?
>
> That's the question Jeff asked which caused us to revisit this.
>
> There are two answers -- conceptual and implementation.
>
> Conceptually, an updated row is the same row, and a row inserted after a
> delete is a new row.  Note that READ COMMITTED doesn't treat them the
> same on a write conflict.  To give a practical example, police
> departments in Wisconsin typically reassign a badge number to a new
> officer after an existing officer leaves.  Updating an officer record
> keyed by badge number (say, with a new address or a name change) would
> be qualitatively different from deleting an old officer record and
> inserting a new one for a different person now getting the badge number.
>  (OK, so this is somewhat artificial, because they track who had the
> number in what temporal ranges, but you get the idea.)
>
> In the implementation the only difference between an UPDATE in a table
> and a DELETE and INSERT in the same table in the same transaction
> (besides concurrency handling) is the ctid linkage, at least as far as I
> can see.

I think the implementation is what matters here.  I understand that
there are certain situations in which the user might choose to UPDATE
a row and other situations in which they might choose to DELETE and
then INSERT: but the user's intent is basically irrelevant.  If the
system treats those operations in basically the same way, then it
shouldn't be necessary to follow the CTID chain in one case given that
there is no CTID chain in the other case.  Or to put that another way,
if it is necessary to follow the CTID chain, then we should be able to
articulate a reason for that necessity -- something that is materially
different in the UPDATE case.  Otherwise, we're just following the
chain "because it's there".

It seems to me that we can actually state with some degree of
precision what that "material difference" would need to be.  The goal
of SSI is to prevent serialization anomalies that would not be
prevented by snapshot isolation.  Let's suppose that it successfully
does that in the DELETE/INSERT case.  Suppose further that we change
SSI so that it handles the UPDATE case in the same way that it handles
the DELETE/INSERT case.  This change will be incorrect only if there
is a serialization anomaly that snapshot isolation *would have
prevented* in the DELETE/INSERT case that *it does not prevent* in the
update case.  In other words, if SSI needs to be more rigorous in the
UPDATE case, it can only be because snapshot isolation is less
rigorous in that case, and the additional rigor that SSI must apply
there must be exactly equal to whatever snapshot isolation isn't
picking up (as compared with the DELETE/INSERT case).

Does that make any sense?  It seems logical to me, but IJWH.

> So I think that you can't just treat the two things the same in SSI just
> because the PostgreSQL implementation details make them similar; but I
> think that you can treat the two things the same for the reasons I
> worked out at the start of the thread.

Your argument seems reasonable to me; but it would be nice if we could
find a simpler one, because simpler arguments are less likely to be
incorrect.  :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

22 May 2011, 23:26:45

Robert Haas  wrote:
> I think the implementation is what matters here. I understand that
> there are certain situations in which the user might choose to
> UPDATE a row and other situations in which they might choose to
> DELETE and then INSERT: but the user's intent is basically
> irrelevant.
We don't consider it irrelevant when we decide which triggers to
fire.  We do have update triggers distinct from the insert and delete
triggers.  We also consider it relevant when dealing with a write
conflict in READ COMMITTED mode.  Those facts make me very reluctant
to move based on a simple assertion that it doesn't matter.
> If the system treats those operations in basically the same way,
> then it shouldn't be necessary to follow the CTID chain in one case
> given that there is no CTID chain in the other case. Or to put that
> another way, if it is necessary to follow the CTID chain, then we
> should be able to articulate a reason for that necessity --
> something that is materially different in the UPDATE case.
There is a wealth of research on which SSI is based.  I've read many
(although by no means *all*) of the papers on the topic, and all of
the ones I've read have been based around the concept of a row which
can be updated and retain its identity.  I trust the research, but I
think it is incumbent on us to prove, rather than just assert, that
it can be applied just as well to a row-version tuple.  I sure hope
it can, because we can have faster, leaner, less fragile code that
way.  I've attempted to write out a proof; although I won't trust
that without further review -- by me and by others.
> Otherwise, we're just following the chain "because it's there".
Why would you say it *is* there?
> It seems to me that we can actually state with some degree of
> precision what that "material difference" would need to be. The
> goal of SSI is to prevent serialization anomalies that would not be
> prevented by snapshot isolation. Let's suppose that it successfully
> does that in the DELETE/INSERT case. Suppose further that we change
> SSI so that it handles the UPDATE case in the same way that it
> handles the DELETE/INSERT case. This change will be incorrect only
> if there is a serialization anomaly that snapshot isolation *would
> have prevented* in the DELETE/INSERT case that *it does not
> prevent* in the update case.
I don't see that -- it could be correct because of the conceptual
difference between an UPDATE and a DELETE/INSERT pair.
> In other words, if SSI needs to be more rigorous in the UPDATE
> case, it can only be because snapshot isolation is less rigorous in
> that case, and the additional rigor that SSI must apply there must
> be exactly equal to whatever snapshot isolation isn't picking up
> (as compared with the DELETE/INSERT case).
>
> Does that make any sense? It seems logical to me, but IJWH.
I've always loved logic, but one of the most intriguing aspects is
identifying the unproven assumptions in an argument.  You have a
built-in premise that there is no significant difference between an
UPDATE and a DELETE/INSERT pair, in which case the logic is flawless
which is leading you to the conclusion that a lock on the visible
tuple is enough.  I'm not confident in that premise, so the simple
argument doesn't persuade me.
> Your argument seems reasonable to me;
Thanks much for fighting through it.  It is heartening that you
couldn't punch any holes in it.
> but it would be nice if we could find a simpler one, because
> simpler arguments are less likely to be incorrect. :-)
All generalizations are false.  :-)
-Kevin

Re: SSI predicate locking on heap -- tuple or row?

From

Aidan Van Dyk

Date:

23 May 2011, 10:59:15

On Mon, May 23, 2011 at 2:26 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

> I don't see that -- it could be correct because of the conceptual
> difference between an UPDATE and a DELETE/INSERT pair.
>
>> In other words, if SSI needs to be more rigorous in the UPDATE
>> case, it can only be because snapshot isolation is less rigorous in
>> that case, and the additional rigor that SSI must apply there must
>> be exactly equal to whatever snapshot isolation isn't picking up
>> (as compared with the DELETE/INSERT case).
>>
>> Does that make any sense? It seems logical to me, but IJWH.
>
> I've always loved logic, but one of the most intriguing aspects is
> identifying the unproven assumptions in an argument.  You have a
> built-in premise that there is no significant difference between an
> UPDATE and a DELETE/INSERT pair, in which case the logic is flawless
> which is leading you to the conclusion that a lock on the visible
> tuple is enough.  I'm not confident in that premise, so the simple
> argument doesn't persuade me.

I *think* (but am in no means familiar with SSI, or an expert on the
problems it's trying to solve), that Robert was only arguing that SSI
is only "relevant" to solve problems that the non SSI wouldn't catch.
And the "sameness" of UPDATE vs DELETE+INSERT, is simply because if
you can only see the data as it was *completely before* or *completely
after* a transaction (not as it was after the delete, before the
insert), then to you, it doesn't matter if the transaction did an
UPDATE, or an DELETE+INSERT.  All you see is either $OLDROW, or
$NEWROW, depending if you see it before, or after, not the
transformation from $OLDROW to $NEWROW.

So, if SSI conflicts something on the UPDATE case, it would necessrily
have to conflict the DELETE+INSERT case as well, and vice-versa.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: SSI predicate locking on heap -- tuple or row?

From

Tom Lane

Date:

23 May 2011, 11:21:58

Aidan Van Dyk <aidan@highrise.ca> writes:
> So, if SSI conflicts something on the UPDATE case, it would necessrily
> have to conflict the DELETE+INSERT case as well, and vice-versa.

This argument is fundamentally bogus, because an UPDATE is not the same
thing as DELETE followed by INSERT.  In the former case the new tuple
will have a ctid link from the old one; in the latter case not.  And
that will affect the behavior of subsequent operations.  Another
user-visible difference is that if the table has OIDs, the latter case
will (usually) give the new tuple a different OID.  Or if you don't like
OIDs, think about a serial column.  Or an ON INSERT trigger with
side-effects.

It may be that the result Kevin seeks to prove is in fact true, but an
argument that requires the assumption that UPDATE is indistinguishable
from DELETE+INSERT isn't going to persuade me, because I don't have any
trouble whatsoever in distinguishing them.
        regards, tom lane

Re: SSI predicate locking on heap -- tuple or row?

From

Robert Haas

Date:

23 May 2011, 11:36:33

On Mon, May 23, 2011 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Aidan Van Dyk <aidan@highrise.ca> writes:
>> So, if SSI conflicts something on the UPDATE case, it would necessrily
>> have to conflict the DELETE+INSERT case as well, and vice-versa.
>
> This argument is fundamentally bogus, because an UPDATE is not the same
> thing as DELETE followed by INSERT.  In the former case the new tuple
> will have a ctid link from the old one; in the latter case not.  And
> that will affect the behavior of subsequent operations.

Right.  The point I was driving at is - in what way will that affect
the behavior of subsequent operations?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

23 May 2011, 12:38:23

Robert Haas <robertmhaas@gmail.com> wrote:
> The point I was driving at is - in what way will that affect the
> behavior of subsequent operations?
You haven't answered why you think it should matter for OID or for
write conflict on READ COMMITTED but not here.  The logical concept
of a row which is modified is a meaningful one regardless of any
particular database's internal implementation.  The fact that the
proofs of SSI techniques use row-oriented terminology shouldn't be
simply ignored.  The fact that the two prototype implementations in
the papers on the topic used databases with "update in place with a
rollback log" for their MVCC implementations, and took their
predicate locks on the "in place" row, reinforce that.
No flaws jumped out at me in a review of the longer logical argument
after sleeping on it, Robert said it looked good to him, and Dan
said he would look at it soon.  If it looks good to Dan, and nobody
else pokes a hole in the logic, I'll have enough confidence to
proceed.
-Kevin

Re: SSI predicate locking on heap -- tuple or row?

From

Dan Ports

Date:

23 May 2011, 15:32:42

On Sat, May 21, 2011 at 03:09:12PM -0500, Kevin Grittner wrote:
> I went back to the example which persuaded me and took another look.  On
> review I see that this didn't prove the point because there was a
> dangerous structure with T1 as a pivot which should have caused SSI to
> break the cycle.  Since it didn't, either I got careless in my testing
> methodology (which I will recheck when I get back to Wisconsin) or there
> was a bug -- but either way, this example can't prove that the predicate
> locks need to follow the row to new versions.

I'm still working through the more general question, but I wanted to
see what was going on with this example. It appears there's a bug,
because this doesn't cause a rollback without the version linking.

Specifically, the problem is a missing check in
OnConflict_CheckForSerializationFailure. We check whether the conflict
has caused the writer to become a pivot, but only if neither the reader
or writer is committed. Why is that last condition there? In this case,
the reader (T4) has committed but the writer (T1) hasn't.

It would be worth making sure there aren't any other cases we're
missing here, although I don't see any right now.

Dan

-- 
Dan R. K. Ports              MIT CSAIL                http://drkp.net/

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

23 May 2011, 17:38:43

Dan Ports <drkp@csail.mit.edu> wrote:
> On Sat, May 21, 2011 at 03:09:12PM -0500, Kevin Grittner wrote:
>> I went back to the example which persuaded me and took another
>> look.  On review I see that this didn't prove the point because
>> there was a dangerous structure with T1 as a pivot which should
>> have caused SSI to break the cycle.  Since it didn't, either I
>> got careless in my testing methodology (which I will recheck when
>> I get back to Wisconsin) or there was a bug -- but either way,
>> this example can't prove that the predicate locks need to follow
>> the row to new versions.
> 
> I'm still working through the more general question, but I wanted
> to see what was going on with this example. It appears there's a
> bug, because this doesn't cause a rollback without the version
> linking.
> 
> Specifically, the problem is a missing check in
> OnConflict_CheckForSerializationFailure. We check whether the
> conflict has caused the writer to become a pivot, but only if
> neither the reader or writer is committed. Why is that last
> condition there?
Because Fekete et al proved that the pivot doesn't cause an anomaly
unless the transaction read by the pivot commits before the pivot or
the transaction reading the pivot.  The problem has to be somewhere
that fails to detect the T1 pivot.
> In this case, the reader (T4) has committed but the writer (T1)
> hasn't.
The T4 commit-first makes this safe.  Everything should be OK until
the session 1 activity at the end, which makes T1 a pivot with T2
committing first.  I believe we should get the error on this
statement:
update t set txt = 'a' where id = 1;
I was too wiped out from travel to look at it last night, and can't
spend any time on it during business hours today, but after 18:00
CDT today this has my attention.
-Kevin

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

23 May 2011, 18:41:46

Dan Ports <drkp@csail.mit.edu> wrote:
> Specifically, the problem is a missing check in
> OnConflict_CheckForSerializationFailure. We check whether the
> conflict has caused the writer to become a pivot, but only if
> neither the reader or writer is committed. Why is that last
> condition there? In this case, the reader (T4) has committed but
> the writer (T1) hasn't.
OK, I misread your post -- you are looking at T1 as the pivot, and
that test *is* the problem.  When T1 becomes the pivot the reader
(T4) is committed, but it committed *after* T2.  I can submit a
patch for that this evening, after testing to confirm that if finds
the T1 pivot, unless you want to get it.
Sorry for the misunderstanding.  I'm sneaking peeks at this during
compiles of other stuff....
-Kevin

Re: SSI predicate locking on heap -- tuple or row?

From

Dan Ports

Date:

23 May 2011, 20:11:38

I have thought about this quite a bit and am fairly certain we do not need
to track this linkage between row versions. One strong hint to this
is that all the work I've seen on multiversion serializability
theory defines a rw-conflict to be one transaction reading an object
and the other writing *the next version* of the same object.

But maybe that doesn't answer the question conclusively -- after all,
the differences between an object, a tuple, and a row are subtle. So
see the argument below:
[I think it's similar to the argument you were making.]

On Sat, May 21, 2011 at 03:09:12PM -0500, Kevin Grittner wrote:
> The issue can be rephrased to this: If transaction T1 reads a row (thus
> acquiring a predicate lock on it) and a second transaction T2 updates
> that row, must a third transaction T3 which updates the new version of
> the row have a rw-conflict in from T1?

OK, that's a good way to phrase the question. Does it matter whether
this edge T1 -> T3 is there?

If T1 has a conflict in, it certainly doesn't. Adding the edge T1 -> T3
would create a dangerous structure, but we already had one from the
edge T1 -> T2, so we would have aborted something anyway.

Now let's consider the case where T1 doesn't have a conflict in. If
that's the case, for this edge T1 -> T3 to make a difference, T3 must
have a rw-conflict out that induces a cycle in the dependency graph,
i.e. a conflict out to some transaction preceding T1 in the serial
order. (A conflict out to T1 would work too, but that would mean T1 has
a conflict in and we would have rolled back.)

So now we're trying to figure out if there can be an rw-conflict edge
T3 -> T0, where T0 is some transaction that precedes T1. For T0 to
precede T1, there has to be has to be some edge, or sequence of edges,
from T0 to T1. At least the last edge has to be a wr-dependency or
ww-dependency rather than a rw-conflict, because T1 doesn't have a
rw-conflict in. And that gives us enough information about the order of
transactions to see that T3 can't have a rw-dependency to T0:- T0 committed before T1 started (the wr/ww-dependency
impliesthis)- T1 started before T2 committed (the T1->T2 rw-conflict implies this)- T2 committed before T3 started
(otherwise,T3 would be aborted because of an update conflict)

That means T0 committed before T3 started, and therefore there can't be
a rw-conflict from T3 to T0.

In both cases, we didn't need the T1 -> T3 edge.

Does that make sense to you? Unless anyone can poke a hole in that
reasoning, I think we can get rid of the code to handle later versions,
which would make me happy for many reasons. It will not be missed.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

23 May 2011, 20:50:49

Dan Ports <drkp@csail.mit.edu> wrote:
> [I think it's similar to the argument you were making.]
Similar, but you found a simpler, shorter way to the end, which
should make Robert happy.  ;-)
> On Sat, May 21, 2011 at 03:09:12PM -0500, Kevin Grittner wrote:
>> The issue can be rephrased to this: If transaction T1 reads a row
>> (thus acquiring a predicate lock on it) and a second transaction
>> T2 updates that row, must a third transaction T3 which updates
>> the new version of the row have a rw-conflict in from T1?
> 
> OK, that's a good way to phrase the question. Does it matter
> whether this edge T1 -> T3 is there?
> 
> If T1 has a conflict in, it certainly doesn't. Adding the edge T1
> -> T3 would create a dangerous structure, but we already had one
> from the edge T1 -> T2, so we would have aborted something anyway.
I had to pause a moment on that because you didn't mention the
"early enough commit" aspects; but certainly if T3 commits early
enough to cause a problem then T2 does, so this is solid.
> Now let's consider the case where T1 doesn't have a conflict in.
> If that's the case, for this edge T1 -> T3 to make a difference,
> T3 must have a rw-conflict out that induces a cycle in the
> dependency graph, i.e. a conflict out to some transaction
> preceding T1 in the serial order. (A conflict out to T1 would work
> too, but that would mean T1 has a conflict in and we would have
> rolled back.)
Agreed.
> So now we're trying to figure out if there can be an rw-conflict
> edge T3 -> T0, where T0 is some transaction that precedes T1. For
> T0 to precede T1, there has to be has to be some edge, or sequence
> of edges, from T0 to T1. At least the last edge has to be a
> wr-dependency or ww-dependency rather than a rw-conflict, because
> T1 doesn't have a rw-conflict in.
> 
> And that gives us enough information about the order of
> transactions to see that T3 can't have a rw-dependency to T0:
>  - T0 committed before T1 started (the wr/ww-dependency implies
>                                    this)
>  - T1 started before T2 committed (the T1->T2 rw-conflict implies
>                                    this)
>  - T2 committed before T3 started (otherwise, T3 would be aborted
>                                    because of an update conflict)
> 
> That means T0 committed before T3 started,
[scribbles diagrams]  Yep.
> and therefore there can't be a rw-conflict from T3 to T0.
No overlap means no rw-conflict; so that has to be true.
> In both cases, we didn't need the T1 -> T3 edge.
> 
> 
> Does that make sense to you?
Makes sense to me.  Like the proof I offered, you have shown that
there is no cycle which can develop with the locks copied which
isn't there anyway if we don't copy the locks.
> Unless anyone can poke a hole in that reasoning, I think we can
> get rid of the code to handle later versions,
Actually, we now have two independent proofs which don't have much
in common beyond the initial statement of the problem.  Someone
would need to show a flaw in that premise or punch holes in both
proofs.  I think we can get by with just the shorter proof (yours)
in the README file, though.
-Kevin

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

24 May 2011, 06:19:29

"Kevin Grittner"  wrote:
> Dan Ports  wrote:
>> Does that make sense to you?
> 
> Makes sense to me. Like the proof I offered, you have shown that
> there is no cycle which can develop with the locks copied which
> isn't there anyway if we don't copy the locks.
I woke up with the nagging thought that while the above is completely
accurate, it deserves some slight elaboration. These proofs show that
there is no legitimate cycle which could cause an anomaly which the
move from row-based to tuple-based logic will miss.  They don't prove
that the change will generate all the same serialization failures;
and in fact, some false positives are eliminated by the change. 
That's a good thing.  In addition to the benefits mentioned in prior
posts, there will be a reduction in the rate of rollbacks (in
particular corner cases) from what people see in beta1 without a loss
of correctness.
-Kevin

Re: SSI predicate locking on heap -- tuple or row?

From

Dan Ports

Date:

24 May 2011, 21:26:04

On Tue, May 24, 2011 at 04:18:37AM -0500, Kevin Grittner wrote:
> These proofs show that
> there is no legitimate cycle which could cause an anomaly which the
> move from row-based to tuple-based logic will miss.  They don't prove
> that the change will generate all the same serialization failures;
> and in fact, some false positives are eliminated by the change. 

Yes, that's correct. That's related to the part in the proof where I
claimed T3 couldn't have a conflict out *to some transaction T0 that
precedes T1*.

I originally tried to show that T3 couldn't have any conflicts out that
T2 didn't have, which would mean we got the same set of serialization
failures, but that's not true. In fact, it's not too hard to come up
with an example where there would be a serialization failure with the
row version links, but not without. However, because the rw-conflict
can't be pointing to a transaction that precedes T1 in the serial
order, it won't create a cycle. In other words, there are serialization
failures that won't happen anymore, but they were false positives.

Dan

-- 
Dan R. K. Ports              MIT CSAIL                http://drkp.net/

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

26 May 2011, 00:19:49

> Dan Ports  wrote:
> On Tue, May 24, 2011 at 04:18:37AM -0500, Kevin Grittner wrote:

>> These proofs show that there is no legitimate cycle which could
>> cause an anomaly which the move from row-based to tuple-based
>> logic will miss. They don't prove that the change will generate
>> all the same serialization failures; and in fact, some false
>> positives are eliminated by the change.
>
> Yes, that's correct. That's related to the part in the proof where
> I claimed T3 couldn't have a conflict out *to some transaction T0
> that precedes T1*.
>
> I originally tried to show that T3 couldn't have any conflicts out
> that T2 didn't have, which would mean we got the same set of
> serialization failures, but that's not true. In fact, it's not too
> hard to come up with an example where there would be a
> serialization failure with the row version links, but not without.
> However, because the rw-conflict can't be pointing to a transaction
> that precedes T1 in the serial order, it won't create a cycle. In
> other words, there are serialization failures that won't happen
> anymore, but they were false positives.

Dan and I went around a couple times chasing down all code, comment,
and patch changes needed, resulting in the attached patch.  We found
and fixed the bug which originally manifested in a way which I
confused with a need for row locks, as well as another which was
nearby in the code.  We backed out the changes which were causing
merge problems for Robert, as those were part of the attempt at the
row locking (versus tuple locking).  We removed a function which is
no longer needed.  We adjusted the comments and an affected isolation
test.

As might be expected from removing an unnecessary feature, the lines
of code went down -- a net decrease of 93 lines.

This patch applies against HEAD, `make check-world` passes as does
`make -C src/test/isolation installcheck` and `make
installcheck-world` against a database with
default_transaction_isolation = 'serializable'.  Dan is running
stress testing against the patched version tonight with DBT-2.

These changes generate merge conflicts with the work I've done on
handling CLUSTER, DROP INDEX, etc.  It seems to me that the best
course would be to commit this, then I can rebase the other work and
post it.  Since these issues are orthogonal, it didn't seem like a
good idea to combine them in one patch, and this one seems more
urgent.

-Kevin

Attachment

ssi-tuple-not-row-locking-3.patch

Re: SSI predicate locking on heap -- tuple or row?

From

Heikki Linnakangas

Date:

26 May 2011, 05:07:06

On 26.05.2011 06:19, Kevin Grittner wrote:
> Dan and I went around a couple times chasing down all code, comment,
> and patch changes needed, resulting in the attached patch.  We found
> and fixed the bug which originally manifested in a way which I
> confused with a need for row locks, as well as another which was
> nearby in the code.  We backed out the changes which were causing
> merge problems for Robert, as those were part of the attempt at the
> row locking (versus tuple locking).  We removed a function which is
> no longer needed.  We adjusted the comments and an affected isolation
> test.

Could you explain in the README, why it is safe to only take the lock on 
the visible row version, please? It's not quite obvious, as we've seen 
from this discussion, and if I understood correctly the academic papers 
don't touch that subject either.

> As might be expected from removing an unnecessary feature, the lines
> of code went down -- a net decrease of 93 lines.

That's the kind of patch I like :-).

> These changes generate merge conflicts with the work I've done on
> handling CLUSTER, DROP INDEX, etc.  It seems to me that the best
> course would be to commit this, then I can rebase the other work and
> post it.  Since these issues are orthogonal, it didn't seem like a
> good idea to combine them in one patch, and this one seems more
> urgent.

Agreed.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

26 May 2011, 16:11:47

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> Could you explain in the README, why it is safe to only take the
> lock on the visible row version, please?

Sure.  I actually intended to do this last night but ran out of
steam and posted what I had, planning on following up with that.

The place it seemed to fit best was in the "Innovations" section,
since the SSI papers and their prototype implementations seemed
oriented toward "rows" -- certainly the SIREAD locks were at the row
level, versus a row version level.

Since this doesn't touch any of the files in yesterday's patch, and
it seems entirely within the realm of possibility that people will
want to argue about how best to document this more than the actual
fix, I'm posting it as a separate patch -- README-SSI only.

I mostly just copied from Dan's posted proof verbatim.

-Kevin

Attachment

ssi-readme-tuple-lock.patch

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

30 May 2011, 17:02:20

[The first attempt to send this shows as undeliverable to the list. 
Resending for completeness and coherency of archives.  Apologies to
those getting duplicates.]
> Heikki Linnakangas wrote:
> On 26.05.2011 06:19, Kevin Grittner wrote:
>
>> /*
>> * Check whether the writer has become a pivot with an
>> * out-conflict committed transaction, while neither reader
>> * nor writer is committed. If the reader is a READ ONLY
>> * transaction, there is only a serialization failure if an
>> * out-conflict transaction causing the pivot committed
>> * before the reader acquired its snapshot. (That is, the
>> * reader must not have been concurrent with the out-conflict
>> * transaction.)
>> */
>
> The comment is not in sync with the code. The code is not checking
> that "neither reader or writer has committed", but something more
> complicated.

True. We changed the code but not the comment. Sorry for missing
that. Basically the quoted clause would be correct by changing it to
"neither reader nor writer committed first." (Your rewrite,
discussed below, is better, though.)

> I find it helps tremendously to draw the dangerous structures being
> checked, in addition to just explaining them in text.

Agreed -- I spent many an hour with the "dangerous structure" diagram
in front of me when thinking through the mechanics of implementation.

> Ascii art is a bit clunky, but I think we have to do it here.

OK.

> I did some of that in the comments, and I think I understand it
> now. See attached patch. Does that look right to you?

! * A serialization failure can only occur if there is a dangerous
! * structure in the dependency graph:
! *
! * Tin ------> Tpivot ------> Tout
! * rw rw
! *
! * Furthermore, Tout must commit first.

I agree that this is easier to read and understand than just straight
text; but you dropped one other condition, which we use rather
heavily. We should probably add back something like:

* If Tin is declared READ ONLY (or commits without writing), we can
* only have a problem if Tout committed before Tin acquired its
* snapshot.

> * XXX: Where does that last condition come from?

This optimization is an original one, not yet appearing in any
academic papers; Dan and I are both convinced it is safe, and in
off-list correspondence with Michael Cahill after he left Oracle, he
said that he discussed this with Alan Fekete and they both concur
that it is a safe and good optimization.

This optimization is mentioned in the README-SSI file in the
"Innovations" section. Do you think that file needs to have more on
the topic?

> * XXX: for an anomaly to occur, T2 must've committed before W.
> * Couldn't we easily check that here, or does the fact that W
> * committed already somehow imply it?

The flags and macros should probably be renamed to make it more
obvious that this is covered. The flag which SxactHasConflictOut is
based on is only set when the conflict out is to a transaction which
committed ahead of the flagged transaction. Regarding the other
test -- since we have two transactions (Tin and Tout) which have not
been summarized, and transactions are summarized in order of commit,
SxactHasSummaryConflictOut implies that the transaction with the flag
set has not committed before the transaction to which it has a
rw-conflict out.

The problem is coming up with a more accurate name which isn't really
long. The best I can think of is SxactHasConflictOutToEarlierCommit,
which is a little long. If nobody has a better idea, I guess it's
better to have a long name than a misleading one. Do you think we
need to rename SxactHasSummaryConflictOut or just add a comment on
that use that a summarized transaction will always be committed ahead
of any transactions which are not summarized?

> (note that I swapped the second and third check in the function, I
> think it's more straightforward that way).

It doesn't matter to me either way, so if it seems clearer to you,
that's a win.

> Should we say "Cancelled on identification as pivot, during ...",
> or "Cancelled on identification as a pivot, during ..." ? We seem
> to use both in the error messages.

Consistency is good. I think it sounds better with the indefinite
article in there.

Do you want me to post another patch based on this, or are these
changes from what you posted small enough that you would prefer to
just do it as part of the commit?

Thanks for taking the time to work through this. As always, your
ideas improve things.

-Kevin

Re: SSI predicate locking on heap -- tuple or row?

From

Heikki Linnakangas

Date:

30 May 2011, 17:35:51

On 26.05.2011 06:19, Kevin Grittner wrote:
>     /*
>      * Check whether the writer has become a pivot with an out-conflict
>      * committed transaction, while neither reader nor writer is committed. If
>      * the reader is a READ ONLY transaction, there is only a serialization
>      * failure if an out-conflict transaction causing the pivot committed
>      * before the reader acquired its snapshot.  (That is, the reader must not
>      * have been concurrent with the out-conflict transaction.)
>      */
>     if (!failure)
>     {
>         if (SxactHasSummaryConflictOut(writer))
>         {
>             failure = true;
>             conflict = NULL;
>         }
>         else
>             conflict = (RWConflict)
>                 SHMQueueNext(&writer->outConflicts,
>                              &writer->outConflicts,
>                              offsetof(RWConflictData, outLink));
>         while (conflict)
>         {
>             if (SxactIsCommitted(conflict->sxactIn)
>                 && (!SxactIsCommitted(reader)
>                     || conflict->sxactIn->commitSeqNo <= reader->commitSeqNo)
>                 && (!SxactIsCommitted(writer)
>                     || conflict->sxactIn->commitSeqNo <= writer->commitSeqNo)
>                 && (!SxactIsReadOnly(reader)
>                     || conflict->sxactIn->commitSeqNo <= reader->SeqNo.lastCommitBeforeSnapshot))
>             {
>                 failure = true;
>                 break;
>             }
>             conflict = (RWConflict)
>                 SHMQueueNext(&writer->outConflicts,
>                              &conflict->outLink,
>                              offsetof(RWConflictData, outLink));
>         }
>     }

The comment is not in sync with the code. The code is not checking that
"neither reader or writer has committed", but something more complicated.

Looking at OnConflict_CheckForSerializationFailure(), it's really hard
to see how it works, and why the conditions it checks are sufficient. I
find it helps tremendously to draw the dangerous structures being
checked, in addition to just explaining them in text. Ascii art is a bit
clunky, but I think we have to do it here.

I did some of that in the comments, and I think I understand it now. See
attached patch. Does that look right to you? (note that I swapped the
second and third check in the function, I think it's more
straightforward that way). I also added a couple of questions about the
conditions, marked with XXX comments. Can you answer those, please?

PS. Should we say "Cancelled on identification as pivot, during ...", or
"Cancelled on identification as a pivot, during ..." ? We seem to use
both in the error messages.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

ssi-tuple-not-row-locking-4.patch

Re: SSI predicate locking on heap -- tuple or row?

From

Heikki Linnakangas

Date:

30 May 2011, 17:53:41

On 30.05.2011 17:10, Kevin Grittner wrote:
>> Heikki Linnakangas  wrote:
>> On 26.05.2011 06:19, Kevin Grittner wrote:
>> I did some of that in the comments, and I think I understand it
>> now. See attached patch. Does that look right to you?
>
> !  * A serialization failure can only occur if there is a dangerous
> !  * structure in the dependency graph:
> !  *
> !  *        Tin ------>  Tpivot ------>  Tout
> !  *                 rw             rw
> !  *
> !  * Furthermore, Tout must commit first.
>
> I agree that this is easier to read and understand than just straight
> text; but you dropped one other condition, which we use rather
> heavily.  We should probably add back something like:
>
> * If Tin is declared READ ONLY (or commits without writing), we can
> * only have a problem if Tout committed before Tin acquired its
> * snapshot.
>
>>      * XXX: Where does that last condition come from?
>
> This optimization is an original one, not yet appearing in any
> academic papers; Dan and I are both convinced it is safe, and in
> off-list correspondence with Michael Cahill after he left Oracle, he
> said that he discussed this with Alan Fekete and they both concur
> that it is a safe and good optimization.
>
> This optimization is mentioned in the README-SSI file in the
> "Innovations" section.  Do you think that file needs to have more on
> the topic?

Oh, I see this:

>           o We can avoid ever rolling back a transaction when the
> transaction on the conflict *in* side of the pivot is explicitly or
> implicitly READ ONLY unless the transaction on the conflict *out*
> side of the pivot committed before the READ ONLY transaction acquired
> its snapshot. (An implicit READ ONLY transaction is one which
> committed without writing, even though it was not explicitly declared
> to be READ ONLY.)

Since this isn't coming from the papers, it would be nice to explain why 
that is safe.

>>       * XXX: for an anomaly to occur, T2 must've committed before W.
>>       * Couldn't we easily check that here, or does the fact that W
>>       * committed already somehow imply it?
>
> The flags and macros should probably be renamed to make it more
> obvious that this is covered.  The flag which SxactHasConflictOut is
> based on is only set when the conflict out is to a transaction which
> committed ahead of the flagged transaction.  Regarding the other
> test -- since we have two transactions (Tin and Tout) which have not
> been summarized, and transactions are summarized in order of commit,
> SxactHasSummaryConflictOut implies that the transaction with the flag
> set has not committed before the transaction to which it has a
> rw-conflict out.

Ah, got it.

> The problem is coming up with a more accurate name which isn't really
> long.  The best I can think of is SxactHasConflictOutToEarlierCommit,
> which is a little long.  If nobody has a better idea, I guess it's
> better to have a long name than a misleading one.  Do you think we
> need to rename SxactHasSummaryConflictOut or just add a comment on
> that use that a summarized transaction will always be committed ahead
> of any transactions which are not summarized?

Dunno. I guess a comment would do. Can you write a separate patch for 
that, please?

>> (note that I swapped the second and third check in the function, I
>> think it's more straightforward that way).
>
> It doesn't matter to me either way, so if it seems clearer to you,
> that's a win.

FWIW, the reason I prefer it that way is that the function is checking 
three scenarios:

R ----> W ----> T2, where W committed after T2
R ----> W ----> T2, "else"
T0 ----> R ----> W

It seems more straightforward to keep both "R ----> W ----> T2" cases 
together.

>> Should we say "Cancelled on identification as pivot, during ...",
>> or "Cancelled on identification as a pivot, during ..." ? We seem
>> to use both in the error messages.
>
> Consistency is good.  I think it sounds better with the indefinite
> article in there.

Ok, will do.

> Do you want me to post another patch based on this, or are these
> changes from what you posted small enough that you would prefer to
> just do it as part of the commit?

I've committed with the discussed changes.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

01 June 2011, 19:09:39

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On 30.05.2011 17:10, Kevin Grittner wrote:
>> This optimization is an original one, not yet appearing in any
>> academic papers; Dan and I are both convinced it is safe, and in
>> off-list correspondence with Michael Cahill after he left Oracle,
>> he said that he discussed this with Alan Fekete and they both
>> concur that it is a safe and good optimization.
>>
>> This optimization is mentioned in the README-SSI file in the
>> "Innovations" section.  Do you think that file needs to have more
on
>> the topic?
> 
> Oh, I see this:
> 
>>           o We can avoid ever rolling back a transaction when the
>> transaction on the conflict *in* side of the pivot is explicitly
or
>> implicitly READ ONLY unless the transaction on the conflict *out*
>> side of the pivot committed before the READ ONLY transaction
>> acquired its snapshot. (An implicit READ ONLY transaction is one
>> which committed without writing, even though it was not
>> explicitly declared to be READ ONLY.)
> 
> Since this isn't coming from the papers, it would be nice to
> explain why that is safe.
I see that this issue first surfaced on the Wiki page 2 April, 2010,
and was never really described in detail on the -hackers list.  To
ensure that it has some documentation here (in case of any possible
IP controversy), I will describe a proof now.  For earlier
references one could dig around in Wiki history, posted patches
during CFs, and the git repository history.  I have kept a copy of
the repo before the official conversion from CVS.
From many academic papers, there is well-established proof that
serialization anomalies can only occur under snapshot isolation when
there is a cycle in the graph of apparent order of execution of the
transactions, and that in such a cycle the following pattern of
rw-dependencies always occurs:   Tin - - -> Tpivot - - -> Tout
A rw-dependency (also called a rw-conflict) exists when a read by
one transaction doesn't see the write of another transaction because
the two transactions overlap, regardless of whether the read or the
write actually happens first.  Since the reader doesn't see the work
of the writer, the reader appears to have executed first, regardless
of the actual order of snapshot acquisition or commits.  The arrows
show the apparent order of execution of the transactions -- Tin
first, Tout last.  Published papers have further proven that the
transaction which appears to have executed last of these three must
actually commit before either of the others for an anomaly to occur.
Tin and Tout may be the same transaction (as in the case of simple
write skew), or two distinct transactions.
SSI relies on recognition of this "dangerous structure" to decide
when to cancel a transaction to preserve data integrity.  No attempt
is made to complete the cycle from Tout back to Tin.  Partly this is
because of the expense of doing so -- there could be a long chain of
rw-dependencies, wr-dependencies (where the reader *can* see the
work of another transaction because it *did* commit first), and
ww-dependencies (where the writer is updating a row version left by
another transaction, which must have committed first).  There is
also the uncomfortable possibility that a client application
*outside* the database ran a transaction which made some change and,
after observing the successful commit of this transaction, is
proceeding with the knowledge that it ran before the next transaction
is submitted.
The novel optimization we've used in the PostgreSQL implementation
of SSI is based on the fact that of these four ways of determining
which transaction appears to have executed first, all but the
rw-dependency are based on the transaction which appears to have
executed first committing before the apparent later transaction
acquires its snapshot.  When you combine this with the fact that the
transaction which appears to execute later in a rw-dependency is the
one which performed the write, you have the basis of an interesting
optimization for READ ONLY transactions.
In the above diagram, if Tin is READ ONLY, it cannot have a
rw-dependency *in* from some other transaction.  The only way Tout
can directly appear to have executed before Tin is if Tout committed
before Tin acquired its snapshot, so that Tin can read something
Tout wrote, or an external application can know that it successfully
committed Tout before beginning Tin.  The published conditions must
all still hold -- Tin and Tout must both overlap Tpivot, the same
rw-dependencies must exist, and Tout must still commit first of the
three; but when Tin doesn't write, Tout must actually commit *even
earlier* -- before Tin gets started -- to have an anomaly.
For a proof the question becomes: If Tin does not write and Tin and
Tout overlap, can a dependency or chain of dependencies develop
which makes it appear that Tout executed before Tin?  If this can't
happen without creating a dangerous structure around a different
pivot, then no cycle can develop and the optimization is safe.
Since Tin doesn't write, no transaction can appear to come before it
because of failure to read its writes -- no rw-dependency in to this
transaction is possible.  The only transactions Tin can appear to
follow are transactions which committed in time to make their work
visible to Tin.  Let's assume such a transaction exists, called T0:   T0 -----> Tin - - -> Tpivot - - -> Tout
It would be possible for Tout to overlap T0 and develop a
rw-dependency out to it, but T0 must commit before Tin starts, so
for Tout to overlap Tin, T0 must commit before Tout, so a
rw-dependency from Tout to T0 would make Tout a pivot and cause a
rollback which would break the cycle.  Any other transaction to
which Tout developed a rw-dependency out would have the same
problem.

Any *other* type of dependency out from Tout would require the
transaction to acquire its snapshot after the commit of Tout.  Since
T0 commits before Tin begins and Tout overlaps Tin, the commit of
Tout must come after the commit of T0, so no transaction which
begins after Tout commits can overlap T0.  This leaves no possible
way to form a cycle when Tin is READ ONLY and Tout overlaps Tin.
I won't be shocked if Dan can come up with a shorter proof, but I'm
confident this one is solid.
Patch to come soon, but I thought it best to post the proof here
first to allow a chance to refine it based on discussion -- or
shorter proofs.
-Kevin

Re: SSI predicate locking on heap -- tuple or row?

From

Dan Ports

Date:

02 June 2011, 04:14:29

On Wed, Jun 01, 2011 at 05:09:09PM -0500, Kevin Grittner wrote:
> I won't be shocked if Dan can come up with a shorter proof, but I'm
> confident this one is solid.

Well, so happens I wrote a proof on the airplane today, before I saw
your mail. It's actually quite straightforward... (well, at least more
so than I was expecting)

> From many academic papers, there is well-established proof that
> serialization anomalies can only occur under snapshot isolation when
> there is a cycle in the graph of apparent order of execution of the
> transactions, and that in such a cycle the following pattern of
> rw-dependencies always occurs:
>  
>     Tin - - -> Tpivot - - -> Tout
>  
> A rw-dependency (also called a rw-conflict) exists when a read by
> one transaction doesn't see the write of another transaction because
> the two transactions overlap, regardless of whether the read or the
> write actually happens first.  Since the reader doesn't see the work
> of the writer, the reader appears to have executed first, regardless
> of the actual order of snapshot acquisition or commits.  The arrows
> show the apparent order of execution of the transactions -- Tin
> first, Tout last.  Published papers have further proven that the
> transaction which appears to have executed last of these three must
> actually commit before either of the others for an anomaly to occur.

We can actually say something slightly stronger than that last
sentence: Tout has to commit before *any* other transaction in the
cycle. That doesn't help us implement SSI, because we never try to look
at an entire cycle, but it's still true and useful for proofs like this.

Now, supposing Tin is read-only...

Since there's a cycle, there must also be a transaction that precedes
Tin in the serial order. Call it T0. (T0 might be the same transaction
as Tout, but that doesn't matter.) There's an edge in the graph from
T0 to Tin. It can't be a rw-conflict, because Tin was read-only, so it
must be a ww- or wr-dependency. Either means T0 committed before Tin
started.

Because Tout committed before any other transaction in the cycle, Tout
has to commit before T0 commits -- and thus before Tin starts.

Dan

-- 
Dan R. K. Ports              MIT CSAIL                http://drkp.net/

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

02 June 2011, 15:23:11

Dan Ports <drkp@csail.mit.edu> wrote:
> On Wed, Jun 01, 2011 at 05:09:09PM -0500, Kevin Grittner wrote:
>> Published papers have further proven that the transaction which
>> appears to have executed last of these three must actually commit
>> before either of the others for an anomaly to occur.
> 
> We can actually say something slightly stronger than that last
> sentence: Tout has to commit before *any* other transaction in the
> cycle. That doesn't help us implement SSI, because we never try to
> look at an entire cycle, but it's still true and useful for proofs
> like this.
I didn't know that, although it doesn't seem too surprising.  With
that as a given, the proof can be quite short and straightforward.
> Now, supposing Tin is read-only...
> 
> Since there's a cycle, there must also be a transaction that
> precedes Tin in the serial order. Call it T0. (T0 might be the
> same transaction as Tout, but that doesn't matter.) There's an
> edge in the graph from T0 to Tin. It can't be a rw-conflict,
> because Tin was read-only, so it must be a ww- or wr-dependency.
> Either means T0 committed before Tin started.
> 
> Because Tout committed before any other transaction in the cycle,
> Tout has to commit before T0 commits -- and thus before Tin
> starts.
If we're going to put this into the README-SSI as the proof of the
validity of this optimization, I'd like to have a footnote pointing
to a paper describing the "first commit in the cycle" aspect of a
dangerous structure.  Got any favorites, or should I fall back on a
google search?
-Kevin

Re: SSI predicate locking on heap -- tuple or row?

From

"Kevin Grittner"

Date:

02 June 2011, 18:14:57

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On 30.05.2011 17:10, Kevin Grittner wrote:
>> Heikki Linnakangas  wrote:

>>>       * XXX: for an anomaly to occur, T2 must've committed
>>>       * before W. Couldn't we easily check that here, or does
>>>       * the fact that W committed already somehow imply it?
>>
>> The flags and macros should probably be renamed to make it more
>> obvious that this is covered.  The flag which SxactHasConflictOut
>> is based on is only set when the conflict out is to a transaction
>> which committed ahead of the flagged transaction.  Regarding the
>> other test -- since we have two transactions (Tin and Tout) which
>> have not been summarized, and transactions are summarized in
>> order of commit, SxactHasSummaryConflictOut implies that the
>> transaction with the flag set has not committed before the
>> transaction to which it has a rw-conflict out.
>
> Ah, got it.
>
>> The problem is coming up with a more accurate name which isn't
>> really long.  The best I can think of is
>> SxactHasConflictOutToEarlierCommit, which is a little long.  If
>> nobody has a better idea, I guess it's better to have a long name
>> than a misleading one.  Do you think we need to rename
>> SxactHasSummaryConflictOut or just add a comment on that use that
>> a summarized transaction will always be committed ahead of any
>> transactions which are not summarized?
>
> Dunno. I guess a comment would do. Can you write a separate patch
> for that, please?

Attached is a comments-only patch for this, along with one
correction to  the comments you added and a couple other typos.

I'll submit a patch for the README-SSI file once I find a reference
I like to a paper describing what Dan's proof uses as a premise --
that the transaction on the rw-conflict *out* side of the pivot must
not only be the first of the three transactions in the dangerous
structure to commit, but the first in the entire cycle of which the
dangerous structure is a part.  With that premise as a given, the
proof is short, clear, and unassailable; but I think we need a cite
to use that argument convincingly.

-Kevin

Attachment

ssi-comment-fixups-1.patch

Re: SSI predicate locking on heap -- tuple or row?

From

Dan Ports

Date:

02 June 2011, 18:37:55

On Thu, Jun 02, 2011 at 01:01:05PM -0500, Kevin Grittner wrote:
> If we're going to put this into the README-SSI as the proof of the
> validity of this optimization, I'd like to have a footnote pointing
> to a paper describing the "first commit in the cycle" aspect of a
> dangerous structure.  Got any favorites, or should I fall back on a
> google search?

Hmm. I don't see any that state that in so many words, but it's an
obvious consequence of the proof of Theorem 2.1 in "Making Snapshot
Isolation Serializable" -- note that T3 is chosen to be the transaction
in the cycle with the earliest commit time.

Dan

-- 
Dan R. K. Ports              MIT CSAIL                http://drkp.net/

Re: SSI predicate locking on heap -- tuple or row?

From

Heikki Linnakangas

Date:

03 June 2011, 07:04:41

On 03.06.2011 00:14, Kevin Grittner wrote:
> Attached is a comments-only patch for this, along with one
> correction to  the comments you added and a couple other typos.

Ok, committed.

> I'll submit a patch for the README-SSI file once I find a reference
> I like to a paper describing what Dan's proof uses as a premise --
> that the transaction on the rw-conflict *out* side of the pivot must
> not only be the first of the three transactions in the dangerous
> structure to commit, but the first in the entire cycle of which the
> dangerous structure is a part.  With that premise as a given, the
> proof is short, clear, and unassailable; but I think we need a cite
> to use that argument convincingly.

Agreed.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com