SSI predicate locking on heap -- tuple or row? - Mailing list pgsql-hackers

From Kevin Grittner
Subject SSI predicate locking on heap -- tuple or row?
Date
Msg-id 4DD7D598020000250003DAA3@gw.wicourts.gov
Whole thread Raw
Responses Re: SSI predicate locking on heap -- tuple or row?  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Re: SSI predicate locking on heap -- tuple or row?  (Robert Haas <robertmhaas@gmail.com>)
Re: SSI predicate locking on heap -- tuple or row?  (Dan Ports <drkp@csail.mit.edu>)
Re: SSI predicate locking on heap -- tuple or row?  (Dan Ports <drkp@csail.mit.edu>)
List pgsql-hackers
As background, for most of SSI development I had a TODO comment in the
source which was basically around the question of whether a predicate
lock on a visible heap tuple should propagate to later versions of the
row as long as the original lock was material.  In early February Heikki
(quite reasonably) insisted that I resolve that and either add code to
do so or a comment about why it wasn't necessary.  After looking at it
for many hours without getting to a logical proof, I thought I had a
proof by example: 
http://archives.postgresql.org/pgsql-hackers/2011-02/msg00325.php 
We added code for this, but it has been problematic; probably over half
the SSI bugs which have been found since SSI went into the code tree
have been related to this, and Robert Haas has just found a couple more.In discussing how to address this with Jeff
Davis(over beers at
 
PGCon), he provided good advice about how to properly fix these issues,
but posed some very good questions about whether it was really
necessary.  Rather than fix this aspect of the code, we might be able to
eliminate it, which would reduce the code size and some overhead.  Since
this code is more fragile than the rest of SSI, this is particularly
appealing -- my favorite way to deal with fragile code is to remove it.

I went back to the example which persuaded me and took another look.  On
review I see that this didn't prove the point because there was a
dangerous structure with T1 as a pivot which should have caused SSI to
break the cycle.  Since it didn't, either I got careless in my testing
methodology (which I will recheck when I get back to Wisconsin) or there
was a bug -- but either way, this example can't prove that the predicate
locks need to follow the row to new versions.  
I haven't been able to come up with an example where it actually *is*
needed, but failure to come up with an example where it is needed
doesn't prove that it isn't needed.  Here's my attempt at a logical
proof of whether it is needed.  It would be great if anyone with a grasp
of SSI concepts could confirm its validity or shoot it down.  (Hopefully
Friday's presentation expanded the number of those who can do so.) 
The issue can be rephrased to this: If transaction T1 reads a row (thus
acquiring a predicate lock on it) and a second transaction T2 updates
that row, must a third transaction T3 which updates the new version of
the row have a rw-conflict in from T1?
Now, the first thing which jumps out is the timing requirements -- T3
must overlap T1 or there is no conflict (and therefore no need to
replicate the predicate locks), but T3 must *not* overlap T2 or there
would be ww-conflict and one of them would roll back.  If T2 committed
before T1 acquired its snapshot, T1 would have gotten its predicate lock
on the second version of the row, so for a situation where this could
possibly matter, the following actual timings must exist (read "starts"
as meaning "acquires a snapshot"):
T1 and T2 start in either order.
T1 reads the row and T2 updates the row in either order.
T2 commits.
T3 starts.
T3 updates the row.
So far, using broken lines for rw-conflicts and solid for
wr-dependencies, we have this for apparent order of execution:
T1 - - -> T2 -----> T3
Not on the slides for the presentation, but briefly mentioned, is that
Alan Fekete and others have proven that serialization anomalies can only
occur if the transaction which appears on the rw-conflict *out* side of
the pivot (the transaction which appears to have executed third)
actually commits first.  T2 committed before T1, so there can't be a
cycle involving a rw-conflict *in* to T1 because that would complete the
dangerous structure -- unless it commits before T2 or is read only and
gets its snapshot before the commit of T2 (another special case which we
left out of the presentation in the interest of time).
Since T3 must get its snapshot after the commit of T2, if it develops a
rw-conflict with T1 there will be an SSI serialization failure without
needing the lock propagation.  So now the question breaks down to
whether we can have other transactions in the mix which, given the
above, cause T3 to appear to have executed before T1.
Since T3 cannot have committed before T1 (given what we know about the
actual timings required), that has to involve a rw-conflict somewhere in
the graph.  Since a wr-dependency is caused by the writer actually
committing before its work is read by another transaction, causing
apparent order of execution at that point to match actual serial order
of execution, such transactions would be benign in the transaction graph
-- they might exist in a problem graph but would not help to cause a
later-starting transaction to appear to have executed first.  We can
look just a rw-conflicts to cause the apparent order of execution to
loop back in time.
Since the time-arrow for rw-conflict points in the direction of the
write, we can ignore read only transactions.  So to complete the cycle
back to T1, the transaction T0 with the rw-conflict to T1 must have
committed before T2 committed.  This means that it can't overlap with
T3, because it started after the commit of T2.
So the question is now whether T3 can have a rw-conflict (or chain of
conflicts) to T0.The timings required to make it necessary to replicate predicate locks
to later versions of the row now are:
T0, T1, and T2 start in any order.
T1 reads the row and T2 updates the row in either order; also T1 updates
some other data which conflicts with a T0 read, in any order.
T0 commits.
T2 commits.
T3 starts.
T3 updates the row.

Apparent order of execution is now:
T0 - - -> T1 - - -> T2 -----> T3
We need at least one more transaction to complete the cycle here.  For
only one to do it, it must overlap both T0 and T3, it must do some read
which conflicts with a write of T0 and some write which conflicts with a
read of T3.

Listing the timings at this point would be pretty chaotic -- T4 starts
and develops a rw-conflict with T0 anytime before T0 commits, and
develops a rw-conflict with T3 at any point after T3 starts.
Apparent order of execution is now: 
T0 - - -> T1 - - -> T2 -----> T3 - - -> T4 - - -> T0
But wait -- T4 is now a pivot with T0 committing first and T3 is not
read only.  That part of the cycle is a dangerous structure and one of
those transactions will be rolled back.
So far we have now proven that you can't cause a problem with only five
transactions.
The question is now whether T4 can be replaced with multiple
transactions forming a rw-conflict without one of them committing at a
time which would create a dangerous structure and break the cycle.
[Anyone who has followed along this far has earned my undying
gratitude.]
Since the chain of transactions has to overlap T0 and T3, I don't see
how that can happen.  We established that T0 has to commit before T3 can
start, so the chain will ultimately have to get to that T0 commit.
I don't want to start ripping out the apparently useless code without
someone checking my logic here.  One big gaff in this area is quite
enough for me.  :-/  Anyone?
-Kevin



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: proposal: enhanced get diagnostics statement
Next
From: Pavan Deolasee
Date:
Subject: Re: SSI predicate locking on heap -- tuple or row?