SSI predicate locking on heap -- tuple or row? - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | SSI predicate locking on heap -- tuple or row? |
Date | |
Msg-id | 4DD7D598020000250003DAA3@gw.wicourts.gov Whole thread Raw |
Responses |
Re: SSI predicate locking on heap -- tuple or row?
(Pavan Deolasee <pavan.deolasee@gmail.com>)
Re: SSI predicate locking on heap -- tuple or row? (Robert Haas <robertmhaas@gmail.com>) Re: SSI predicate locking on heap -- tuple or row? (Dan Ports <drkp@csail.mit.edu>) Re: SSI predicate locking on heap -- tuple or row? (Dan Ports <drkp@csail.mit.edu>) |
List | pgsql-hackers |
As background, for most of SSI development I had a TODO comment in the source which was basically around the question of whether a predicate lock on a visible heap tuple should propagate to later versions of the row as long as the original lock was material. In early February Heikki (quite reasonably) insisted that I resolve that and either add code to do so or a comment about why it wasn't necessary. After looking at it for many hours without getting to a logical proof, I thought I had a proof by example: http://archives.postgresql.org/pgsql-hackers/2011-02/msg00325.php We added code for this, but it has been problematic; probably over half the SSI bugs which have been found since SSI went into the code tree have been related to this, and Robert Haas has just found a couple more.In discussing how to address this with Jeff Davis(over beers at PGCon), he provided good advice about how to properly fix these issues, but posed some very good questions about whether it was really necessary. Rather than fix this aspect of the code, we might be able to eliminate it, which would reduce the code size and some overhead. Since this code is more fragile than the rest of SSI, this is particularly appealing -- my favorite way to deal with fragile code is to remove it. I went back to the example which persuaded me and took another look. On review I see that this didn't prove the point because there was a dangerous structure with T1 as a pivot which should have caused SSI to break the cycle. Since it didn't, either I got careless in my testing methodology (which I will recheck when I get back to Wisconsin) or there was a bug -- but either way, this example can't prove that the predicate locks need to follow the row to new versions. I haven't been able to come up with an example where it actually *is* needed, but failure to come up with an example where it is needed doesn't prove that it isn't needed. Here's my attempt at a logical proof of whether it is needed. It would be great if anyone with a grasp of SSI concepts could confirm its validity or shoot it down. (Hopefully Friday's presentation expanded the number of those who can do so.) The issue can be rephrased to this: If transaction T1 reads a row (thus acquiring a predicate lock on it) and a second transaction T2 updates that row, must a third transaction T3 which updates the new version of the row have a rw-conflict in from T1? Now, the first thing which jumps out is the timing requirements -- T3 must overlap T1 or there is no conflict (and therefore no need to replicate the predicate locks), but T3 must *not* overlap T2 or there would be ww-conflict and one of them would roll back. If T2 committed before T1 acquired its snapshot, T1 would have gotten its predicate lock on the second version of the row, so for a situation where this could possibly matter, the following actual timings must exist (read "starts" as meaning "acquires a snapshot"): T1 and T2 start in either order. T1 reads the row and T2 updates the row in either order. T2 commits. T3 starts. T3 updates the row. So far, using broken lines for rw-conflicts and solid for wr-dependencies, we have this for apparent order of execution: T1 - - -> T2 -----> T3 Not on the slides for the presentation, but briefly mentioned, is that Alan Fekete and others have proven that serialization anomalies can only occur if the transaction which appears on the rw-conflict *out* side of the pivot (the transaction which appears to have executed third) actually commits first. T2 committed before T1, so there can't be a cycle involving a rw-conflict *in* to T1 because that would complete the dangerous structure -- unless it commits before T2 or is read only and gets its snapshot before the commit of T2 (another special case which we left out of the presentation in the interest of time). Since T3 must get its snapshot after the commit of T2, if it develops a rw-conflict with T1 there will be an SSI serialization failure without needing the lock propagation. So now the question breaks down to whether we can have other transactions in the mix which, given the above, cause T3 to appear to have executed before T1. Since T3 cannot have committed before T1 (given what we know about the actual timings required), that has to involve a rw-conflict somewhere in the graph. Since a wr-dependency is caused by the writer actually committing before its work is read by another transaction, causing apparent order of execution at that point to match actual serial order of execution, such transactions would be benign in the transaction graph -- they might exist in a problem graph but would not help to cause a later-starting transaction to appear to have executed first. We can look just a rw-conflicts to cause the apparent order of execution to loop back in time. Since the time-arrow for rw-conflict points in the direction of the write, we can ignore read only transactions. So to complete the cycle back to T1, the transaction T0 with the rw-conflict to T1 must have committed before T2 committed. This means that it can't overlap with T3, because it started after the commit of T2. So the question is now whether T3 can have a rw-conflict (or chain of conflicts) to T0.The timings required to make it necessary to replicate predicate locks to later versions of the row now are: T0, T1, and T2 start in any order. T1 reads the row and T2 updates the row in either order; also T1 updates some other data which conflicts with a T0 read, in any order. T0 commits. T2 commits. T3 starts. T3 updates the row. Apparent order of execution is now: T0 - - -> T1 - - -> T2 -----> T3 We need at least one more transaction to complete the cycle here. For only one to do it, it must overlap both T0 and T3, it must do some read which conflicts with a write of T0 and some write which conflicts with a read of T3. Listing the timings at this point would be pretty chaotic -- T4 starts and develops a rw-conflict with T0 anytime before T0 commits, and develops a rw-conflict with T3 at any point after T3 starts. Apparent order of execution is now: T0 - - -> T1 - - -> T2 -----> T3 - - -> T4 - - -> T0 But wait -- T4 is now a pivot with T0 committing first and T3 is not read only. That part of the cycle is a dangerous structure and one of those transactions will be rolled back. So far we have now proven that you can't cause a problem with only five transactions. The question is now whether T4 can be replaced with multiple transactions forming a rw-conflict without one of them committing at a time which would create a dangerous structure and break the cycle. [Anyone who has followed along this far has earned my undying gratitude.] Since the chain of transactions has to overlap T0 and T3, I don't see how that can happen. We established that T0 has to commit before T3 can start, so the chain will ultimately have to get to that T0 commit. I don't want to start ripping out the apparently useless code without someone checking my logic here. One big gaff in this area is quite enough for me. :-/ Anyone? -Kevin
pgsql-hackers by date: