Re: SSI predicate locking on heap -- tuple or row? - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: SSI predicate locking on heap -- tuple or row? |
Date | |
Msg-id | 4DE67235020000250003DFBC@gw.wicourts.gov Whole thread Raw |
In response to | Re: SSI predicate locking on heap -- tuple or row? (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses |
Re: SSI predicate locking on heap -- tuple or row?
|
List | pgsql-hackers |
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 30.05.2011 17:10, Kevin Grittner wrote: >> This optimization is an original one, not yet appearing in any >> academic papers; Dan and I are both convinced it is safe, and in >> off-list correspondence with Michael Cahill after he left Oracle, >> he said that he discussed this with Alan Fekete and they both >> concur that it is a safe and good optimization. >> >> This optimization is mentioned in the README-SSI file in the >> "Innovations" section. Do you think that file needs to have more on >> the topic? > > Oh, I see this: > >> o We can avoid ever rolling back a transaction when the >> transaction on the conflict *in* side of the pivot is explicitly or >> implicitly READ ONLY unless the transaction on the conflict *out* >> side of the pivot committed before the READ ONLY transaction >> acquired its snapshot. (An implicit READ ONLY transaction is one >> which committed without writing, even though it was not >> explicitly declared to be READ ONLY.) > > Since this isn't coming from the papers, it would be nice to > explain why that is safe. I see that this issue first surfaced on the Wiki page 2 April, 2010, and was never really described in detail on the -hackers list. To ensure that it has some documentation here (in case of any possible IP controversy), I will describe a proof now. For earlier references one could dig around in Wiki history, posted patches during CFs, and the git repository history. I have kept a copy of the repo before the official conversion from CVS. From many academic papers, there is well-established proof that serialization anomalies can only occur under snapshot isolation when there is a cycle in the graph of apparent order of execution of the transactions, and that in such a cycle the following pattern of rw-dependencies always occurs: Tin - - -> Tpivot - - -> Tout A rw-dependency (also called a rw-conflict) exists when a read by one transaction doesn't see the write of another transaction because the two transactions overlap, regardless of whether the read or the write actually happens first. Since the reader doesn't see the work of the writer, the reader appears to have executed first, regardless of the actual order of snapshot acquisition or commits. The arrows show the apparent order of execution of the transactions -- Tin first, Tout last. Published papers have further proven that the transaction which appears to have executed last of these three must actually commit before either of the others for an anomaly to occur. Tin and Tout may be the same transaction (as in the case of simple write skew), or two distinct transactions. SSI relies on recognition of this "dangerous structure" to decide when to cancel a transaction to preserve data integrity. No attempt is made to complete the cycle from Tout back to Tin. Partly this is because of the expense of doing so -- there could be a long chain of rw-dependencies, wr-dependencies (where the reader *can* see the work of another transaction because it *did* commit first), and ww-dependencies (where the writer is updating a row version left by another transaction, which must have committed first). There is also the uncomfortable possibility that a client application *outside* the database ran a transaction which made some change and, after observing the successful commit of this transaction, is proceeding with the knowledge that it ran before the next transaction is submitted. The novel optimization we've used in the PostgreSQL implementation of SSI is based on the fact that of these four ways of determining which transaction appears to have executed first, all but the rw-dependency are based on the transaction which appears to have executed first committing before the apparent later transaction acquires its snapshot. When you combine this with the fact that the transaction which appears to execute later in a rw-dependency is the one which performed the write, you have the basis of an interesting optimization for READ ONLY transactions. In the above diagram, if Tin is READ ONLY, it cannot have a rw-dependency *in* from some other transaction. The only way Tout can directly appear to have executed before Tin is if Tout committed before Tin acquired its snapshot, so that Tin can read something Tout wrote, or an external application can know that it successfully committed Tout before beginning Tin. The published conditions must all still hold -- Tin and Tout must both overlap Tpivot, the same rw-dependencies must exist, and Tout must still commit first of the three; but when Tin doesn't write, Tout must actually commit *even earlier* -- before Tin gets started -- to have an anomaly. For a proof the question becomes: If Tin does not write and Tin and Tout overlap, can a dependency or chain of dependencies develop which makes it appear that Tout executed before Tin? If this can't happen without creating a dangerous structure around a different pivot, then no cycle can develop and the optimization is safe. Since Tin doesn't write, no transaction can appear to come before it because of failure to read its writes -- no rw-dependency in to this transaction is possible. The only transactions Tin can appear to follow are transactions which committed in time to make their work visible to Tin. Let's assume such a transaction exists, called T0: T0 -----> Tin - - -> Tpivot - - -> Tout It would be possible for Tout to overlap T0 and develop a rw-dependency out to it, but T0 must commit before Tin starts, so for Tout to overlap Tin, T0 must commit before Tout, so a rw-dependency from Tout to T0 would make Tout a pivot and cause a rollback which would break the cycle. Any other transaction to which Tout developed a rw-dependency out would have the same problem. Any *other* type of dependency out from Tout would require the transaction to acquire its snapshot after the commit of Tout. Since T0 commits before Tin begins and Tout overlaps Tin, the commit of Tout must come after the commit of T0, so no transaction which begins after Tout commits can overlap T0. This leaves no possible way to form a cycle when Tin is READ ONLY and Tout overlaps Tin. I won't be shocked if Dan can come up with a shorter proof, but I'm confident this one is solid. Patch to come soon, but I thought it best to post the proof here first to allow a chance to refine it based on discussion -- or shorter proofs. -Kevin
pgsql-hackers by date: