Re: SSI predicate locking on heap -- tuple or row? - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: SSI predicate locking on heap -- tuple or row?
Date
Msg-id 4DE67235020000250003DFBC@gw.wicourts.gov
Whole thread Raw
In response to Re: SSI predicate locking on heap -- tuple or row?  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: SSI predicate locking on heap -- tuple or row?
List pgsql-hackers
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On 30.05.2011 17:10, Kevin Grittner wrote:
>> This optimization is an original one, not yet appearing in any
>> academic papers; Dan and I are both convinced it is safe, and in
>> off-list correspondence with Michael Cahill after he left Oracle,
>> he said that he discussed this with Alan Fekete and they both
>> concur that it is a safe and good optimization.
>>
>> This optimization is mentioned in the README-SSI file in the
>> "Innovations" section.  Do you think that file needs to have more
on
>> the topic?
> 
> Oh, I see this:
> 
>>           o We can avoid ever rolling back a transaction when the
>> transaction on the conflict *in* side of the pivot is explicitly
or
>> implicitly READ ONLY unless the transaction on the conflict *out*
>> side of the pivot committed before the READ ONLY transaction
>> acquired its snapshot. (An implicit READ ONLY transaction is one
>> which committed without writing, even though it was not
>> explicitly declared to be READ ONLY.)
> 
> Since this isn't coming from the papers, it would be nice to
> explain why that is safe.
I see that this issue first surfaced on the Wiki page 2 April, 2010,
and was never really described in detail on the -hackers list.  To
ensure that it has some documentation here (in case of any possible
IP controversy), I will describe a proof now.  For earlier
references one could dig around in Wiki history, posted patches
during CFs, and the git repository history.  I have kept a copy of
the repo before the official conversion from CVS.
From many academic papers, there is well-established proof that
serialization anomalies can only occur under snapshot isolation when
there is a cycle in the graph of apparent order of execution of the
transactions, and that in such a cycle the following pattern of
rw-dependencies always occurs:   Tin - - -> Tpivot - - -> Tout
A rw-dependency (also called a rw-conflict) exists when a read by
one transaction doesn't see the write of another transaction because
the two transactions overlap, regardless of whether the read or the
write actually happens first.  Since the reader doesn't see the work
of the writer, the reader appears to have executed first, regardless
of the actual order of snapshot acquisition or commits.  The arrows
show the apparent order of execution of the transactions -- Tin
first, Tout last.  Published papers have further proven that the
transaction which appears to have executed last of these three must
actually commit before either of the others for an anomaly to occur.
Tin and Tout may be the same transaction (as in the case of simple
write skew), or two distinct transactions.
SSI relies on recognition of this "dangerous structure" to decide
when to cancel a transaction to preserve data integrity.  No attempt
is made to complete the cycle from Tout back to Tin.  Partly this is
because of the expense of doing so -- there could be a long chain of
rw-dependencies, wr-dependencies (where the reader *can* see the
work of another transaction because it *did* commit first), and
ww-dependencies (where the writer is updating a row version left by
another transaction, which must have committed first).  There is
also the uncomfortable possibility that a client application
*outside* the database ran a transaction which made some change and,
after observing the successful commit of this transaction, is
proceeding with the knowledge that it ran before the next transaction
is submitted.
The novel optimization we've used in the PostgreSQL implementation
of SSI is based on the fact that of these four ways of determining
which transaction appears to have executed first, all but the
rw-dependency are based on the transaction which appears to have
executed first committing before the apparent later transaction
acquires its snapshot.  When you combine this with the fact that the
transaction which appears to execute later in a rw-dependency is the
one which performed the write, you have the basis of an interesting
optimization for READ ONLY transactions.
In the above diagram, if Tin is READ ONLY, it cannot have a
rw-dependency *in* from some other transaction.  The only way Tout
can directly appear to have executed before Tin is if Tout committed
before Tin acquired its snapshot, so that Tin can read something
Tout wrote, or an external application can know that it successfully
committed Tout before beginning Tin.  The published conditions must
all still hold -- Tin and Tout must both overlap Tpivot, the same
rw-dependencies must exist, and Tout must still commit first of the
three; but when Tin doesn't write, Tout must actually commit *even
earlier* -- before Tin gets started -- to have an anomaly.
For a proof the question becomes: If Tin does not write and Tin and
Tout overlap, can a dependency or chain of dependencies develop
which makes it appear that Tout executed before Tin?  If this can't
happen without creating a dangerous structure around a different
pivot, then no cycle can develop and the optimization is safe.
Since Tin doesn't write, no transaction can appear to come before it
because of failure to read its writes -- no rw-dependency in to this
transaction is possible.  The only transactions Tin can appear to
follow are transactions which committed in time to make their work
visible to Tin.  Let's assume such a transaction exists, called T0:   T0 -----> Tin - - -> Tpivot - - -> Tout
It would be possible for Tout to overlap T0 and develop a
rw-dependency out to it, but T0 must commit before Tin starts, so
for Tout to overlap Tin, T0 must commit before Tout, so a
rw-dependency from Tout to T0 would make Tout a pivot and cause a
rollback which would break the cycle.  Any other transaction to
which Tout developed a rw-dependency out would have the same
problem.

Any *other* type of dependency out from Tout would require the
transaction to acquire its snapshot after the commit of Tout.  Since
T0 commits before Tin begins and Tout overlaps Tin, the commit of
Tout must come after the commit of T0, so no transaction which
begins after Tout commits can overlap T0.  This leaves no possible
way to form a cycle when Tin is READ ONLY and Tout overlaps Tin.
I won't be shocked if Dan can come up with a shorter proof, but I'm
confident this one is solid.
Patch to come soon, but I thought it best to post the proof here
first to allow a chance to refine it based on discussion -- or
shorter proofs.
-Kevin


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pgpool versus sequences
Next
From: Robert Haas
Date:
Subject: Re: Bad UI design: pg_ctl and data_directory