Re: SSI patch version 14 - Mailing list pgsql-hackers

From Dan Ports
Subject Re: SSI patch version 14
Date
Msg-id 20110129043640.GN57071@csail.mit.edu
Whole thread Raw
In response to Re: SSI patch version 14  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Thu, Jan 27, 2011 at 09:18:23AM -0800, Jeff Davis wrote:
> On Tue, 2011-01-25 at 05:57 -0500, Dan Ports wrote:
> > This summary is right on. I would add one additional detail or
> > clarification to the last point, which is that rather than checking for
> > a cycle, we're checking for a transaction with both "in" and "out"
> > conflicts, which every cycle must contain.
> 
> To clarify, this means that it will get some false positives, right?

Yes, this is correct.

> Is there a reason we can't check for a real cycle, which would let T1
> succeed?

I talked today with someone who experimented with doing exactly that in
MySQL as part of a research project (Perfectly Serializable Snapshot
Isolation, paper forthcoming in ICDE)

It confirmed my intuition that this is possible but not as
straightforward as it sounds. Some complications I thought of in
adapting that to what we're doing:

1) it requires tracking all edges in the serialization graph; besides  the rw-conflicts we track there are also the
moremundane  rw-dependencies (if T2 sees an update made by T1, then T2 has to  come after T1) and ww-dependencies (if
T2'swrite modifies a tuple  created by T1, then T2 has to come after T1).    We are taking advantage of the fact that
anycycle must have two  adjacent rw-conflict edges, but the rest of the cycle can be  composed of other types. It would
certainlybe possible to track the  others, but it would add a bit more complexity and use more memory.
 

2) it requires doing a depth-first search to check for a cycle, which  is more expensive than just detecting two edges.
Thatseems OK if  you only want to check it on commit, but maybe not if you want to  detect serialization failures at
thetime of the conflicting  read/write (as we do).
 

3) it doesn't seem to interact very well with our memory mitigation  efforts, wherein we discard lots of data about
committed transactions that we know we won't need. If we were checking for  cycles, we'd need to keep more of it. I
suspectsummarization (when  we combine two transactions' predicate locks to save memory) would  also cause problems.
 

None of these seem particularly insurmountable, but they suggest it
isn't a clear win to try to find an entire cycle.

Dan

-- 
Dan R. K. Ports              MIT CSAIL                http://drkp.net/


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: pg_upgrade fails for non-postgres user
Next
From: Fujii Masao
Date:
Subject: Re: Include WAL in base backup