Thread: WIP patch for latestCompletedXid method of computing snapshot xmax
This patch implements Florian's idea about how to manage snapshot xmax
without the ugly and performance-losing tactic of taking XidGenLock and
ProcArrayLock at the same time. I had to do a couple of slightly klugy
things to get bootstrap and prepared transactions to work, but on the
whole it seems at least as clean as the code we have now. Comments?

regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> This patch implements Florian's idea about how to manage snapshot xmax
> without the ugly and performance-losing tactic of taking XidGenLock and
> ProcArrayLock at the same time. I had to do a couple of slightly klugy
> things to get bootstrap and prepared transactions to work, but on the
> whole it seems at least as clean as the code we have now. Comments?

Just that it will be fascinating to see what effects this has on the
benchmarks.

--
Gregory Stark
EnterpriseDB
http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes:

> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> This patch implements Florian's idea about how to manage snapshot xmax
>> without the ugly and performance-losing tactic of taking XidGenLock and
>> ProcArrayLock at the same time. I had to do a couple of slightly klugy
>> things to get bootstrap and prepared transactions to work, but on the
>> whole it seems at least as clean as the code we have now. Comments?
>
> Just that it will be fascinating to see what effects this has on the
> benchmarks.

Yeah, I was hoping to get some benchmarks before deciding whether it's
worth the risk of pushing this into 8.3. I'm off trying pgbench now,
but if anyone wants to try something more serious like DBT2 ...

regards, tom lane
Re: WIP patch for latestCompletedXid method of computing snapshot xmax
From: "Florian G. Pflug"
Tom Lane wrote:
> This patch implements Florian's idea about how to manage snapshot xmax
> without the ugly and performance-losing tactic of taking XidGenLock and
> ProcArrayLock at the same time. I had to do a couple of slightly klugy
> things to get bootstrap and prepared transactions to work, but on the
> whole it seems at least as clean as the code we have now. Comments?

I'm a bit late, but still a few comments:

1) I wonder why you made XidCacheRemoveRunningXids take the latestXid as
   an extra argument, though - it should be able to figure that out on
   its own, I'd have thought. Is that just for consistency, or did I
   miss something?

2) I don't see how either the childXids array or the XID cache in the
   proc array could ever not be in ascending xid order. We do guarantee
   that a child xid is always greater than its parent's. This guarantee
   enables a few optimizations. First, as you say in the comments,
   finding the largest xid when committing becomes trivial. But more
   importantly, if we can assume that the proc array xid cache is always
   sorted, we can get rid of the exclusive lock during subxact abort.
   We will *always* remove the last n xids from the cache, so we just
   need to reset subxids.nxids, and not touch the actual array at all.
   (We might later set the now-unused entries to InvalidTransactionId,
   but that's only cosmetic.)

Regarding 2): Removing subxids from the proc array cannot affect xmin
calculations, because the toplevel xid is always smaller than all its
children. So we only need to care about snapshots themselves. But they
won't care much whether they find an aborted subxid in the proc array or
not. If not, the xid is neither running nor committed, so it's assumed
to have crashed. If it's in the snapshot, it's treated as in-progress.

So I believe the code could be simplified a bit if we just made it a
rule that both childXids and the cache in the proc array are always in
ascending xid order, and we could get rid of the exclusive lock during
subxact abort.
greetings, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes:

> 1) I wonder why you made XidCacheRemoveRunningXids take the latestXid
>    as an extra argument, though - it should be able to figure that out
>    on its own, I'd have thought. Is that just for consistency, or did I
>    miss something?

Well, we were going to have to figure the latestXid in xact.c anyway in
the main-xact path, so I thought it was better to have it responsible in
both cases. It's a judgment call though.

> 2) I don't see how either the childXids array or the XID cache in the
>    proc array could ever not be in ascending xid order. We do guarantee
>    that a child xid is always greater than its parent's.

It's the relationships among siblings that worry me. It's likely that we
could guarantee this with a bit of care and some Asserts in xact.c's
xid-list manipulations, but it's not quite obvious that it must be so
--- for instance using an lcons instead of lappend might break it.

> This guarantee enables a few optimizations. First, as you say in the
> comments, finding the largest xid when committing becomes trivial. But
> more importantly, if we can assume that the proc array xid cache is
> always sorted, we can get rid of the exclusive lock during subxact
> abort. We will *always* remove the last n xids from the cache, so we
> just need to reset subxids.nxids, and not touch the actual array at
> all. (We might later set the now-unused entries to
> InvalidTransactionId, but that's only cosmetic.)

I'm not convinced that that works when the subxids cache has overflowed
earlier in the transaction. In any case, the bottom line is that it's
very unlikely that that code costs enough to be worth optimizing. When
you think about the total number of cycles that have been expended on
each subtransaction (especially one that has an xid), another comparison
or two is lost in the noise. As long as the code isn't O(N^2) ... which
it isn't AFAICS ... I would rather keep it simple and robust.

regards, tom lane
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> 2) I don't see how either the childXids array or the XID cache in the
>>    proc array could ever not be in ascending xid order. We do
>>    guarantee that a child xid is always greater than its parent's.
>
> It's the relationships among siblings that worry me. It's likely that
> we could guarantee this with a bit of care and some Asserts in xact.c's
> xid-list manipulations, but it's not quite obvious that it must be so
> --- for instance using an lcons instead of lappend might break it.

Hm... AFAICT, a transaction can only ever have one currently "live"
subtransaction. In other words, for any node in the transaction tree,
all subtrees except for one are always aborted. Since we do guarantee
that a node has an xid smaller than that of any node below it, it seems
the only place we have to be careful at is AtSubCommit_childXids.

>> This guarantee enables a few optimizations. First, as you say in the
>> comments, finding the largest xid when committing becomes trivial. But
>> more importantly, if we can assume that the proc array xid cache is
>> always sorted, we can get rid of the exclusive lock during subxact
>> abort. We will *always* remove the last n xids from the cache, so we
>> just need to reset subxids.nxids, and not touch the actual array at
>> all. (We might later set the now-unused entries to
>> InvalidTransactionId, but that's only cosmetic.)
>
> I'm not convinced that that works when the subxids cache has overflowed
> earlier in the transaction.

It should. AFAICS, the subxid cache always shows the xids along the
"path" of the transaction tree that leads to the currently active
subtransaction - if it hasn't overflowed, that is. If it has, the list
is cut off at some point. But being cut off will only make the number of
elements we remove smaller; it doesn't change the fact that we remove
some number of last elements.
> In any case, the bottom line is that it's very unlikely that that code
> costs enough to be worth optimizing. When you think about the total
> number of cycles that have been expended on each subtransaction
> (especially one that has an xid), another comparison or two is lost in
> the noise. As long as the code isn't O(N^2) ... which it isn't AFAICS
> ... I would rather keep it simple and robust.

The real win is that because of the guaranteed order, removing xids from
the cache gets less messy - it reduces to just changing subxids.nxids.
Which in turn allows us to hold only a *shared* proc array lock while
doing the removal. To quote my original mail:

"Removing subxids from the proc array cannot affect xmin calculations,
because the toplevel xid is always smaller than all its children. So we
only need to care about snapshots themselves. But they won't care much
whether they find an aborted subxid in the proc array or not. If not,
the xid is neither running nor committed, so it's assumed to have
crashed. If it's in the snapshot, it's treated as in-progress."

greetings, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes:

> This guarantee enables a few optimizations. First, as you say in the
> comments, finding the largest xid when committing becomes trivial. But
> more importantly, if we can assume that the proc array xid cache is
> always sorted, we can get rid of the exclusive lock during subxact
> abort.

No, we can't, because subxact abort still has to advance
latestCompletedXid.

regards, tom lane
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> This guarantee enables a few optimizations. First, as you say in the
>> comments, finding the largest xid when committing becomes trivial. But
>> more importantly, if we can assume that the proc array xid cache is
>> always sorted, we can get rid of the exclusive lock during subxact
>> abort.
>
> No, we can't, because subxact abort still has to advance
> latestCompletedXid.

I don't think so. I didn't notice that when initially reading your patch
- but it seems that updating latestCompletedXid during subxact abort
actually worsens performance (only very slightly, though).

The whole point of updating latestCompletedXid when aborting a toplevel
xact is to prevent the following corner case:

.) Some transaction commits.
.) Only readonly transactions (all xmins = xmaxs = latestCompletedXid+1).
.) One large transaction aborts (xid = GetNewTransactionId() =
   latestCompletedXid+1).
.) Only readonly transactions for a long period again, all xmins = xmaxs
   = latestCompletedXid+1.

If the ABORT didn't update latestCompletedXid, then we'd not be able to
remove rows created by that one large transaction (which aborted),
because all the xmins would still be latestCompletedXid+1, which is not
greater than the xid of the aborted transaction.

So updating latestCompletedXid is only *necessary* during COMMITs -
during ABORTs it's just done for efficiency reasons. But for
subtransactions, there is no efficiency gain at all, because the
toplevel xid already bounds the xmin. In fact, updating
latestCompletedXid during subxact abort moves the snapshot's xmax
unnecessarily far into the future, which leads to larger snapshots,
meaning more cycles spent scanning them.

I do feel that I explained the idea rather badly in my initial response
to your patch - I'll post a (hopefully) better explanation to the
hackers list shortly.

greetings, Florian Pflug
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> This patch implements Florian's idea about how to manage snapshot
>>> xmax without the ugly and performance-losing tactic of taking
>>> XidGenLock and ProcArrayLock at the same time. I had to do a couple
>>> of slightly klugy things to get bootstrap and prepared transactions
>>> to work, but on the whole it seems at least as clean as the code we
>>> have now. Comments?
>
>> Just that it will be fascinating to see what effects this has on the
>> benchmarks.
>
> Yeah, I was hoping to get some benchmarks before deciding whether it's
> worth the risk of pushing this into 8.3. I'm off trying pgbench now,
> but if anyone wants to try something more serious like DBT2 ...

I ran some DBT2 tests but haven't been able to show any performance
effects either in average or worst-case response times. I think that's
for a few reasons:

1) This is only a dual-processor machine I'm playing with and we scale
   well on two processors already.
2) TPC-C doesn't have many read-only transactions.
3) The deadlocks I found earlier cause enough noise in the response
   times to hide any subtler effects.

We may have to wait until the next time Sun runs their 1,000-process
monster Java benchmark to see if it helps there.

--
Gregory Stark
EnterpriseDB
http://www.enterprisedb.com