Thread: pg_subtrans and WAL
My discovery last night of a WAL synchronization error in pg_clog led me to take a look at pg_subtrans too. I soon realized that in fact we are not WAL-logging pg_subtrans updates at all: subtransaction start sets up a pg_subtrans entry but makes no WAL entry for this action. Seems like this is a problem. It may be that we do not care because pg_subtrans doesn't have to be valid after a crash, but I haven't seen any proof of that theory. And if that theory is correct, then it is a seriously bad design to be using the same code infrastructure for both pg_clog and pg_subtrans. Every fsync on pg_subtrans is wasted effort if that is going to be our approach. We should in fact just delete pg_subtrans and re-init it to zeroes during postmaster start... regards, tom lane
On Tue, Aug 10, 2004 at 12:24:06PM -0400, Tom Lane wrote: > It may be that we do not care because pg_subtrans doesn't have to be > valid after a crash, but I haven't seen any proof of that theory. Let's suppose we crash between creating a child transaction and marking it as done. What we have to ensure after a crash is that if we marked the parent as committed, the child has to be marked committed too. The WAL record carries this information already, and on recovery, the child will be marked COMMIT. The whole point of the subtrans info is to be available _while_ the transaction tree is running. If there is a crash, then by definition no backend can be running when we return, so pg_subtrans info is useless at that point. We only need pg_clog to be correct. > And if that theory is correct, then it is a seriously bad design to be > using the same code infrastructure for both pg_clog and pg_subtrans. > Every fsync on pg_subtrans is wasted effort if that is going to be our > approach. Right, but AFAICS both pg_clog and pg_subtrans are only fsync'ed during checkpoint and shutdown, so it doesn't seem that costly. We could certainly skip calling CheckPointSUBTRANS() or making it a noop ... > We should in fact just delete pg_subtrans and re-init it to zeroes > during postmaster start... Is it worth the duplicated code? It won't be consulted anyway for pre-crash Xids, because TransactionIdIsInProgress will return early by means of RecentGlobalXmin. On a related note: if we mark a Xid with SUBTRANS COMMIT and later crash without updating it, the main Xid will remain in in-progress status. At what point is it marked aborted? I can see such a status change only in XactLockTableWait. This may be important because we will change the subtransaction to aborted state only if we see the parent in aborted state too. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Tue, Aug 10, 2004 at 12:24:06PM -0400, Tom Lane wrote: >> It may be that we do not care because pg_subtrans doesn't have to be >> valid after a crash, but I haven't seen any proof of that theory. > The whole point of the subtrans info is to be available _while_ the > transaction tree is running. If there is a crash, then by definition no > backend can be running when we return, so pg_subtrans info is useless at > that point. We only need pg_clog to be correct. But we also have to be sure that we don't try to access the useless info anyway. For instance some pre-crash subxacts might remain marked SUBCOMMITTED in clog indefinitely. I think this could be worked around: for example, TransactionIdDidCommit could assume that any SUBCOMMITTED xact older than RecentGlobalXmin must represent a child of a crashed parent. It shouldn't be too hard to guarantee that we never touch pg_subtrans for XIDs older than RecentGlobalXmin. We don't have that guarantee in place at the moment though. >> And if that theory is correct, then it is a seriously bad design to be >> using the same code infrastructure for both pg_clog and pg_subtrans. >> Every fsync on pg_subtrans is wasted effort if that is going to be our >> approach. > Right, but AFAICS both pg_clog and pg_subtrans are only fsync'ed during > checkpoint and shutdown, so it doesn't seem that costly. We could > certainly skip calling CheckPointSUBTRANS() or making it a noop ... The point is that the behaviors are fundamentally different. We have no need for any WAL log entries for pg_subtrans; we should never fsync it; and the rules for deciding when and where to truncate it are a lot different (or at least should be different). I thought from the beginning that the slru layer underneath pg_clog was bad from the point of view of obfuscating the code, because it forced an awkward division of labor between clog.c and slru.c. Now that I realize that there's not that much behavior that we really want to share, I wonder whether we shouldn't revert that change and make subtrans.c stand on its own. > On a related note: if we mark a Xid with SUBTRANS COMMIT and later crash > without updating it, the main Xid will remain in in-progress status. At > what point is it marked aborted? I do not think there's any guarantee that it ever will be so marked. Certainly it could be a very long time until someone exhibits any interest in that particular Xid's status... regards, tom lane
On Fri, Aug 20, 2004 at 01:36:39PM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > On Tue, Aug 10, 2004 at 12:24:06PM -0400, Tom Lane wrote: > >> It may be that we do not care because pg_subtrans doesn't have to be > >> valid after a crash, but I haven't seen any proof of that theory. > > > The whole point of the subtrans info is to be available _while_ the > > transaction tree is running. If there is a crash, then by definition no > > backend can be running when we return, so pg_subtrans info is useless at > > that point. We only need pg_clog to be correct. > > But we also have to be sure that we don't try to access the useless info > anyway. Hmm. I've skimmed over callers of TransactionIdDid{Commit,Abort} and they are in the following places: - tqual.c (all after TransactionIdIsInProgress) - heapam.c (all after XactLockTableWait) - one in xact.c (while aborting, assert that it isn't committed) - one in xlog.c (in #ifdef UNDO) - lmgr.c (in XactLockTableWait) - two in xlogutils.c Those last two are the ones that could cause trouble; I think the others are safe. Actually I'm not sure what the ones in xlogutils.c are about. > For instance some pre-crash subxacts might remain marked SUBCOMMITTED > in clog indefinitely. I think this could be worked around: for > example, TransactionIdDidCommit could assume that any SUBCOMMITTED > xact older than RecentGlobalXmin must represent a child of a crashed > parent. This would be a slight modification. sinval.c needs to export RecentGlobalXmin, and TransactionIdDidCommit check it. It's a small change AFAICS. Do you want me to submit a patch? > The point is that the behaviors are fundamentally different. We have no > need for any WAL log entries for pg_subtrans; we should never fsync it; > and the rules for deciding when and where to truncate it are a lot > different (or at least should be different). I thought from the > beginning that the slru layer underneath pg_clog was bad from the point > of view of obfuscating the code, because it forced an awkward division > of labor between clog.c and slru.c. Now that I realize that there's not > that much behavior that we really want to share, I wonder whether we > shouldn't revert that change and make subtrans.c stand on its own. Maybe you are right. I think this is a bigger change and it's not badly needed, so it can wait to 8.1. Do you agree? > > On a related note: if we mark a Xid with SUBTRANS COMMIT and later crash > > without updating it, the main Xid will remain in in-progress status. At > > what point is it marked aborted? > > I do not think there's any guarantee that it ever will be so marked. > Certainly it could be a very long time until someone exhibits any > interest in that particular Xid's status... Right, that's what I thought. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Granting software the freedom to evolve guarantees only different results, not better ones." (Zygo Blaxell)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Fri, Aug 20, 2004 at 01:36:39PM -0400, Tom Lane wrote: >> I thought from the >> beginning that the slru layer underneath pg_clog was bad from the point >> of view of obfuscating the code, because it forced an awkward division >> of labor between clog.c and slru.c. Now that I realize that there's not >> that much behavior that we really want to share, I wonder whether we >> shouldn't revert that change and make subtrans.c stand on its own. > Maybe you are right. I think this is a bigger change and it's not badly > needed, so it can wait to 8.1. Do you agree? Not really. I think truncating pg_subtrans in a more aggressive fashion is a "must do" for 8.0 because otherwise it will be a disk hog (it is 16 times bigger than pg_clog remember). And I'm unconvinced that we don't have crash-safety issues right now, seeing that subtrans updates are not WAL-logged but we are using pg_clog logic that is built around the assumption that commits *are* WAL-logged. What I want to do at minimum is install patches that ensure we won't look back further than RecentGlobalXmin; truncate subtrans on that basis; and change the subtrans code to forcibly zero the current active page of pg_subtrans during postmaster start, rather than trusting what's on disk. By the time we do that it might be easiest to just go ahead and make the split. I am not fearful of reverting the clog code to a prior state as far as testing goes --- we already know the pre-slru code worked. I'll try to look at this over the weekend. regards, tom lane