Thread: pg_subtrans and WAL

pg_subtrans and WAL

From
Tom Lane
Date:
My discovery last night of a WAL synchronization error in pg_clog led me
to take a look at pg_subtrans too.  I soon realized that in fact we are
not WAL-logging pg_subtrans updates at all: subtransaction start sets up
a pg_subtrans entry but makes no WAL entry for this action.

Seems like this is a problem.

It may be that we do not care because pg_subtrans doesn't have to be
valid after a crash, but I haven't seen any proof of that theory.
And if that theory is correct, then it is a seriously bad design to be
using the same code infrastructure for both pg_clog and pg_subtrans.
Every fsync on pg_subtrans is wasted effort if that is going to be our
approach.  We should in fact just delete pg_subtrans and re-init it to
zeroes during postmaster start...
        regards, tom lane


Re: pg_subtrans and WAL

From
Alvaro Herrera
Date:
On Tue, Aug 10, 2004 at 12:24:06PM -0400, Tom Lane wrote:

> It may be that we do not care because pg_subtrans doesn't have to be
> valid after a crash, but I haven't seen any proof of that theory.

Let's suppose we crash between creating a child transaction and marking
it as done.  What we have to ensure after a crash is that if we marked
the parent as committed, the child has to be marked committed too.  The
WAL record carries this information already, and on recovery, the child
will be marked COMMIT.

The whole point of the subtrans info is to be available _while_ the
transaction tree is running.  If there is a crash, then by definition no
backend can be running when we return, so pg_subtrans info is useless at
that point.  We only need pg_clog to be correct.


> And if that theory is correct, then it is a seriously bad design to be
> using the same code infrastructure for both pg_clog and pg_subtrans.
> Every fsync on pg_subtrans is wasted effort if that is going to be our
> approach.

Right, but AFAICS both pg_clog and pg_subtrans are only fsync'ed during
checkpoint and shutdown, so it doesn't seem that costly.  We could
certainly skip calling CheckPointSUBTRANS() or making it a noop ...

> We should in fact just delete pg_subtrans and re-init it to zeroes
> during postmaster start...

Is it worth the duplicated code?  It won't be consulted anyway for
pre-crash Xids, because TransactionIdIsInProgress will return early by
means of RecentGlobalXmin.


On a related note: if we mark a Xid with SUBTRANS COMMIT and later crash
without updating it, the main Xid will remain in in-progress status.  At
what point is it marked aborted?  I can see such a status change only in
XactLockTableWait.  This may be important because we will change the
subtransaction to aborted state only if we see the parent in aborted
state too.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)



Re: pg_subtrans and WAL

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Tue, Aug 10, 2004 at 12:24:06PM -0400, Tom Lane wrote:
>> It may be that we do not care because pg_subtrans doesn't have to be
>> valid after a crash, but I haven't seen any proof of that theory.

> The whole point of the subtrans info is to be available _while_ the
> transaction tree is running.  If there is a crash, then by definition no
> backend can be running when we return, so pg_subtrans info is useless at
> that point.  We only need pg_clog to be correct.

But we also have to be sure that we don't try to access the useless info
anyway.  For instance some pre-crash subxacts might remain marked
SUBCOMMITTED in clog indefinitely.  I think this could be worked around:
for example, TransactionIdDidCommit could assume that any SUBCOMMITTED
xact older than RecentGlobalXmin must represent a child of a crashed
parent.  It shouldn't be too hard to guarantee that we never touch
pg_subtrans for XIDs older than RecentGlobalXmin.  We don't have that
guarantee in place at the moment though.

>> And if that theory is correct, then it is a seriously bad design to be
>> using the same code infrastructure for both pg_clog and pg_subtrans.
>> Every fsync on pg_subtrans is wasted effort if that is going to be our
>> approach.

> Right, but AFAICS both pg_clog and pg_subtrans are only fsync'ed during
> checkpoint and shutdown, so it doesn't seem that costly.  We could
> certainly skip calling CheckPointSUBTRANS() or making it a noop ...

The point is that the behaviors are fundamentally different.  We have no
need for any WAL log entries for pg_subtrans; we should never fsync it;
and the rules for deciding when and where to truncate it are a lot
different (or at least should be different).  I thought from the
beginning that the slru layer underneath pg_clog was bad from the point
of view of obfuscating the code, because it forced an awkward division
of labor between clog.c and slru.c.  Now that I realize that there's not
that much behavior that we really want to share, I wonder whether we
shouldn't revert that change and make subtrans.c stand on its own.

> On a related note: if we mark a Xid with SUBTRANS COMMIT and later crash
> without updating it, the main Xid will remain in in-progress status.  At
> what point is it marked aborted?

I do not think there's any guarantee that it ever will be so marked.
Certainly it could be a very long time until someone exhibits any
interest in that particular Xid's status...
        regards, tom lane


Re: pg_subtrans and WAL

From
Alvaro Herrera
Date:
On Fri, Aug 20, 2004 at 01:36:39PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > On Tue, Aug 10, 2004 at 12:24:06PM -0400, Tom Lane wrote:
> >> It may be that we do not care because pg_subtrans doesn't have to be
> >> valid after a crash, but I haven't seen any proof of that theory.
> 
> > The whole point of the subtrans info is to be available _while_ the
> > transaction tree is running.  If there is a crash, then by definition no
> > backend can be running when we return, so pg_subtrans info is useless at
> > that point.  We only need pg_clog to be correct.
> 
> But we also have to be sure that we don't try to access the useless info
> anyway.

Hmm.  I've skimmed over callers of TransactionIdDid{Commit,Abort} and
they are in the following places:

- tqual.c (all after TransactionIdIsInProgress)
- heapam.c (all after XactLockTableWait)
- one in xact.c (while aborting, assert that it isn't committed)
- one in xlog.c (in #ifdef UNDO)

- lmgr.c (in XactLockTableWait)
- two in xlogutils.c

Those last two are the ones that could cause trouble; I think the others
are safe.  Actually I'm not sure what the ones in xlogutils.c are about.

> For instance some pre-crash subxacts might remain marked SUBCOMMITTED
> in clog indefinitely.  I think this could be worked around: for
> example, TransactionIdDidCommit could assume that any SUBCOMMITTED
> xact older than RecentGlobalXmin must represent a child of a crashed
> parent.

This would be a slight modification.  sinval.c needs to export
RecentGlobalXmin, and TransactionIdDidCommit check it.  It's a small
change AFAICS.  Do you want me to submit a patch?

> The point is that the behaviors are fundamentally different.  We have no
> need for any WAL log entries for pg_subtrans; we should never fsync it;
> and the rules for deciding when and where to truncate it are a lot
> different (or at least should be different).  I thought from the
> beginning that the slru layer underneath pg_clog was bad from the point
> of view of obfuscating the code, because it forced an awkward division
> of labor between clog.c and slru.c.  Now that I realize that there's not
> that much behavior that we really want to share, I wonder whether we
> shouldn't revert that change and make subtrans.c stand on its own.

Maybe you are right.  I think this is a bigger change and it's not badly
needed, so it can wait to 8.1.  Do you agree?

> > On a related note: if we mark a Xid with SUBTRANS COMMIT and later crash
> > without updating it, the main Xid will remain in in-progress status.  At
> > what point is it marked aborted?
> 
> I do not think there's any guarantee that it ever will be so marked.
> Certainly it could be a very long time until someone exhibits any
> interest in that particular Xid's status...

Right, that's what I thought.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Granting software the freedom to evolve guarantees only different results,
not better ones." (Zygo Blaxell)



Re: pg_subtrans and WAL

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Fri, Aug 20, 2004 at 01:36:39PM -0400, Tom Lane wrote:
>> I thought from the
>> beginning that the slru layer underneath pg_clog was bad from the point
>> of view of obfuscating the code, because it forced an awkward division
>> of labor between clog.c and slru.c.  Now that I realize that there's not
>> that much behavior that we really want to share, I wonder whether we
>> shouldn't revert that change and make subtrans.c stand on its own.

> Maybe you are right.  I think this is a bigger change and it's not badly
> needed, so it can wait to 8.1.  Do you agree?

Not really.  I think truncating pg_subtrans in a more aggressive fashion
is a "must do" for 8.0 because otherwise it will be a disk hog (it is 16
times bigger than pg_clog remember).  And I'm unconvinced that we don't
have crash-safety issues right now, seeing that subtrans updates are not
WAL-logged but we are using pg_clog logic that is built around the
assumption that commits *are* WAL-logged.  What I want to do at minimum
is install patches that ensure we won't look back further than
RecentGlobalXmin; truncate subtrans on that basis; and change the
subtrans code to forcibly zero the current active page of pg_subtrans
during postmaster start, rather than trusting what's on disk.

By the time we do that it might be easiest to just go ahead and make the
split.  I am not fearful of reverting the clog code to a prior state as
far as testing goes --- we already know the pre-slru code worked.

I'll try to look at this over the weekend.
        regards, tom lane