Thread: Note about robustness of transaction-related data structures

Note about robustness of transaction-related data structures

From
Tom Lane
Date:
While running a parallel regression test I saw a failure"relation deleted while still in use"
out of the subtransactions test, after which the backend dumped core.
The coredump was in AtSubAbort_smgr, and it was failing because
upperPendingDeletes was NIL (which it should not have been inside a
subtransaction, of course).

The underlying problem doesn't seem very reproducible --- I tried
several times without seeing it again.  But what I deduce from the core
dump is that we had an error during subtransaction cleanup, leading to
an attempt to re-abort an already partially aborted transaction, and
the system just could not cope.  The problem is that the list-of-lists
data structure used in smgr.c is brittle: it won't survive two
executions of AtSubAbort_smgr at the same nesting level.

In my local copy I've removed the list-of-lists data structure and
reverted to a flat list of pending deletion requests, with the addition
of a transaction nesting level field in each entry.  AtSubCommit_smgr
does this:
   int        nestLevel = GetCurrentTransactionNestLevel();   PendingRelDelete *pending;
   for (pending = pendingDeletes; pending != NULL; pending = pending->next)   {       if (pending->nestLevel >=
nestLevel)          pending->nestLevel = nestLevel - 1;   }
 

while AtSubAbort_smgr acts only on entries with nestlevel >= transaction
nest level.  This makes the data structure safe against multiple
execution.

I think we will have to make similar adjustments in the other places
where we've used list-of-lists data structures.  We may need to look a
little closer at the manipulations of the transaction state stack in
xact.c, as well.
        regards, tom lane


Re: Note about robustness of transaction-related data structures

From
Alvaro Herrera
Date:
On Thu, Jul 15, 2004 at 05:33:56PM -0400, Tom Lane wrote:

> The underlying problem doesn't seem very reproducible --- I tried
> several times without seeing it again.  But what I deduce from the core
> dump is that we had an error during subtransaction cleanup, leading to
> an attempt to re-abort an already partially aborted transaction, and
> the system just could not cope.  The problem is that the list-of-lists
> data structure used in smgr.c is brittle: it won't survive two
> executions of AtSubAbort_smgr at the same nesting level.

I saw the same problem while developing savepoints.

> I think we will have to make similar adjustments in the other places
> where we've used list-of-lists data structures.

The solution seems reasonable on that list, which is expected to be
short most of the time, but will it be acceptable on longer lists like
the one on inval.c?  I'm not sure when to start worrying about
sequential walking.


> We may need to look a little closer at the manipulations of the
> transaction state stack in xact.c, as well.

Not sure how to protect this.  How about a static int with the current
nesting level, and checking that against
CurrentTransactionState->nestingLevel on PopTransaction() and
PushTransaction()?  Differences should be treated as FATAL errors, I
presume.  Or throw a warning and return doing nothing else.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Tiene valor aquel que admite que es un cobarde" (Fernandel)