Thread: Nested transactions RFC

Nested transactions RFC

From
Manfred Koizar
Date:
Hi,

if it is acceptable for subtransactions to use up transaction numbers,
then here is a half baked RFC for a possible implementation.
If not, forget the rest of this message.

The proposed implementation works much like the current transaction
handling.  It needs an additional system table
pg_subtrans (child XactId PRIMARY KEY, parent XactId).

BEGIN;  -- starts a new (top level) transaction, say 100

INSERT row1;  -- row1.xmin = 100
DELETE row2;  -- row2.xmax = 100

BEGIN;  -- starts a subtransaction, let's call it 200,       -- stores 100 on the parent transaction stack       -- (a
localmemory structure),       -- inserts (200, 100) into pg_subtrans 

INSERT row3;  -- row3.xmin = 200, row3.XMIN_IS_SUB = true

DELETE row4;  -- row4.xmax = 200, row4.XMAX_IS_SUB = true

COMMIT;  -- resets CurrentTransaction to 100 (pop from xact stack),        -- does *NOT* mark T200 as committed

BEGIN;  -- starts a subtransaction, let's call it 300,       -- pushes 100 on the parent transaction stack,       --
inserts(300, 100) into pg_subtrans 

BEGIN;  -- starts a 3rd level subtransaction (400),       -- pushes 300 on the parent transaction stack,       --
inserts(400, 300) into pg_subtrans 

...

COMMIT;  -- resets CurrentTransaction to 300 (transaction stack),        -- does NOT mark T400 as committed

INSERT row5;  -- row5.xmin = 300, row5.XMIN_IS_SUB = true

DELETE row6;  -- row6.xmax = 300, row6.XMAX_IS_SUB = true

ROLLBACK;  -- resets CurrentTransaction to 100 (transaction stack),          -- optionally removes (300, 100) from
pg_subtrans,         -- marks T300 as aborted 

COMMIT;  -- marks T100 as committed
or
ROLLBACK;  -- marks T100 as aborted


Visibility:
-----------

The checks for xmin and xmax are very similar.  We look at xmin here:

Traditionally a tuple is visible, if xmin has committed before the
current snapshot was taken, or if xmin == CurrentTransaction().

A subtransaction is considered aborted, if it is marked aborted.  Else
it is considered to be in the same state as its parent transaction
(which again can be a subtransaction).

The effects of tup.xmin are considered visible, if ...
(This is not a formal specification.  It shall only illustrate the
difference to the existing implementation of HeapTupleSatisfiesXxx()
in tqual.c)
   if (tup.XMIN_ABORTED)  // flag set by prior visitor       return false;
   if (tup.XMIN_COMMITTED)  // flag set by prior visitor       return true;
   // xmin neither known aborted nor known committed,   // could be active   // or finished and tup not yet visited
for(xmin = tup.xmin; IsValid(xmin); xmin = GetParentXact(xmin)) {       if (TransactionIdDidAbort(xmin)) {
tup.XMIN_ABORTED= true;           return false;       }/*if*/ 
       if (IsCurrentTransaction(xmin)) {           // tup.xmin is one of my own subtransactions,           // it is
alreadycommitted.  So tup can be           // considered belonging to the current transaction.           tup.xmin =
xmin;          tup.XMIN_IS_SUB = CurrentTransactionIsSub();           return true;  // or rather check cmin ...
}/*if*/             if (TransactionIdDidCommit(xmin)) {           // xmin is a top level transaction           tup.xmin
=xmin;           tup.XMIN_IS_SUB = false;           tup.XMIN_COMMITTED = true;           return true;       }/*if*/ 
       if (!tup.XMIN_IS_SUB) {           // Don't try expensive GetParentXact()           break;       }/*if*/
}/*for*/
   // tup.xmin still active   return false;

TransactionId GetParentXact(TransactionId xnum) uses pg_subtrans to
find the parent transaction of xnum.  It returns InvalidTransaction,
if it doesn't find one.


Performance:
------------

.  Zero overhead, if nested transactions are not used.

.  BEGIN SUB has to insert a pair of TransactionIds into pg_subtrans.
Apart from that it is not slower than BEGIN top level transaction.

.  COMMIT SUB is faster than COMMIT.

.  ROLLBACK SUB is much like ROLLBACK, plus (optionally) deletes one
entry from pg_subtrans.

.  COMMIT and ROLLBACK of top level transactions don't care about
subtransactions.

.  Access a tuple inserted/deleted by a subtransaction:  Zero
overhead, if the subtransaction has been rolled back, otherwise the
parent transaction has to be looked up in pg_subtrans (possibly
recursive).  This price has to be paid only once per tuple (well, once
for xmin and once for xmax).  More accurate: "once after the
inserting/deleting top level transaction has finished".


Problems:
---------

.  pg_subtrans grows by 8 bytes per subtransaction.

.  Other pitfalls???


Administration:
---------------

As soon as a top level transaction has finished, its subtransaction
ids are replaced by the top level transaction id on the next access to
each tuple.

VACUUM (*not* VACUUM tablename) removes old entries from pg_subtrans.
An entry is old, if the parent transaction has finished, before VACUUM
started.


Challenge:
----------

For heavy use of subtransactions there has to be a really fast
implementation of pg_subtrans, maybe something like a b-tree.

AFAICS small WAL changes: pg_subtrans inserts (and deletes?) have to
be logged.

Everything else is straightforward.

Comments?

ServusManfred


Re: Nested transactions RFC

From
Tom Lane
Date:
Manfred Koizar <mkoi-pg@aon.at> writes:
> TransactionId GetParentXact(TransactionId xnum) uses pg_subtrans to
> find the parent transaction of xnum.

This is not only extremely expensive, but in practice would cause
infinite recursion: any attempt to validate the commit state of a
row in pg_subtrans would result in a recursive attempt to search
pg_subtrans.  I don't think we can do table access from inside the
tqual.c routines.

A practical implementation, which would cost little except tuple header
space (and yes I know that won't make you happy) would require 3 fields
instead of 2 for both the min and the max:transaction IDsubtransaction IDcommand ID
First check the transaction ID: if aborted or (in-progress and not
mine), tuple is not visible.  Next, if the subtransaction ID is not
zero, similarly check it.  Finally, if xid and sub-xid are both mine,
the command ID has to be checked.

In this scenario, subtransactions commit or abort by marking their
pg_clog entries, but no one else will care until the parent transaction
commits.  So there is no extra state anywhere except for the stack
of active transaction numbers inside each backend.

Possibly we could use techniques similar to what you already suggested
for cmin/cmax to reduce the amount of physical storage needed for the
six logical fields involved.
        regards, tom lane

PS: unfortunately, tuple validity checking is only a small part of what
has to be done to support subtransactions.  The really nasty part is
in fixing error recovery inside the backend so that (most) errors can
be dealt with by aborting only the innermost subtransaction.


Re: Nested transactions RFC

From
Manfred Koizar
Date:
Tom,

reading my message again and your response, I see, that some points
were a bit unclear.

On Fri, 10 May 2002 13:12:21 +0200, I wrote:
|if it is acceptable for subtransactions to use up transaction numbers,
Of course, "use up" is nonsense, as it sounds like "use all
available";  this should have been "use" or "draw from the pool of".
Should have listened better to my English teacher :-)

On Sat, 11 May 2002 11:51:37 -0400, Tom Lane <tgl@sss.pgh.pa.us>
wrote:
>Manfred Koizar <mkoi-pg@aon.at> writes:
>> TransactionId GetParentXact(TransactionId xnum) uses pg_subtrans to
>> find the parent transaction of xnum.
>
>This is not only extremely expensive, but in practice would cause
>infinite recursion: any attempt to validate the commit state of a
>row in pg_subtrans would result in a recursive attempt to search
>pg_subtrans.  I don't think we can do table access from inside the
>tqual.c routines.

I wrote:
|It needs an additional system table
|pg_subtrans (child XactId PRIMARY KEY, parent XactId).

But no!  "Table" is not the correct word for what I mean.  I rather
want something living outside transactions and not accessed via normal
SQL statements.  It is to be handled by highly specialized and
optimized routines, because fast access is crucial for the whole
proposal.  That's why I called "a really fast implementation of
pg_subtrans" a challenge.  I had pg_clog in mind, but didn't find the
right words.

>A practical implementation, which would cost little except tuple header
>space (and yes I know that won't make you happy) would require 3 fields       :-)

>instead of 2 for both the min and the max:
>    transaction ID
>    subtransaction ID
>    command ID
This was my first attempt.  I've dismissed it for several reasons.

>First check the transaction ID: if aborted or (in-progress and not
>mine), tuple is not visible.
I agree up to here.

>Next, if the subtransaction ID is not
>zero, similarly check it.
Now imagine
BEGIN 1;
BEGIN 2;
BEGIN 3;
INSERT tup3;
COMMIT 3;
ROLLBACK 2;
COMMIT 1;

Then in tup3 we would have xid==1 and subxid==3, both of which are
committed, but nevertheless tup3 is invisible, because xact 2 aborted.

>Finally, if xid and sub-xid are both mine,
>the command ID has to be checked.
>
>In this scenario, subtransactions commit or abort by marking their
>pg_clog entries, but no one else will care until the parent transaction
>commits.  So there is no extra state anywhere except for the stack
>of active transaction numbers inside each backend.
A *stack* of _active_ transaction numbers is not sufficient, we need
the whole *tree* of _all_ transactions belonging to the current top
level transaction.  This is, want I wanted to model in my pg_subtrans
"table".  And pg_subtrans cannot be a private structure, because it
has to be inspected by other transactions too (cf. example above).

>PS: unfortunately, tuple validity checking is only a small part of what
>has to be done to support subtransactions.  The really nasty part is
>in fixing error recovery inside the backend so that (most) errors can
>be dealt with by aborting only the innermost subtransaction.
Is this really related to subtransactions?  The current behaviour is,
that an error not only aborts the offending command, but the whole
(top level) transaction.  My proposal doesn't change anything
regarding this.  Though I agree it would be desirable to have finer
grained error handling.

You have quoted only small parts of my posting.  Do you agree to the
rest?  Or didn't you bother to comment, because you considered the
whole proposal refuted by your counter-arguments?  I'll be fine either
way, I just want to know.

BTW, there's something missing from my visibility checks:
|        if (IsCurrentTransaction(xmin)) {
here we have to add "or xmin is one of my (grand)*parents".

And of course, it would be nice to have named savepoints:
BEGIN;
BEGIN foo;
BEGIN bar;
...
ROLLBACK foo;
COMMIT;  -- top level transaction

ServusManfred


Re: Nested transactions RFC

From
Tom Lane
Date:
Manfred Koizar <mkoi-pg@aon.at> writes:
> A *stack* of _active_ transaction numbers is not sufficient, we need
> the whole *tree* of _all_ transactions belonging to the current top
> level transaction.  This is, want I wanted to model in my pg_subtrans
> "table".  And pg_subtrans cannot be a private structure, because it
> has to be inspected by other transactions too (cf. example above).

Hmm.  This seems to me to be vastly overdesigning the feature.  I've
never yet seen a practical application for more than one level of
subtransaction, so I question whether we should buy into a substantially
more complex implementation to support the more general case.

> Is this really related to subtransactions?  The current behaviour is,
> that an error not only aborts the offending command, but the whole
> (top level) transaction.  My proposal doesn't change anything
> regarding this.

Every single application that I've seen for subtransactions is all about
error recovery.  If we don't fix that then there's no point.

> You have quoted only small parts of my posting.

I don't believe in quoting whole postings, only enough to remind people
what it was I'm responding to.
        regards, tom lane


Re: Nested transactions RFC

From
Stephan Szabo
Date:
On Sun, 12 May 2002, Tom Lane wrote:

> Manfred Koizar <mkoi-pg@aon.at> writes:
> > A *stack* of _active_ transaction numbers is not sufficient, we need
> > the whole *tree* of _all_ transactions belonging to the current top
> > level transaction.  This is, want I wanted to model in my pg_subtrans
> > "table".  And pg_subtrans cannot be a private structure, because it
> > has to be inspected by other transactions too (cf. example above).
>
> Hmm.  This seems to me to be vastly overdesigning the feature.  I've
> never yet seen a practical application for more than one level of
> subtransaction, so I question whether we should buy into a substantially
> more complex implementation to support the more general case.

I think it'd depend on how pervasive the feature is going to be.  If we
allow functions/rules/etc to start subtransactions I'm not sure it'd
be safe to say that only one level is safe since you might not know that
your subtransaction calls something that wants to start a subtransaction,
but you'd probably expect that anything it does would be undone when you
rollback your subtransaction, just like it would if the items weren't
in a subtransaction.



Re: Nested transactions RFC

From
Hiroshi Inoue
Date:
Tom Lane wrote:
> 
> Manfred Koizar <mkoi-pg@aon.at> writes:
> > A *stack* of _active_ transaction numbers is not sufficient, we need
> > the whole *tree* of _all_ transactions belonging to the current top
> > level transaction.  This is, want I wanted to model in my pg_subtrans
> > "table".  And pg_subtrans cannot be a private structure, because it
> > has to be inspected by other transactions too (cf. example above).
> 
> Hmm.  This seems to me to be vastly overdesigning the feature.  I've
> never yet seen a practical application for more than one level of
> subtransaction, so I question whether we should buy into a substantially
> more complex implementation to support the more general case.

I'm for Manfred with this point. I would never suppose
that nested transactions supports only 1 level.

> 
> > Is this really related to subtransactions?  The current behaviour is,
> > that an error not only aborts the offending command, but the whole
> > (top level) transaction.  My proposal doesn't change anything
> > regarding this.
> 
> Every single application that I've seen for subtransactions is all about
> error recovery. If we don't fix that then there's no point.

The problem exists with any implementation.
Though tuple validity checking may be only a small part
(I don't think so), I've never seen such proposal other
than Manfred's one.

If I remember correctly, savepoints functionality
was planned for 7.0 but probably we wouldn't have
it in 7.3. The TODO may be a TODO for ever unless
the direction to solve the TODO is decided.

1) without UNDO for individual tuples.
2) with UNDO for individual tuples under no  overwriting smgr.
3) UNDO under overwriting smgr.

regards,
Hiroshi Inouehttp://w2422.nsk.ne.jp/~inoue/