Nested transactions RFC - Mailing list pgsql-hackers

From Manfred Koizar
Subject Nested transactions RFC
Date
Msg-id riandu8d681u68h1grteqo7nvh6hmtilis@4ax.com
Whole thread Raw
Responses Re: Nested transactions RFC
List pgsql-hackers
Hi,

if it is acceptable for subtransactions to use up transaction numbers,
then here is a half baked RFC for a possible implementation.
If not, forget the rest of this message.

The proposed implementation works much like the current transaction
handling.  It needs an additional system table
pg_subtrans (child XactId PRIMARY KEY, parent XactId).

BEGIN;  -- starts a new (top level) transaction, say 100

INSERT row1;  -- row1.xmin = 100
DELETE row2;  -- row2.xmax = 100

BEGIN;  -- starts a subtransaction, let's call it 200,       -- stores 100 on the parent transaction stack       -- (a
localmemory structure),       -- inserts (200, 100) into pg_subtrans 

INSERT row3;  -- row3.xmin = 200, row3.XMIN_IS_SUB = true

DELETE row4;  -- row4.xmax = 200, row4.XMAX_IS_SUB = true

COMMIT;  -- resets CurrentTransaction to 100 (pop from xact stack),        -- does *NOT* mark T200 as committed

BEGIN;  -- starts a subtransaction, let's call it 300,       -- pushes 100 on the parent transaction stack,       --
inserts(300, 100) into pg_subtrans 

BEGIN;  -- starts a 3rd level subtransaction (400),       -- pushes 300 on the parent transaction stack,       --
inserts(400, 300) into pg_subtrans 

...

COMMIT;  -- resets CurrentTransaction to 300 (transaction stack),        -- does NOT mark T400 as committed

INSERT row5;  -- row5.xmin = 300, row5.XMIN_IS_SUB = true

DELETE row6;  -- row6.xmax = 300, row6.XMAX_IS_SUB = true

ROLLBACK;  -- resets CurrentTransaction to 100 (transaction stack),          -- optionally removes (300, 100) from
pg_subtrans,         -- marks T300 as aborted 

COMMIT;  -- marks T100 as committed
or
ROLLBACK;  -- marks T100 as aborted


Visibility:
-----------

The checks for xmin and xmax are very similar.  We look at xmin here:

Traditionally a tuple is visible, if xmin has committed before the
current snapshot was taken, or if xmin == CurrentTransaction().

A subtransaction is considered aborted, if it is marked aborted.  Else
it is considered to be in the same state as its parent transaction
(which again can be a subtransaction).

The effects of tup.xmin are considered visible, if ...
(This is not a formal specification.  It shall only illustrate the
difference to the existing implementation of HeapTupleSatisfiesXxx()
in tqual.c)
   if (tup.XMIN_ABORTED)  // flag set by prior visitor       return false;
   if (tup.XMIN_COMMITTED)  // flag set by prior visitor       return true;
   // xmin neither known aborted nor known committed,   // could be active   // or finished and tup not yet visited
for(xmin = tup.xmin; IsValid(xmin); xmin = GetParentXact(xmin)) {       if (TransactionIdDidAbort(xmin)) {
tup.XMIN_ABORTED= true;           return false;       }/*if*/ 
       if (IsCurrentTransaction(xmin)) {           // tup.xmin is one of my own subtransactions,           // it is
alreadycommitted.  So tup can be           // considered belonging to the current transaction.           tup.xmin =
xmin;          tup.XMIN_IS_SUB = CurrentTransactionIsSub();           return true;  // or rather check cmin ...
}/*if*/             if (TransactionIdDidCommit(xmin)) {           // xmin is a top level transaction           tup.xmin
=xmin;           tup.XMIN_IS_SUB = false;           tup.XMIN_COMMITTED = true;           return true;       }/*if*/ 
       if (!tup.XMIN_IS_SUB) {           // Don't try expensive GetParentXact()           break;       }/*if*/
}/*for*/
   // tup.xmin still active   return false;

TransactionId GetParentXact(TransactionId xnum) uses pg_subtrans to
find the parent transaction of xnum.  It returns InvalidTransaction,
if it doesn't find one.


Performance:
------------

.  Zero overhead, if nested transactions are not used.

.  BEGIN SUB has to insert a pair of TransactionIds into pg_subtrans.
Apart from that it is not slower than BEGIN top level transaction.

.  COMMIT SUB is faster than COMMIT.

.  ROLLBACK SUB is much like ROLLBACK, plus (optionally) deletes one
entry from pg_subtrans.

.  COMMIT and ROLLBACK of top level transactions don't care about
subtransactions.

.  Access a tuple inserted/deleted by a subtransaction:  Zero
overhead, if the subtransaction has been rolled back, otherwise the
parent transaction has to be looked up in pg_subtrans (possibly
recursive).  This price has to be paid only once per tuple (well, once
for xmin and once for xmax).  More accurate: "once after the
inserting/deleting top level transaction has finished".


Problems:
---------

.  pg_subtrans grows by 8 bytes per subtransaction.

.  Other pitfalls???


Administration:
---------------

As soon as a top level transaction has finished, its subtransaction
ids are replaced by the top level transaction id on the next access to
each tuple.

VACUUM (*not* VACUUM tablename) removes old entries from pg_subtrans.
An entry is old, if the parent transaction has finished, before VACUUM
started.


Challenge:
----------

For heavy use of subtransactions there has to be a really fast
implementation of pg_subtrans, maybe something like a b-tree.

AFAICS small WAL changes: pg_subtrans inserts (and deletes?) have to
be logged.

Everything else is straightforward.

Comments?

ServusManfred


pgsql-hackers by date:

Previous
From: Jan Wieck
Date:
Subject: Re: Queries using rules show no rows modified?
Next
From: mlw
Date:
Subject: Re: Threads vs processes - The Apache Way (Re: Path to PostgreSQL