Thread: Nested transactions: low level stuff
Here is what I've put together from various messages posted
November/December last year.

. Subtrans trees
. Transaction states
. Tuple visibility
. HeapTupleSatisfiesUpdate
. Shortcuts
. Still missing
. Objections and suggestions

Subtrans trees
--------------
A subtrans tree holds information about sub-transaction to parent
transaction relationships.  The nodes are transaction ids; edges denote
the fact that the child node represents a sub-transaction of the
transaction represented by the parent node.  The root of a subtrans
tree represents a main transaction.

A subtrans tree is navigable from child to parent, i.e. if you have a
sub-transaction xid you can find its parent efficiently, but there is
no fast and easy way to enumerate all children of a given parent xid.

The subtrans dependencies have to be visible to all backends.  At least
two possible implementations have been discussed.

Together with clog:  Store a 30 bit parent transaction offset with the
two state bits.  Offset 0 means the transaction is a main transaction.

In a separate data structure:  pg_subtrans is a huge (possibly sparsely
populated) array of parent xids, indexed by child xid.  This array is
divided into manageable chunks, represented by files "pg_subtrans_NNNN".
These files are only created when necessary.  At any time only a tiny
part of the whole array is kept in shared buffers.  This concept is
similar, or almost equal, to pg_clog, which is an array of double bits.
Most of its implementation can be stolen from the clog code.

We believe that parent xid lookups are far less frequent than
transaction state lookups and therefore favour the second
implementation.

Transaction states
------------------
Let the currently unused fourth state in pg_clog indicate a committed
subtransaction.  In pg_clog there are two bits per transaction, abort
(a) and commit (c), with the following meaning:

 a c
 0 0  transaction in progress; the owning backend knows whether it is a
      main or a sub-transaction, other backends don't care
 1 0  aborted; nobody cares whether main or sub-transaction
 0 1  committed main transaction, or - with shortcut 2 - a
      sub-transaction that's known committed to all active transactions
 1 1  committed sub-transaction; have to look for parent in pg_subtrans

Updating the state of a single (main or sub) transaction is atomic,
just like it is now.  This solves the atomicity problem without
requiring long locks.  Here is what is to be done for some operations:

BEGIN main transaction:
. Get a new xid (no change to current behaviour).
. pg_clog[xid] is still 00, meaning active.
. pg_subtrans[xid] is still 0, meaning no parent.

BEGIN subtransaction:
. Push current transaction info onto local stack.
. Get a new xid.
. Record parent xid in pg_subtrans[xid].
. pg_clog[xid] is still 00.

ROLLBACK subtransaction:
. Set pg_clog[xid] to 10 (aborted).
. Optionally set clog bits for subsubtransactions to 10 (*).
. Pop transaction info from stack.

COMMIT subtransaction:
. Set pg_clog[xid] to 11 (committed subtrans).
. Don't touch clog bits for subsubtransactions!
. Pop transaction info from stack.

ROLLBACK main transaction:
. Set pg_clog[xid] to 10 (aborted).
. Optionally set clog bits for subtransactions to 10 (*).

COMMIT main transaction:
. Set pg_clog[xid] to 01 (committed).
. Don't touch clog bits for subtransactions!

(*) This can only be done if there is a transaction-local data
structure holding information about all subtransactions, or we have a
subtrans tree implementation that is navigable from parent to child.
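To make the two-bit encoding and the child-to-parent lookup concrete,
here is a small self-contained C sketch.  It is illustrative only: the
in-memory arrays stand in for pg_clog and pg_subtrans, and the names
are invented for the example rather than taken from the real clog/slru
code.

/*
 * Illustrative sketch only: the in-memory arrays below stand in for
 * pg_clog and pg_subtrans; they are not the real implementation.
 */
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* The four two-bit states described above: high bit = abort (a),
 * low bit = commit (c). */
typedef enum ClogStatus
{
    CLOG_IN_PROGRESS  = 0,      /* 00: still running              */
    CLOG_COMMITTED    = 1,      /* 01: committed main transaction */
    CLOG_ABORTED      = 2,      /* 10: aborted                    */
    CLOG_SUBCOMMITTED = 3       /* 11: committed subtransaction   */
} ClogStatus;

#define MAX_XID 4096            /* toy limit, just for the sketch */

static ClogStatus    clog[MAX_XID];     /* stands in for pg_clog      */
static TransactionId subtrans[MAX_XID]; /* stands in for pg_subtrans; */
                                        /* 0 means "no parent"        */

/* BEGIN subtransaction: record the parent, leave the clog state at 00. */
static void
subtrans_begin(TransactionId child, TransactionId parent)
{
    subtrans[child] = parent;
    clog[child] = CLOG_IN_PROGRESS;
}

/* Follow child -> parent links until we reach the main transaction. */
static TransactionId
subtrans_top_parent(TransactionId xid)
{
    while (subtrans[xid] != InvalidTransactionId)
        xid = subtrans[xid];
    return xid;
}

With this encoding, changing a single transaction's state is one store
into its clog slot, which is the atomicity property the proposal relies
on.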
Anyway, whether this optional marking of subsubtransactions is done or
not does not influence the consistency of this proposal.

Tuple visibility
----------------
Visibility check by other transactions:  If a tuple is visited and its
XMIN/XMAX_IS_COMMITTED/ABORTED flags are not yet set, pg_clog has to be
consulted to find out the status of the inserting/deleting transaction
xid.  If pg_clog[xid] is ...

  00: transaction still active
  10: aborted
  01: committed
  11: committed subtransaction, have to check parent

Only in this last case do we have to get parentxid from pg_subtrans.
Now we look at pg_clog[parentxid].  If we find ...

  00: parent still active, so xid is considered active, too
  10: parent aborted, so xid is considered aborted
  01: parent committed, so xid is considered committed
  11: recursively check grandparent(s) ...

If a visibility check sees a transaction as committed, it checks
whether the main-transaction xid is visible in the current snapshot.
Fortunately we get the main-transaction xid as a by-product of looking
up the parentxid states.

If we set XMIN/MAX_IS_COMMITTED in a tuple header, we have to replace a
sub-transaction xid in xmin or xmax respectively with the
main-transaction xid at the same time.  Otherwise we'd have to look for
the main xid whenever a tuple is touched.

XXX "at the same time" may need a bit more thought regarding locking.

HeapTupleSatisfiesUpdate
------------------------
If an UPDATE is about to change a tuple touched by another active
transaction, it waits for the other transaction to commit or abort.  We
must always wait for the main transaction, not the subtransaction.

Shortcuts
---------
Under certain conditions the 1/1 state can be replaced with 0/1 or 1/0.
This could save a lot of parent lookups.

Shortcut 1:  As soon as a transaction is rolled back, all its
subtransactions are considered aborted, too.  So their pg_clog bits can
be set to 10 and there is no need to look up their parent any more.
This can be done
. immediately on ROLLBACK (if the implementation has a list of
  committed subtransactions handy),
. as a side effect of a later visibility check,
. by VACUUM.

Shortcut 2:  If a committed subtransaction has only committed
(grand)parents up to the committed main transaction, and the main
transaction is visible to all running transactions, i.e. the main
transaction's xid is before RecentGlobalXmin (*), then the
subtransaction's pg_clog bits can be set to 01 and further parent xid
lookups are not needed any more.  This can be done
. as a side effect of a later visibility check,
. by VACUUM.

(*) AFAICS the statements "main transaction's xid is before
RecentGlobalXmin" and "subtransaction's xid is before RecentGlobalXmin"
are equivalent.

The point here is:  pg_subtrans is only queried if we find 11 in
pg_clog; we try to replace 11 as soon as possible, but only if it is
safe to do so.  Status 11 is short-lived; it is replaced with the final
status by one or more of
- ROLLBACK of the parent transaction
- a later visibility check (as a side effect)
- VACUUM

pg_subtrans cleanup:  A pg_subtrans_NNNN file covers a known range of
transaction ids.  As soon as none of these transactions has a pg_clog
status of 11, the pg_subtrans_NNNN file can be removed.  VACUUM can do
this, and it won't even have to check the heap.

Still missing
-------------
. Visibility checks for tuples inserted/deleted by a (sub)transaction
  belonging to the current transaction tree (have to check the local
  transaction stack whenever we look at an xid or switch to a parent
  xid)
. WAL
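Building on the toy clog[] / subtrans[] arrays and the ClogStatus enum
from the sketch above, here is an equally illustrative version of the
parent-chasing status lookup described under "Tuple visibility".  It is
not the real tqual.c code; the struct and function names are invented
for the example.

/*
 * Illustrative only; relies on the clog[]/subtrans[] toy arrays and the
 * ClogStatus enum from the previous sketch.
 */
typedef struct XidStatusResult
{
    ClogStatus    status;    /* effective state: 00, 01 or 10   */
    TransactionId main_xid;  /* topmost ancestor we stopped at  */
} XidStatusResult;

static XidStatusResult
resolve_xid_status(TransactionId xid)
{
    XidStatusResult r;

    /* Chase parents only while we keep finding 11 (committed subxact). */
    while (clog[xid] == CLOG_SUBCOMMITTED)
        xid = subtrans[xid];    /* the pg_subtrans lookup */

    r.status = clog[xid];       /* 00 active, 01 committed, 10 aborted  */
    r.main_xid = xid;           /* by-product: normally the main xid,   */
                                /* needed for the snapshot check        */
    return r;
}

Note that the loop is entered at all only when the first clog lookup
returns 11, which is exactly the property shortcuts 1 and 2 try to
exploit by rewriting 11 to a final state as early as possible.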
Objections and suggestions
--------------------------
Tom:
> Also, using a 11 state doubles the amount of pg_clog I/O needed to
> commit a collection of subtransactions.  You have to write 11 as the
> state of each commitable subtransaction, then commit the parent (write
> 01 as its state), then go back and change the state of each
> subtransaction to 01.  (Whether this last bit is done as part of parent
> transaction commit, or during later inspections of the state of the
> subtransaction, doesn't change the argument.)

Yes, there may be more pg_clog I/O.  But I don't think that it is
doubled.  I'd expect some locality; after all, one page holds the
status bits for 32K transactions.

Tom:
> I think it would be preferable to use only three states: active,
> aborted, committed.  The parent commit protocol is (1) write 10 as state
> of each aborted subtransaction (this should be done as soon as the
> subtransaction is known aborted, rather than delaying to parent commit);
> (2) write 01 as state of parent (this is the atomic commit); (3) write
> 01 as state of each committed subtransaction.  Readers who see 00 must
> check the parent state; if the parent is committed then they have to go
> back and recheck the child state (to see if it became "aborted" after
> they looked).  This halves the write traffic during a commit, at the
> cost of additional read traffic when subtransaction state is checked in
> a narrow window after the time of parent transaction commit.  I believe
> it nets out to be faster.

Pro: Saves the fourth state for something different.
Con: Does a pg_subtrans lookup even for main transactions (at least
     until xid < RecentGlobalXmin).
Con: Needs subtrans tree navigation from parent to child.

It's noticeably past midnight as I write this ... will reconsider that
approach after I have had some sleep.

Tom:
> But wait a moment.  The child xid is by definition always greater than
> (newer than) its parent.  So if we consult pg_clog and find the
> transaction marked committed, *and* the xid is before the window of XIDs
> in our snapshot, then even if it's not a top-level xid, the parent must
> be before our window too.  Therefore we can conclude the transaction is
> visible in our snapshot.  So indeed there is a good-size range of xids
> for which we'll never need to chase the parent link: everything before
> the RecentGlobalXmin computed by GetSnapshotData.

I think this has been integrated into this proposal (cf. shortcut 2).

> (We do have to set subtransactions to committed during parent commit
> to make this true; we can't update them lazily.  But I think that's
> okay.)

This is not clear to me ... I'll try again later ...

------------------------------

That's all for now, folks.  Feel free to comment.  Sorry for the long
post.  Would you prefer this kind of stuff on a web page and just a
short note with the URL to the list?

Servus
Manfred
Manfred Koizar <mkoi-pg@aon.at> writes:
> If we set XMIN/MAX_IS_COMMITTED in a tuple header, we have to replace
> a sub-transaction xid in xmin or xmax respectively with the
> main-transaction xid at the same time.  Otherwise we'd have to look
> for the main xid, whenever a tuple is touched.

This worries me --- it changes a safe operation (OR'ing in a commit
bit) into an unsafe one that requires exclusive lock on the page
containing the tuple.  I'm also concerned that we'd now need a WAL
entry to record the xid change (are we dependent on this change
occurring for correctness?  or is it only performance?)

Perhaps it would be better to leave the tuple-commit bit unset until we
have been able to change the clog state to 01 ("committed to
everyone").

> Tom:
>> I think it would be preferable to use only three states: active,
>> aborted, committed.

> Con: Needs subtrans tree navigation from parent to child.

But only in the backend owning the transaction; there's no need for
shared state that allows it.

> Sorry for the long post.  Would you prefer such kind of stuff on a web
> page and just a short note with the URL to the list?

No.  This way it gets into the list archives.

			regards, tom lane
On Wed, 19 Mar 2003 11:18:38 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Manfred Koizar <mkoi-pg@aon.at> writes:
>> If we set XMIN/MAX_IS_COMMITTED in a tuple header, we have to replace
>> a sub-transaction xid in xmin or xmax respectively with the
>> main-transaction xid at the same time.  Otherwise we'd have to look
>> for the main xid, whenever a tuple is touched.
>
> This worries me --- it changes a safe operation (OR'ing in a commit bit)
> into an unsafe one that requires exclusive lock on the page containing
> the tuple.

[Only talking about xmin here, but everything refers to xmax as well.]

I was hoping we could set xmin atomically without holding a lock.  If
we can, we first set xmin to the main xid.  The new state is still
consistent; now it looks as if the change has been made directly by the
main transaction and not by one of its subtransactions, which is ok
after the main transaction has committed (we are only talking about
cases where all interesting transactions have committed).  As a second
step we update the commit bit which is as safe as it is now.

I see no concurrency problems.  If two or more backends visit the same
tuple, they either write the same value to the same position which
doesn't hurt, or one sees the other's changes which is a good thing.

So this boils down to whether setting the value of a properly aligned
32 bit variable in shared memory is an atomic operation on all
supported platforms.  I don't know enough about compilers to answer
this question.

> I'm also concerned that we'd now need a WAL entry to record
> the xid change

If the sequence is "first update xmin, then set the commit bit", we
never have an inconsistent state.  And if the change is lost, it can be
redone by the next backend visiting the tuple.  So I think we don't
need a WAL entry.

> (are we dependent on this change occurring for correctness?
> or is it only performance?)

The latter.

> Perhaps it would be better to leave the tuple-commit bit unset until we
> have been able to change the clog state to 01 ("committed to everyone").

At least we can fall back to this, if we can't find out how to update
the xid safely.

Servus
Manfred
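For concreteness, here is a minimal sketch of the two-step update
described above, using an invented stand-in for the tuple header (not
the real HeapTupleHeaderData) and assuming that an aligned 32-bit store
is atomic.  As Vadim points out further down the thread, even with
atomic stores a multi-CPU machine may make the two stores visible in
the opposite order unless a write memory barrier is used, so treat this
purely as an illustration of the intended ordering.

/* Illustrative sketch only: ToyTupleHeader and XMIN_COMMITTED are
 * made-up stand-ins, not the real tuple header or infomask bits. */
#include <stdint.h>

typedef uint32_t TransactionId;

#define XMIN_COMMITTED 0x0100       /* invented hint-bit value */

typedef struct ToyTupleHeader
{
    TransactionId xmin;             /* inserting xid */
    uint16_t      infomask;         /* hint bits     */
} ToyTupleHeader;

static void
set_xmin_committed(ToyTupleHeader *tup, TransactionId main_xid)
{
    /*
     * Step 1: overwrite the subtransaction xid with the main xid.  The
     * tuple is still consistent; it merely looks as if the main
     * transaction had inserted it directly.
     */
    tup->xmin = main_xid;

    /*
     * Step 2: set the hint bit, which is exactly as safe (or unsafe) as
     * setting it is today.  On weakly ordered hardware a write barrier
     * would be needed between the two steps.
     */
    tup->infomask |= XMIN_COMMITTED;
}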
Manfred Koizar <mkoi-pg@aon.at> writes:
> If the sequence is "first update xmin, then set the commit bit", we
> never have an inconsistent state.  And if the change is lost, it can
> be redone by the next backend visiting the tuple.

Not if the subtransaction log state has been removed as no longer
needed.  I think a WAL entry will be essential.  (An alternative might
be to keep subtransaction state as long as we keep pg_clog state, but
that's pretty unpleasant too.)

I think we'd be a lot better off to design this so that we don't need
to alter heap tuple xmin values...

			regards, tom lane
> I see no concurrency problems.  If two or more backends visit the same
> tuple, they either write the same value to the same position which
> doesn't hurt, or one sees the other's changes which is a good thing.

AFAIR, on multi-CPU platforms it's possible that the second transaction
could see the COMMITTED state but still the old (subtrans id) value in
xmin: it's not guaranteed that changes made on CPU1 (V1 was changed
first, then V2 was changed) will appear in the same order on CPU2 (V2
may come first, then V1).

Vadim
On Wed, 19 Mar 2003 13:00:07 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Manfred Koizar <mkoi-pg@aon.at> writes:
>> And if the change is lost, it can
>> be redone by the next backend visiting the tuple.
>
> Not if the subtransaction log state has been removed as no longer
> needed.

But this problem is not triggered by a tuple that has its xmin changed
by a visitor and then loses that change again.  We'd have the same
problems with tuples that have never been visited (*).  So we must make
sure that pg_subtrans segments are not discarded as long as they are
needed.

(*) I guess your argument is: VACUUM makes sure that all tuples have
been visited before it discards pg_subtrans segments.

With my 4-state proposal VACUUM can decide whether a pg_subtrans
segment is still needed by only looking at pg_clog.

> I think a WAL entry will be essential.

I'm still in doubt, but it's moot (see below).

> I think we'd be a lot better off to design this so that we don't need to
> alter heap tuple xmin values...

If Vadim remembers correctly, we cannot safely change xmin unless we
want to grab a write lock.  Ok, we'll not change xmin and we'll not set
the commit bit before xmin is visible to all, if xmin is a
subtransaction.  We can always add this performance hack later, if
someone finds a safe implementation ...

Servus
Manfred
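The pg_subtrans cleanup rule ("a segment can be removed as soon as no
xid it covers is still in state 11") is simple enough to sketch.
Reusing the toy clog[] array and ClogStatus enum from the first sketch,
and with invented segment boundaries, it might look like this.
Illustrative only; not the actual VACUUM or segment-truncation code.

#include <stdbool.h>

/* Reuses the toy clog[] array and CLOG_SUBCOMMITTED from the first
 * sketch above. */
static bool
subtrans_segment_still_needed(TransactionId seg_start,
                              TransactionId seg_end)
{
    TransactionId xid;

    for (xid = seg_start; xid <= seg_end; xid++)
    {
        /* Any xid still marked 11 may yet need its parent looked up. */
        if (clog[xid] == CLOG_SUBCOMMITTED)
            return true;
    }
    return false;       /* VACUUM may remove the pg_subtrans_NNNN file */
}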
Sorry I have a basic question.
Was there any consensus we would introduce nested transactions
(or savepoints) in the way currently discussed ?

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Sorry I have a basic question.
> Was there any consensus we would introduce nested transactions
> (or savepoints) in the way currently discussed ?

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened.  But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

My feeling is that the right way to proceed is to nail down a paper
design for each of the major aspects of the problem, before anyone
actually spends any time coding.  There would be little point in
implementing subtransaction logging if we don't know how to do the
other things.

			regards, tom lane
Tom Lane wrote:
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > Sorry I have a basic question.
> > Was there any consensus we would introduce nested transactions
> > (or savepoints) in the way currently discussed ?
>
> I think we are a long way from saying we can or will actually do it.
> Error handling and resource management (eg locks) are a couple of other
> huge cans of worms that have yet to be opened.  But certainly a solid
> design for the transaction logging and tuple validity checking is a
> necessary step.

Is the way to undo data rejected already ?

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/
Hiroshi Inoue wrote:
> Is the way to undo data rejected already ?

You mean abort subtransactions?  Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

--
Bruce Momjian
Bruce Momjian wrote:
> You mean abort subtransactions?  Each subtransaction gets its own
> transaction id, so we just mark that as aborted --- there is no undo of
> tuples, though I had originally suggested that approach years ago.

Vadim planned to implement the savepoints functionality
using UNDO mechanism.  AFAIR it was never denied explicitly.

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/
Hiroshi Inoue wrote:
> Vadim planned to implement the savepoints functionality
> using UNDO mechanism.  AFAIR it was never denied explicitly.

If you go to the TODO.detail/transactions archive, there was discussion
of using UNDO, and most felt that there were too many problems of
having to manage the undo system, and that assigning a separate
transaction id to every subtransaction was cleaner and more closely
matched our existing system.  It also has zero time for undo, which is
the case we have for main transactions now.

--
Bruce Momjian
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Bruce Momjian wrote:
>> You mean abort subtransactions?  Each subtransaction gets its own
>> transaction id, so we just mark that as aborted --- there is no undo of
>> tuples, though I had originally suggested that approach years ago.

> Vadim planned to implement the savepoints functionality
> using UNDO mechanism.  AFAIR it was never denied explicitly.

Given all the flak we got about WAL growth during the time we had that
code enabled, I think there's no chance that UNDO will be the preferred
path.  It's not workable with big transactions.

There are other problems besides WAL bloat, too.  I realized while I
was working on the btree code a few weeks ago that it's fundamentally
unfriendly to UNDO, because there are some operations you'd want to
UNDO (viz, insertion of a leaf item pointing at a heap tuple) and some
you would not (viz, splitting of index pages and subsequent insertion
of items into upper tree levels).  But the same WAL entry might include
both kinds of operation.  This could be got round, perhaps, but that
code is overcomplicated already ...

			regards, tom lane
Tom Lane wrote:
> Given all the flak we got about WAL growth during the time we had that
> code enabled, I think there's no chance that UNDO will be the preferred
> path.  It's not workable with big transactions.
>
> There are other problems besides WAL bloat, too.  I realized while I
> was working on the btree code a few weeks ago that it's fundamentally
> unfriendly to UNDO, because there are some operations you'd want to
> UNDO (viz, insertion of a leaf item pointing at a heap tuple) and some
> you would not (viz, splitting of index pages and subsequent insertion
> of items into upper tree levels).  But the same WAL entry might include
> both kinds of operation.  This could be got round, perhaps, but that
> code is overcomplicated already ...

I assumed the UNDO would have had to be in a separate place, or allow
compression of the WAL file to keep needed UNDO stuff but get rid of
unneeded stuff --- it was all quite complicated.

--
Bruce Momjian
Bruce Momjian wrote:
> If you go to the TODO.detail/transactions archive, there was discussion
> of using UNDO, and most felt that there were too many problems of
> having to manage the undo system,

This is closely related to the basics of PostgreSQL.
Please don't decide it implicitly.

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/
Hiroshi Inoue wrote:
> This is closely related to the basics of PostgreSQL.
> Please don't decide it implicitly.

We took a vote and UNDO lost --- do you want to do another vote?

--
Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> There are other problems besides WAL bloat, too.

> I assumed the UNDO would have had to be in a separate place, or allow
> compression of the WAL file to keep needed UNDO stuff but get rid of
> unneeded stuff --- it was all quite complicated.

There is a mechanism in the XLOG code to distinguish "in transaction"
from "out of transaction" WAL entries.  The thing I had not realized
before working on the btree code is that that mechanism is inadequate
for UNDO.

			regards, tom lane
Bruce Momjian wrote:
> We took a vote and UNDO lost --- do you want to do another vote?

Sorry I missed the vote.  Where is it ?

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/
> Given all the flak we got about WAL growth during the time we had that
> code enabled, I think there's no chance that UNDO will be the preferred
> path.  It's not workable with big transactions.

Somehow it's working in other DB systems.

> There are other problems besides WAL bloat, too.  I realized while I
> was working on the btree code a few weeks ago that it's fundamentally
> unfriendly to UNDO, because there are some operations you'd want to
> UNDO (viz, insertion of a leaf item pointing at a heap tuple) and some
> you would not (viz, splitting of index pages and subsequent insertion
> of items into upper tree levels).  But the same WAL entry might include
> both kinds of operation.  This could be got round, perhaps, but that
> code is overcomplicated already ...

Each access-method requires specific UNDO code (like REDO).  Once
again, it works in other DB-es.

Vadim
Hiroshi Inoue wrote:
> Sorry I missed the vote.  Where is it ?

I can't find the vote in the archive.  As I remember, Vadim and a few
others liked UNDO, while more liked the current approach.

--
Bruce Momjian
"Vadim Mikheev" <vmikheev@reveredata.com> writes: >> Given all the flak we got about WAL growth during the time we had that >> code enabled, I think there's no chance that UNDO will be the preferred >> path. It's not workable with big transactions. > Somehow it's working in other DB systems. Isn't limited UNDO segment size one of the most-hated management problems for Oracle databases? I don't see why we should want to duplicate one of their worst problems. regards, tom lane
> >> Given all the flak we got about WAL growth during the time we had that
> >> code enabled, I think there's no chance that UNDO will be the preferred
> >> path.  It's not workable with big transactions.
>
> > Somehow it's working in other DB systems.
>
> Isn't limited UNDO segment size one of the most-hated management
> problems for Oracle databases?  I don't see why we should want to
> duplicate one of their worst problems.

How is it different from disk-space appetite of our non-overwriting
smgr?!  Before transaction commits you have to keep old data somewhere
anyway.  Let's not limit size of UNDO segments and that's it.

Vadim
On Wed, Mar 19, 2003 at 09:38:21PM -0500, Tom Lane wrote:
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > Sorry I have a basic question.
> > Was there any consensus we would introduce nested transactions
> > (or savepoints) in the way currently discussed ?
>
> I think we are a long way from saying we can or will actually do it.
> Error handling and resource management (eg locks) are a couple of other
> huge cans of worms that have yet to be opened.

Why do you say lock management is a can of worms?  Locks are released
at xact end by means of LockReleaseAll(), and this doesn't release all
the locks the backend holds: only the locks that have the same xid that
the committing transaction has (the mechanism has to be corrected for
xact abort though).

This is exactly what is needed for nested transactions: the ending
subtransaction should release the locks it has, but the parent
transaction should keep the ones it had.

What's more, there's provision in LockAcquire() (storage/lmgr/lock.c
line 602) for making it possible that a backend holds a lock more than
once in different XIDs.

I know I'm missing something, but what is it?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"No es bueno caminar con un hombre muerto"
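A toy sketch of the behaviour Alvaro describes -- releasing only the
locks tagged with the ending (sub)transaction's xid while leaving the
parent's locks alone.  The list structure and names are invented for
the example; this is not the actual LockReleaseAll() code in
storage/lmgr/lock.c.

#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

typedef struct HeldLock
{
    TransactionId    xid;       /* (sub)transaction that took the lock */
    int              lock_id;   /* stand-in for the lock's identity    */
    struct HeldLock *next;
} HeldLock;

/*
 * Release only the locks taken under ending_xid; locks taken under
 * other xids (for instance the parent transaction) stay held.
 */
static HeldLock *
release_locks_for_xid(HeldLock *held, TransactionId ending_xid)
{
    HeldLock **prev = &held;

    while (*prev != NULL)
    {
        if ((*prev)->xid == ending_xid)
        {
            HeldLock *victim = *prev;

            *prev = victim->next;       /* unlink from the list */
            /* ... release the underlying lock manager lock here ... */
            free(victim);
        }
        else
            prev = &(*prev)->next;
    }
    return held;
}

Whether it is correct to do this on subtransaction commit (rather than
only on abort) is exactly the question Tom raises in the next message.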
Tom mentioned to me that if you deadlock in a subtransaction, you have
to release the locks of the subtransaction.  That could be tricky.

--
Bruce Momjian
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Why do you say lock management is a can of worms?  Locks are released
> at xact end by means of LockReleaseAll(), and this doesn't release all
> the locks the backend holds: only the locks that have the same xid that
> the committing transaction has (the mechanism has to be corrected for
> xact abort though).

Well, I'm not sure about this.  Is it okay to release a child xact's
locks when the child commits?  Usually the point of holding a lock till
commit is to ensure that other xacts will see your change as committed
when they get in.  But they won't see the child's work as committed if
its parent is still open.  On the other hand, an aborted child's locks
had better get released (consider case where child is aborting to get
out of a deadlock condition).  This needs careful thought.

There are indeed some first-cut provisions in the lock code for
multiple transactions owned by a backend, but it'd be dangerous to
assume that they are either correct or complete.  The only case that's
tested is for VACUUM to hold a lock across two transactions --- and
this lock will not be held in the face of an error, so it's not an
accurate representation of nested xacts anyway.

Also see LW locks, which have no such management infrastructure ...

			regards, tom lane
On Thu, Mar 20, 2003 at 01:40:44AM -0500, Tom Lane wrote:
> There are indeed some first-cut provisions in the lock code for
> multiple transactions owned by a backend, but it'd be dangerous to
> assume that they are either correct or complete.  The only case that's
> tested is for VACUUM to hold a lock across two transactions --- and
> this lock will not be held in the face of an error, so it's not an
> accurate representation of nested xacts anyway.

Well, the only way to see if they are right or wrong is testing them.
I will be trying to completely understand the transaction/block states
so I can implement the needed state machinery for nested transactions;
with this we can play with locks and the rest of the resources.  I
think this transaction state game is the easiest part of nested
transactions.

> Also see LW locks, which have no such management infrastructure ...

You can't end a transaction holding one of those; doing so is a
programming error.  The only way it is allowed is when elog(ERROR) is
called.  For that I propose that held_lwlocks is replaced with

typedef struct held_lwlocks
{
	TransactionId	xid[MAX_SIMUL_LWLOCKS];
	LWLockId	lid[MAX_SIMUL_LWLOCKS];
	int		num_locks_held;
} held_lwlocks;

and LWReleaseAll() modified appropriately.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Nunca confiaré en un traidor.  Ni siquiera si el traidor lo he creado yo"
(Barón Vladimir Harkonnen)
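As a rough illustration of what the "LWReleaseAll() modified
appropriately" part could look like with the struct proposed above: the
function name, the tracking variable and the details below are
invented, and it assumes a release primitive along the lines of
LWLockRelease(LWLockId).  A sketch, not actual code.

/* Sketch only; assumes the held_lwlocks struct proposed above and a
 * release primitive like LWLockRelease(LWLockId).  Names are invented. */
static held_lwlocks lw_held;

static void
LWLockReleaseForXid(TransactionId ending_xid)
{
    int     i = 0;

    while (i < lw_held.num_locks_held)
    {
        if (lw_held.xid[i] == ending_xid)
        {
            LWLockRelease(lw_held.lid[i]);

            /* Compact the arrays by moving the last entry down;
             * release order is ignored in this sketch. */
            lw_held.num_locks_held--;
            lw_held.xid[i] = lw_held.xid[lw_held.num_locks_held];
            lw_held.lid[i] = lw_held.lid[lw_held.num_locks_held];
        }
        else
            i++;
    }
}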
Bruce Momjian wrote:
> I can't find the vote in the archive.  As I remember, Vadim and a few
> others liked UNDO, while more liked the current approach.

As far as I remember there was no such vote or decision.  Note that I'm
not particularly on the UNDO side, but I don't think that the currently
discussed way is much better than UNDO.  Please make the advantages and
disadvantages clear and let me understand the meaning of this thread.

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/
If there was no official vote, the conclusion came from the discussion
that almost everyone wanted subtransactions without UNDO.

I don't want to rehash it.  If you want a vote, let's vote.

Who wants subtransactions with UNDO and who wants it with a separate
transaction id for every subtransaction?

--
Bruce Momjian
I found the discussion, all 146 messages:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=200105231527.f4NFRvG07790%40candle.pha.pa.us&rnum=1&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DISO-8859-1%26q%3Dvote%2Bsubtransaction%26btnG%3DGoogle%2BSearch%26meta%3Dgroup%253Dcomp.databases.postgresql.*

The discussion was more general and dealt with vacuum and undo.  The
basic discussion for subtransactions is whether you want to add UNDO
logic just to do subtransactions, or whether it is easier to deal with
multiple transaction ids per backend.  Most thought the second option
was much easier.

In fact, I had proposed a simpler UNDO capability that revisited tuples
and set their XID to a fixed aborted XID to clean up aborted
subtransactions, but most now like the multiple XID solution.

--
Bruce Momjian
> If there was no official vote, the conclusion came from the discussion
> that almost everyone wanted subtransactions without UNDO.
>
> I don't want to rehash it.  If you want a vote, let's vote.
>
> Who wants subtransactions with UNDO and who wants it with a separate
> transaction id for every subtransaction?

Don't mess up things, Bruce - UNDO is not for subtransactions only!
UNDO would allow immediate storage cleanup and vacuum would
not be required anymore.  Subtransactions/savepoints would be just a
"by-effect" of UNDO.  (And, btw, how would you implement "implicit"
savepoints with the "separate subtrans id" approach?)

But do we need any voting, actually?  Is there anybody who wants to, and
is ready to, implement UNDO functionality?  No?  Then there is nothing
to vote about.  (Though I personally consider "subtrans id-s" as
"messing up a messy transaction system".  Messing up is always easier
than re-designing.)

Vadim
Vadim Mikheev wrote:
> Don't mess up things, Bruce - UNDO is not for subtransactions only!
> UNDO would allow immediate storage cleanup and vacuum would
> not be required anymore.

Yes, Vadim is right.  The UNDO discussion was about much more than
subtransactions; it was actually a discussion comparing UNDO and the
free-space map approach.  It would help with subtransactions, but it is
only tangentially related.  I think the issue was:

	Do we want UNDO or FSM?

We chose FSM, but UNDO is still an option.

	Do we want UNDO just for subtransactions?

That was pretty easily defeated, though I made an argument that you
could do UNDO pretty cheaply when you have WAL ensuring crash recovery.

--
Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Do we want UNDO just for subtransactions?
> That was pretty easily defeated, though I made an argument that you
> could do UNDO pretty cheaply when you have WAL ensuring crash recovery.

That argument was what got us into the early-7.1 WAL bloat problems.
I don't think it's "pretty cheap" to have to hold the entire WAL for
the length of your longest-running transactions.

			regards, tom lane
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Do we want UNDO just for subtransactions?
> > That was pretty easily defeated, though I made an argument that you
> > could do UNDO pretty cheaply when you have WAL ensuring crash recovery.
>
> That argument was what got us into the early-7.1 WAL bloat problems.
> I don't think it's "pretty cheap" to have to hold the entire WAL for
> the length of your longest-running transactions.

With my idea, you wouldn't have to keep WAL around.  Each backend would
keep a list of tids or the relid (if lots of rows are changed) in local
memory and UNDO on subtransaction abort.

--
Bruce Momjian
> Who wants subtransactions with UNDO and who wants it with a separate
> transaction id for every subtransaction?

I think there is at least one special case that would largely profit
from UNDO (or some other identical mechanism), namely an insert that
causes a constraint violation.

The standard way to program an "insert or update" is to do the one
command that will succeed in more cases first, then on failure do the
other.  In PG this currently has to be done inefficiently by first
doing the update and then doing the insert.

If we had implicit subtransactions, I think people would start using
the standard approach.  It would probably not be too nice if that would
always leave dead tuples and index entries around.

Andreas
> In fact, I had proposed a simpler UNDO capability that revisited tuples
> and set their XID to a fixed aborted XID to clean up aborted
> subtransactions, but most now like the multiple XID solution.

I think that for the implicit subtransactions we will want (with error
codes coming), using a different xid for every command inside a
transaction is not so sexy, no ?

Andreas