Thread: nested xacts and phantom Xids

nested xacts and phantom Xids

From
Alvaro Herrera
Date:
Ok people,

Here I present the nested transactions patch and the phantom Xids patch
that goes with it.

Please review, test and comment.

Missing: rollback of SET CONSTRAINTS and GUC vars.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Y una voz del caos me habló y me dijo
"Sonríe y sé feliz, podría ser peor".
Y sonreí. Y fui feliz.
Y fue peor.

Attachment

Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Here I present the nested transactions patch and the phantom Xids patch
> that goes with it.

I looked at the phantom XIDs stuff a bit.  I still have little confidence
that the concept is correct :-( but here are some comments on the code
level.

> +  * They are marked immediately in pg_subtrans as direct childs of the topmost
> +  * Xid (no matter how deep we are in the transaction tree),

Why is that a good idea --- won't it cause problems when you
commit/abort an intermediate level of subtransaction?

> +  * All this happens when HeapTupleHeaderSetXmax is called and the Xmin of the
> +  * tuple is one of the Xids of the current transaction.
> +  *
> +  * Note that HeapTupleHeaderGetXmax/GetXmin return the masqueraded Xmin and
> +  * Xmax, not the real one in the tuple header.

I think that putting this sort of logic into Set/Get Xmin/Xmax is a
horrid idea.  They are supposed to be transparent fetch/store macros,
not arbitrarily-complex filter functions.  They're also supposed to
be reasonably fast :-(

> +  * ... We know to do this when SetXmax is called
> +  * on a tuple that has the HEAP_MARKED_FOR_UPDATE bit set.

Uglier and uglier.

> ***************
> *** 267,274 ****
>                   return true;
>
>               /* deleting subtransaction aborted */
> !             if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
> !                 return true;
>
>               Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)));
>
> --- 268,276 ----
>                   return true;
>
>               /* deleting subtransaction aborted */
> !             if (tuple->t_infomask & HEAP_XMIN_IS_PHANTOM)
> !                 if (TransactionPhantomIsCommittedPhantom(HeapTupleHeaderGetPhantomId(tuple)))
> !                     return true;
>
>               Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)));
>

Er, what happened to checking for the "deleting subtransaction aborted"
case?

On the whole, I think this was an interesting exercise but also an utter
failure.  We'd be far better off to eat the extra 4 bytes per tuple
header and put back the separate Xmax field.  The costs in both runtime
and complexity/reliability of this patch are simply not justified by
a 4-byte savings.

            regards, tom lane

Re: nested xacts and phantom Xids

From
Bruce Momjian
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > Here I present the nested transactions patch and the phantom Xids patch
> > that goes with it.
>
> I looked at the phantom XIDs stuff a bit.  I still have little confidence
> that the concept is correct :-( but here are some comments on the code
> level.
>
> > +  * They are marked immediately in pg_subtrans as direct childs of the topmost
> > +  * Xid (no matter how deep we are in the transaction tree),
>
> Why is that a good idea --- won't it cause problems when you
> commit/abort an intermediate level of subtransaction?

I don't think so.  The phantom xid is used only by the outside
transaction waiting for that tuple to be committe or aborted.  The
outside tranaction will sleep on the topmost xid completing, then check
the phantom xid status for commit/abort. Within the transaction, I think
he uses command counter to know the creation and destruction sub-xid.

I think right now phantom xids are used only for making parts of a
subtransaction committed or aborted and only in cases where the tuple is
created and destroyed by parts of the same transaction tree.

I don't feel too bad about the runtime cost if only subtransactions are
paying that cost.  I know we are really stretching the system here but I
would like to try a little more rather than give up and taking a space
hit for all tuples.

---------------------------------------------------------------------------

> > +  * All this happens when HeapTupleHeaderSetXmax is called and the Xmin of the
> > +  * tuple is one of the Xids of the current transaction.
> > +  *
> > +  * Note that HeapTupleHeaderGetXmax/GetXmin return the masqueraded Xmin and
> > +  * Xmax, not the real one in the tuple header.
>
> I think that putting this sort of logic into Set/Get Xmin/Xmax is a
> horrid idea.  They are supposed to be transparent fetch/store macros,
> not arbitrarily-complex filter functions.  They're also supposed to
> be reasonably fast :-(
>
> > +  * ... We know to do this when SetXmax is called
> > +  * on a tuple that has the HEAP_MARKED_FOR_UPDATE bit set.
>
> Uglier and uglier.
>
> > ***************
> > *** 267,274 ****
> >                   return true;
> >
> >               /* deleting subtransaction aborted */
> > !             if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
> > !                 return true;
> >
> >               Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)));
> >
> > --- 268,276 ----
> >                   return true;
> >
> >               /* deleting subtransaction aborted */
> > !             if (tuple->t_infomask & HEAP_XMIN_IS_PHANTOM)
> > !                 if (TransactionPhantomIsCommittedPhantom(HeapTupleHeaderGetPhantomId(tuple)))
> > !                     return true;
> >
> >               Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)));
> >
>
> Er, what happened to checking for the "deleting subtransaction aborted"
> case?
>
> On the whole, I think this was an interesting exercise but also an utter
> failure.  We'd be far better off to eat the extra 4 bytes per tuple
> header and put back the separate Xmax field.  The costs in both runtime
> and complexity/reliability of this patch are simply not justified by
> a 4-byte savings.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: nested xacts and phantom Xids

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > > Here I present the nested transactions patch and the phantom Xids patch
> > > that goes with it.
> >
> > I looked at the phantom XIDs stuff a bit.  I still have little confidence
> > that the concept is correct :-( but here are some comments on the code
> > level.
> >
> > > +  * They are marked immediately in pg_subtrans as direct childs of the topmost
> > > +  * Xid (no matter how deep we are in the transaction tree),
> >
> > Why is that a good idea --- won't it cause problems when you
> > commit/abort an intermediate level of subtransaction?
>
> I don't think so.  The phantom xid is used only by the outside
> transaction waiting for that tuple to be committe or aborted.  The
> outside tranaction will sleep on the topmost xid completing, then check
> the phantom xid status for commit/abort. Within the transaction, I think
> he uses command counter to know the creation and destruction sub-xid.
>
> I think right now phantom xids are used only for making parts of a
> subtransaction committed or aborted and only in cases where the tuple is
> created and destroyed by parts of the same transaction tree.
>
> I don't feel too bad about the runtime cost if only subtransactions are
> paying that cost.  I know we are really stretching the system here but I
> would like to try a little more rather than give up and taking a space
> hit for all tuples.

Let me add that outside transactions read those phantom xid only when
they are doing dirty reads.  What I don't understand is when do outside
transactions see tuples created inside a transaction?  INSERT into a
table with a unique key?

Once the main transaction commits, these phantom tuples should work just
like ordinary tuples except they get their cmin overwritten when they
are expired, I think.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I don't feel too bad about the runtime cost if only subtransactions are
> paying that cost.

That's exactly why I'm so exercised about what's been done to the
HeapTupleSet/Get macros.  That's significant cost that's paid even when
you're not using *any* of this stuff.

> I know we are really stretching the system here but I
> would like to try a little more rather than give up and taking a space
> hit for all tuples.

I don't even have any confidence that there are no fundamental bugs
in the phantom-xid concept :-(.  I'd be willing to play along if an
implementation that seemed acceptable speedwise were being offered,
but this thing is not preferable to four-more-bytes even if it works.

            regards, tom lane

Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Let me add that outside transactions read those phantom xid only when
> they are doing dirty reads.  What I don't understand is when do outside
> transactions see tuples created inside a transaction?  INSERT into a
> table with a unique key?

> Once the main transaction commits, these phantom tuples should work just
> like ordinary tuples except they get their cmin overwritten when they
> are expired, I think.

You're not doing anything to increase my confidence level in this
concept.  You invented it, and even you aren't sure how it works.

            regards, tom lane

Re: nested xacts and phantom Xids

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Let me add that outside transactions read those phantom xid only when
> > they are doing dirty reads.  What I don't understand is when do outside
> > transactions see tuples created inside a transaction?  INSERT into a
> > table with a unique key?
>
> > Once the main transaction commits, these phantom tuples should work just
> > like ordinary tuples except they get their cmin overwritten when they
> > are expired, I think.
>
> You're not doing anything to increase my confidence level in this
> concept.  You invented it, and even you aren't sure how it works.

And your point is?   I am trying to help Alvaro.  I came up with an
idea, and some thought it would help solve the problem.  What ideas do
you have to help him?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Here I present the nested transactions patch and the phantom Xids patch
> that goes with it.

I took a very preliminary look through the nested-xacts patch, too.

> Missing: rollback of SET CONSTRAINTS and GUC vars.

There's a good deal more than that missing :-(.  Here are the modules or
actions that are called in CommitTransaction and/or AbortTransaction
that have not yet been touched by the patch:

shared-inval message queue processing (inval.c)
lo_commit
localbuf.c (refcounts need fixed same as bufmgr)
eoxactcallbacks API needs extension
GUC
namespace (temp namespace cleanup)
files (close/drop temp files)
reset current userid during xact abort
password file updating
SetReindexProcessing disable (can these be nested??)

Of these the one that I think most urgently needs attention is inval.c;
that is complicated code already and figuring out what should happen
for subtrans commit/abort is not trivial.  The others look relatively
straightforward to fix, if tedious.

Given patches for inval.c and guc.c, I would say that the patch is
functionally close enough to done that we could commit to including
it in 7.5 --- the other stuff could be wrapped up post-feature-freeze.

BUT (you knew there was a but coming, right?) ... I am really really
disturbed about the potential performance hit from this patch.
I don't think we want to have a nested-transaction feature that imposes
significant overhead on people who aren't actually using nested
transactions.  As I looked through the patch I found quite a few places
that would impose such overhead.

The main thing that's bothering me is that TransactionIdIsInProgress
has been turned from a relatively cheap test (just scan the in-memory
PGPROC array) into an exceedingly expensive one.  It's gratutitously
badly coded (for N open main transactions it does N searches of the
subtrans tree, when one would do), but I don't really want it going to
the subtrans tree at all during typical cases.  We had talked about
keeping a limited number of active subtrans IDs in the PGPROC array,
and I think we're really going to need to do that to buy back
performance here.

(BTW, what is the intended behavior of TransactionIdIsInProgress for
a committed or aborted subtransaction whose parent is still open?
If it has to recognize aborted subtranses as still in progress then
things get very messy, but I think it probably doesn't have to.)

I'm quite unhappy with the wide use of SubTransXidsHaveCommonAncestor to
replace TransactionIdEquals in tqual.c, too.  This turns very cheap tests
into very expensive ones --- whether or not you use subtransactions ---
and unfortunately tqual.c is coded on the assumption that these tests
are cheap.  We can *not* afford to repeat SubTransXidsHaveCommonAncestor
for every open main transaction every time we check a tuple.  In many
ways this is the same problem as with TransactionIdIsInProgress, and
it probably needs a similar solution.

Also I found it annoying that XactLockTableWait waits for a top xact
even when target subxact has already aborted; possibly this is even a
recipe for deadlock.  Perhaps should iterate up the tree one level at a
time, waiting for each?  Would mean that subxacts have to acquire a
lock on their xid that other people can wait for, but this is not
necessarily bad.  (In fact, if we do that then I don't think
XactLockTableWait has to change at all.)

So I think there are several must-fix performance issues as well before
we can make a decision to accept this for 7.5.

I'd feel better about this if we had another month, but with ten days
before feature freeze it's gonna be touch and go.

            regards, tom lane

Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> What ideas do you have to help him?

Eat the four-byte overhead.

Quite frankly, given the other functionality and performance issues
he has to solve in the next ten days (see my other message just now),
I think he'd be a fool to spend one more minute on phantom XIDs.
Maybe for 7.6 someone will have the time to make the idea work.

            regards, tom lane

Re: nested xacts and phantom Xids

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > What ideas do you have to help him?
>
> Eat the four-byte overhead.
>
> Quite frankly, given the other functionality and performance issues
> he has to solve in the next ten days (see my other message just now),
> I think he'd be a fool to spend one more minute on phantom XIDs.
> Maybe for 7.6 someone will have the time to make the idea work.

Well, the problem with eating the 4-byte overhead is that it has a
performance/storage impact for all installations, not just those that
use nested transactions.  You were discussing a performance impact for
nested transactions, but I assume that is an impact when some on the
server are using nested transactions and others are not, while this hit
would be for all sites, even when no one is using nested transactions.

However, given the list you just supplied, I agree Alvaro should focus
on these items and add the 4-bytes to store needed values rather than
continue to hack on phantom xids.  We can certainly revisit that later,
even in 7.6, or never.  We can put Manfred on it.  He did the compaction
in the first place.  :-)

One point is that if we want to re-add those 4 bytes, we have to get
community buy-in for it because it is a step backward for those who are
not going to use nested transactions.  (I don't want to go the
compile-time switch route.)

As far as cleaning up some of the items after feature freeze, well, I
would like to avoid that, but it seems safe to do because the changes
should be very localized, and because win32 will be cleaning up things
during feature freeze too, I am sure.

However, going into feature freeze knowing a features isn't 100%
functional is something we try to avoid, so this would have to be a
special exception based on the fact that the features is very difficult,
and that Alvaro is working full-time on this.  It does take us down the
slippery slope of having the release process dictated by completion of
nested transactions.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
On Sun, Jun 20, 2004 at 04:37:16PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > Here I present the nested transactions patch and the phantom Xids patch
> > that goes with it.
>
> I looked at the phantom XIDs stuff a bit.  I still have little confidence
> that the concept is correct :-( but here are some comments on the code
> level.

Ok.  I for one think this got much more complex than I had originally
thought it would be.  I agree the changes to Set/Get Xmin/Xmax are way
beyond what one would want, but the alternative would be to spread the
complexity into their callers and I think that would be much worse.

I don't have a lot of confidence in this either.  The patch will be
available in archives if anybody wants to implement this in a cleaner
and safer way; I'll continue working on the rest of the things you
pointed out in the subtransactions patch.

Sadly, something broke in a really strange way between my last cvs
update and today's, because I can't get the patch to initdb.  It
compiles without warnings (and I did make distclean), but there's a
weird bug I'll have to pursue.


Regarding the invalidation messages, what I'm currently looking at is to
add a TransactionId to each message, which will be CurrentTransactionId
for each new message.  When a subxact commits, all its messages are
relabeled to its parent.  When a subxact aborts, all messages labeled
with its TransactionId are locally processed and discarded.  This is
tricky because chunks are prepended to the queue, but it also means we
can stop processing as soon as a message belongs to another transaction.

I think GUC vars are easier: when a variable is about to change value
inside a subtransaction, save the previous value in a list from which it
will be restored if the subtransaction later aborts.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Hay dos momentos en la vida de un hombre en los que no debería
especular: cuando puede permitírselo y cuando no puede" (Mark Twain)


Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Regarding the invalidation messages, what I'm currently looking at is to
> add a TransactionId to each message, which will be CurrentTransactionId
> for each new message.  When a subxact commits, all its messages are
> relabeled to its parent.  When a subxact aborts, all messages labeled
> with its TransactionId are locally processed and discarded.

Sounds plausible offhand.

> This is tricky because chunks are prepended to the queue, but it also
> means we can stop processing as soon as a message belongs to another
> transaction.

AFAIR there isn't any essential semantics to the ordering of the queue
entries, so you can probably get away with reordering if that makes life
any easier.

Also, rather than labeling each entry individually, it might be better
to keep a separate list for each level of transaction.  Then instead of
relabeling, you'd just concat the subtrans list to its parent's.  Seems
like this should be faster and less storage.

            regards, tom lane

Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
On Mon, Jun 21, 2004 at 10:28:59PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

> > This is tricky because chunks are prepended to the queue, but it also
> > means we can stop processing as soon as a message belongs to another
> > transaction.
>
> AFAIR there isn't any essential semantics to the ordering of the queue
> entries, so you can probably get away with reordering if that makes life
> any easier.
>
> Also, rather than labeling each entry individually, it might be better
> to keep a separate list for each level of transaction.  Then instead of
> relabeling, you'd just concat the subtrans list to its parent's.  Seems
> like this should be faster and less storage.

Right, but it makes harder to detect when a duplicate relcache entry is
going to be inserted.  Judging from the commentary in the file this is
an issue.

Another idea I had was:
1. at subtransaction start, add a special "boundary" message carrying
the new Xid.
2. at subtransaction abort, process the first chunk backwards until the
boundary message with the corresponding Xid is found; if it's not, jump
to the next and repeat.

At subtransaction commit nothing special needs to be done.

This avoids having to label each message and to change the message
appending scheme.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"The eagle never lost so much time, as
when he submitted to learn of the crow." (William Blake)


Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Mon, Jun 21, 2004 at 10:28:59PM -0400, Tom Lane wrote:
>> Also, rather than labeling each entry individually, it might be better
>> to keep a separate list for each level of transaction.  Then instead of
>> relabeling, you'd just concat the subtrans list to its parent's.  Seems
>> like this should be faster and less storage.

> Right, but it makes harder to detect when a duplicate relcache entry is
> going to be inserted.  Judging from the commentary in the file this is
> an issue.

It's only a minor efficiency hack; don't get too tense about avoiding dups.
(We don't bother to check for dup catcache-inval entries at all...)
The only thing I'd suggest you need to preserve is the business about
invalidating catcache before relcache; again that's just an efficiency
hack, but it seems simple enough to be worth doing.

            regards, tom lane

Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
On Sun, Jun 20, 2004 at 08:49:22PM -0400, Tom Lane wrote:

> Of these the one that I think most urgently needs attention is inval.c;
> that is complicated code already and figuring out what should happen
> for subtrans commit/abort is not trivial.  The others look relatively
> straightforward to fix, if tedious.

A patch for this is attached.  It _seems_ to work ok; conceptually it
seems ok too.  I have done some testing which tends to indicate that it
is right, but I'm not sure of all the implications this code has so I
don't know how to test it exhaustively.  Of course, regression tests
pass, but this doesn't actually prove anything.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"¿Que diferencia tiene para los muertos, los huérfanos, y aquellos que han
perdido su hogar, si la loca destrucción ha sido realizada bajo el nombre
del totalitarismo o del santo nombre de la libertad y la democracia?" (Gandhi)

Attachment

Re: nested xacts and phantom Xids

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> On Sun, Jun 20, 2004 at 04:37:16PM -0400, Tom Lane wrote:
> > Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > > Here I present the nested transactions patch and the phantom Xids patch
> > > that goes with it.
> >
> > I looked at the phantom XIDs stuff a bit.  I still have little confidence
> > that the concept is correct :-( but here are some comments on the code
> > level.
>
> Ok.  I for one think this got much more complex than I had originally
> thought it would be.  I agree the changes to Set/Get Xmin/Xmax are way
> beyond what one would want, but the alternative would be to spread the
> complexity into their callers and I think that would be much worse.
>
> I don't have a lot of confidence in this either.  The patch will be
> available in archives if anybody wants to implement this in a cleaner
> and safer way; I'll continue working on the rest of the things you
> pointed out in the subtransactions patch.

I am sorry to have given Alvaro another idea that didn't work.  However,
thinking of options, I wonder if instead of phantom xids, we should do
phantom cids.  Because only the local backend looks at the command
counter (cid).  I think it might be alot cleaner.  The phantom cid would
have a tuple bit set indicating that instead of being a cid, it is an
index into an array of cmin/cmax pairs.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: nested xacts and phantom Xids

From
Bruce Momjian
Date:
If we add nested transactions for 7.5, are we going to need savepoints
too in the same release?  If we don't, are we going to get a lot of
complaints?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
On Wed, Jun 23, 2004 at 08:58:15AM -0400, Bruce Momjian wrote:
> If we add nested transactions for 7.5, are we going to need savepoints
> too in the same release?  If we don't, are we going to get a lot of
> complaints?

It'd be good to have savepoints right now.  I'm not sure it'd be good to
expose the nested transactions implementation if we are going to offer
savepoints later, because it means we will have to keep nested
transactions forever.

Maybe it is a good idea to hide the implementation details and offer
only the standard SAVEPOINT feature.  I'm not sure how hard it is to
change the syntax; I don't think it'd be a big deal.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La conclusión que podemos sacar de esos estudios es que
no podemos sacar ninguna conclusión de ellos" (Tanenbaum)


Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
On Wed, Jun 23, 2004 at 08:57:11AM -0400, Bruce Momjian wrote:

> I am sorry to have given Alvaro another idea that didn't work.

It allowed me to learn a lot of cool tricks, so it wasn't wasted work.
The only think I'm sorry about is that I should have used the time for
something more useful, like tackling the remaining problems in the
nested xacts implementation proper.

> However, thinking of options, I wonder if instead of phantom xids, we
> should do phantom cids.  Because only the local backend looks at the
> command counter (cid).  I think it might be alot cleaner.  The phantom
> cid would have a tuple bit set indicating that instead of being a cid,
> it is an index into an array of cmin/cmax pairs.

Yeah, maybe this can work.  I'm not going to try however, at least not
now.  If somebody else wants to try, be my guest.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El conflicto es el camino real hacia la unión"


Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> It'd be good to have savepoints right now.  I'm not sure it'd be good to
> expose the nested transactions implementation if we are going to offer
> savepoints later, because it means we will have to keep nested
> transactions forever.

Nested transactions are *good*.  We should not hide them.

I don't object to offering spec-compatible savepoints, but the plain
fact of the matter is that that's a dumbed-down API.  We should not
feel constrained to offer only that.

            regards, tom lane

Re: nested xacts and phantom Xids

From
Simon Riggs
Date:
On Wed, 2004-06-23 at 13:57, Bruce Momjian wrote:
> I am sorry to have given Alvaro another idea that didn't work.

No way! Keep having the ideas, please.

I've done some more digging in dead time on all of this and I think
we're on the right course in general by implementing all of this.

...well done to Alvaro for being able to make the ideas reality.

Best regards, Simon Riggs


Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
On Sun, Jun 20, 2004 at 08:49:22PM -0400, Tom Lane wrote:

Regarding GUC, a WIP report:

> Given patches for inval.c and guc.c, I would say that the patch is
> functionally close enough to done that we could commit to including
> it in 7.5 --- the other stuff could be wrapped up post-feature-freeze.

I figured I could save the values whenever they are going to change, and
restore them if the subtransaction aborts.  This seems to work fine
(lightly tested).

I still have to figure out how to handle allocation for string vars, but
I thought I'd post the patch for others to see.  Please let me know if
it's too ugly.  (This patch misses the pieces in xact.c and xact.h but
I'm sure the concept is clear.)

I'll post a full patch once the missing deferred trigger stuff works.
With the patches I posted to inval.c I think this fulfills the
requirements, barring the performance issues raised.

Comments?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"No single strategy is always right (Unless the boss says so)"
(Larry Wall)

Attachment

Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
I said

> I'll post a full patch once the missing deferred trigger stuff works.
> With the patches I posted to inval.c I think this fulfills the
> requirements, barring the performance issues raised.

Ok, here is a full patch that includes:

- all what was in the previous patch
- GUC vars are rolled back on subxact abort
- the inval msg lists are cleaned up of items created in the aborting
  subxact
- SET CONSTRAINTS is rolled back too


Missing:

- all the other items Tom mentioned in his last mail, including
  performance enhancements.

- discussion whether we want a different syntax for subxacts, like
  SUBBEGIN/SUBCOMMIT/SUBABORT instead of BEGIN/COMMIT/ABORT.  Please
  comment on this point.

- support for SAVEPOINT syntax.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La grandeza es una experiencia transitoria.  Nunca es consistente.
Depende en gran parte de la imaginación humana creadora de mitos"
(Irulan)

Attachment

Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Do we really need SubtransCutoffXid?  AFAICS the reason for having it is
only this claim in RecordTransactionCommit:

         * We can't mark committed subtransactions as fully committed,
         * because concurrent transactions would see them as committed
         * and not as in-progress.  Leave them as "subcommitted" until
         * the parent transaction is below OldestXmin, per VACUUM.

but I think this is dead wrong.  As long as we mark the parent committed
first, there is no race condition.  tqual tests that are using snapshots
will need to recognize that the subtransaction is a member of one of the
snapshotted main XIDs, and those that are not will think committed is
committed.  So I want to mark subtransactions fully committed in
RecordTransactionCommit, and lose SubtransCutoffXid.  Comments?

BTW, it would help to know what parts of the patch you intend to work on
over the next couple of days.  I'm reviewing and editorializing right
now with intent to commit soon, so it would be good if we can avoid
tromping on each others' feet.

            regards, tom lane

Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
On Sat, Jun 26, 2004 at 07:56:09PM -0400, Tom Lane wrote:
> Do we really need SubtransCutoffXid?  AFAICS the reason for having it is
> only this claim in RecordTransactionCommit:
>
>          * We can't mark committed subtransactions as fully committed,
>          * because concurrent transactions would see them as committed
>          * and not as in-progress.  Leave them as "subcommitted" until
>          * the parent transaction is below OldestXmin, per VACUUM.

Right, that's the only point where it's used.  I don't know clearly if
some kind of mechanism will be needed to handle SUBTRANS COMMIT states
in pg_clog that were left behind by a crashed subtransaction though.

> but I think this is dead wrong.  As long as we mark the parent committed
> first, there is no race condition.  tqual tests that are using snapshots
> will need to recognize that the subtransaction is a member of one of the
> snapshotted main XIDs, and those that are not will think committed is
> committed.  So I want to mark subtransactions fully committed in
> RecordTransactionCommit, and lose SubtransCutoffXid.  Comments?

Yes, sounds good.

> BTW, it would help to know what parts of the patch you intend to work on
> over the next couple of days.  I'm reviewing and editorializing right
> now with intent to commit soon, so it would be good if we can avoid
> tromping on each others' feet.

This is really excellent news.

I'll work on adding the Xid-cache to PGPROC and using that in
TransactionIdIsInProgress and the tqual routines.  If you want to work
on that let me know and I'll handle things like the password file, local
bufmgr refcount, etc.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
FOO MANE PADME HUM


Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> I'll work on adding the Xid-cache to PGPROC and using that in
> TransactionIdIsInProgress and the tqual routines.  If you want to work
> on that let me know and I'll handle things like the password file, local
> bufmgr refcount, etc.

Either one is okay, though doing the latter (ie modules you didn't touch
yet) would be probably a bit easier to merge.

            regards, tom lane

Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
On Sat, Jun 26, 2004 at 07:56:09PM -0400, Tom Lane wrote:

> BTW, it would help to know what parts of the patch you intend to work on
> over the next couple of days.  I'm reviewing and editorializing right
> now with intent to commit soon, so it would be good if we can avoid
> tromping on each others' feet.

Oops, I forgot that I had inadvertently left a kludge in
commands/trigger.c.  Please use this patch for this file instead.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Jude: I wish humans laid eggs
Ringlord: Why would you want humans to lay eggs?
Jude: So I can eat them

Attachment

Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
On Sun, Jun 27, 2004 at 12:49:10AM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > I'll work on adding the Xid-cache to PGPROC and using that in
> > TransactionIdIsInProgress and the tqual routines.  If you want to work
> > on that let me know and I'll handle things like the password file, local
> > bufmgr refcount, etc.
>
> Either one is okay, though doing the latter (ie modules you didn't touch
> yet) would be probably a bit easier to merge.

Will do.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Un poeta es un mundo encerrado en un hombre" (Victor Hugo)


Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
On Tue, Jun 29, 2004 at 03:22:52PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > - GUC vars are rolled back on subxact abort
>
> This did not work very well, but here is a revised GUC patch that I think
> does work.  It requires xact.c to export a function to report the
> current nesting depth, and requires AtEOXact_GUC to be called in all
> four cleanup paths (main and subxact commit and abort).

Very cool, thank you.  I had thought about doing something like this but
in the end I got scared away of changing too much code here.


> BTW, why do you have assign_transaction_read_only() in your patch?  It
> seems to me to be useful to create a readonly subxact of a read-write
> outer transaction.  Or is that just not-done-yet?

Nah, it's a leftover from back when there wasn't any way to roll back
GUC vars.  I thought it should be handled similarly to the way the
isolation level should be handled.  With your patch I think it can be
ripped away entirely.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")


Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> - GUC vars are rolled back on subxact abort

This did not work very well, but here is a revised GUC patch that I think
does work.  It requires xact.c to export a function to report the
current nesting depth, and requires AtEOXact_GUC to be called in all
four cleanup paths (main and subxact commit and abort).

BTW, why do you have assign_transaction_read_only() in your patch?  It
seems to me to be useful to create a readonly subxact of a read-write
outer transaction.  Or is that just not-done-yet?

            regards, tom lane


*** src/backend/utils/misc/README.orig    Mon Jan 19 14:04:40 2004
--- src/backend/utils/misc/README    Tue Jun 29 15:14:44 2004
***************
*** 68,116 ****
  would be effective had there never been any SET commands in the current
  session.

! To handle these cases we must keep track of as many as four distinct
! values for each variable.  They are:

  * actual variable contents    always the current effective value

  * reset_value            the value to use for RESET

- * session_value            the "committed" setting for the session
-
  * tentative_value        the uncommitted result of SET

! During initialization we set the first three of these (actual, reset_value,
! and session_value) based on whichever non-interactive source has the
! highest priority.  All three will have the same value.

  A SET LOCAL command sets the actual variable (and nothing else).  At
! transaction end, the session_value is used to restore the actual variable
! to its pre-transaction value.

  A SET (or SET SESSION) command sets the actual variable, and if no error,
  then sets the tentative_value.  If the transaction commits, the
! tentative_value is assigned to the session_value and the actual variable
! (which could by now be different, if the SET was followed by SET LOCAL).
! If the transaction aborts, the tentative_value is discarded and the
! actual variable is restored from the session_value.

  RESET is executed like a SET, but using the reset_value as the desired new
  value.  (We do not provide a RESET LOCAL command, but SET LOCAL TO DEFAULT
  has the same behavior that RESET LOCAL would.)  The source associated with
! the reset_value also becomes associated with the actual and session values.

  If SIGHUP is received, the GUC code rereads the postgresql.conf
  configuration file (this does not happen in the signal handler, but at
  next return to main loop; note that it can be executed while within a
  transaction).  New values from postgresql.conf are assigned to actual
! variable, reset_value, and session_value, but only if each of these has a
! current source priority <= PGC_S_FILE.  (It is thus possible for
! reset_value to track the config-file setting even if there is currently
! a different interactive value of the actual variable.)

  Note that tentative_value is unused and undefined except between a SET
  command and the end of the transaction.  Also notice that we must track
! the source associated with each of the four values.

  The assign_hook and show_hook routines work only with the actual variable,
  and are not directly aware of the additional values maintained by GUC.
--- 68,133 ----
  would be effective had there never been any SET commands in the current
  session.

! To handle these cases we must keep track of many distinct values for each
! variable.  The primary values are:

  * actual variable contents    always the current effective value

  * reset_value            the value to use for RESET

  * tentative_value        the uncommitted result of SET

! The reason we need a tentative_value separate from the actual value is
! that when a transaction does SET followed by SET LOCAL, the actual value
! will now be the LOCAL value, but we want to remember the prior SET so that
! that value is restored at transaction commit.
!
! In addition, for each level of transaction (possibly nested) we have to
! remember the transaction-entry-time actual and tentative values, in case
! we need to restore them at transaction end.  (The RESET value is essentially
! non-transactional, so it doesn't have to be stacked.)  For efficiency these
! stack entries are not constructed until/unless the variable is actually SET
! within a particular transaction.
!
! During initialization we set the actual value and reset_value based on
! whichever non-interactive source has the highest priority.  They will
! have the same value.  The tentative_value is not meaningful at this point.
!
! A SET command starts by stacking the existing actual and tentative values
! if this hasn't already been done within the current transaction.  Then:

  A SET LOCAL command sets the actual variable (and nothing else).  At
! transaction end, the stacked values are used to restore the GUC entry
! to its pre-transaction state.

  A SET (or SET SESSION) command sets the actual variable, and if no error,
  then sets the tentative_value.  If the transaction commits, the
! tentative_value is assigned again to the actual variable (which could by
! now be different, if the SET was followed by SET LOCAL).  If the
! transaction aborts, the stacked values are used to restore the GUC entry
! to its pre-transaction state.
!
! In the case of SET within nested subtransactions, at each commit the
! tentative_value propagates out to the next transaction level.  It will
! be thrown away at abort of any level, or after exiting the top transaction.

  RESET is executed like a SET, but using the reset_value as the desired new
  value.  (We do not provide a RESET LOCAL command, but SET LOCAL TO DEFAULT
  has the same behavior that RESET LOCAL would.)  The source associated with
! the reset_value also becomes associated with the actual and tentative values.

  If SIGHUP is received, the GUC code rereads the postgresql.conf
  configuration file (this does not happen in the signal handler, but at
  next return to main loop; note that it can be executed while within a
  transaction).  New values from postgresql.conf are assigned to actual
! variable, reset_value, and stacked actual values, but only if each of
! these has a current source priority <= PGC_S_FILE.  (It is thus possible
! for reset_value to track the config-file setting even if there is
! currently a different interactive value of the actual variable.)

  Note that tentative_value is unused and undefined except between a SET
  command and the end of the transaction.  Also notice that we must track
! the source associated with each one of the values.

  The assign_hook and show_hook routines work only with the actual variable,
  and are not directly aware of the additional values maintained by GUC.
***************
*** 129,137 ****
  context anyway, and strdup gives us more control over handling
  out-of-memory failures.

! We allow a variable's actual value, reset_val, session_val, and
! tentative_val to point at the same storage.  This makes it slightly harder
! to free space (must test that the value to be freed isn't equal to any of
! the other three pointers).  The main advantage is that we never need to
! strdup during transaction commit/abort, so cannot cause an out-of-memory
! failure there.
--- 146,154 ----
  context anyway, and strdup gives us more control over handling
  out-of-memory failures.

! We allow a string variable's actual value, reset_val, tentative_val, and
! stacked copies of same to point at the same storage.  This makes it
! slightly harder to free space (must test whether a value to be freed isn't
! equal to any of the other pointers in the GUC entry or associated stack
! items).  The main advantage is that we never need to strdup during
! transaction commit/abort, so cannot cause an out-of-memory failure there.
*** src/backend/utils/misc/guc.c.orig    Sat Jun 12 14:22:41 2004
--- src/backend/utils/misc/guc.c    Tue Jun 29 14:55:50 2004
***************
*** 16,26 ****
   */
  #include "postgres.h"

! #include <errno.h>
  #include <float.h>
  #include <limits.h>
  #include <unistd.h>
- #include <ctype.h>

  #include "utils/guc.h"
  #include "utils/guc_tables.h"
--- 16,25 ----
   */
  #include "postgres.h"

! #include <ctype.h>
  #include <float.h>
  #include <limits.h>
  #include <unistd.h>

  #include "utils/guc.h"
  #include "utils/guc_tables.h"
***************
*** 54,59 ****
--- 53,59 ----
  #include "tcop/tcopprot.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
+ #include "utils/memutils.h"
  #include "utils/pg_locale.h"
  #include "pgstat.h"

***************
*** 105,110 ****
--- 105,111 ----
                             GucSource source);
  static bool assign_stage_log_stats(bool newval, bool doit, GucSource source);
  static bool assign_log_stats(bool newval, bool doit, GucSource source);
+ static bool assign_transaction_read_only(bool newval, bool doit, GucSource source);


  /*
***************
*** 174,218 ****
  static int    block_size;
  static bool integer_datetimes;

- /* Macros for freeing malloc'd pointers only if appropriate to do so */
- /* Some of these tests are probably redundant, but be safe ... */
- #define SET_STRING_VARIABLE(rec, newval) \
-     do { \
-         if (*(rec)->variable && \
-             *(rec)->variable != (rec)->reset_val && \
-             *(rec)->variable != (rec)->session_val && \
-             *(rec)->variable != (rec)->tentative_val) \
-             free(*(rec)->variable); \
-         *(rec)->variable = (newval); \
-     } while (0)
- #define SET_STRING_RESET_VAL(rec, newval) \
-     do { \
-         if ((rec)->reset_val && \
-             (rec)->reset_val != *(rec)->variable && \
-             (rec)->reset_val != (rec)->session_val && \
-             (rec)->reset_val != (rec)->tentative_val) \
-             free((rec)->reset_val); \
-         (rec)->reset_val = (newval); \
-     } while (0)
- #define SET_STRING_SESSION_VAL(rec, newval) \
-     do { \
-         if ((rec)->session_val && \
-             (rec)->session_val != *(rec)->variable && \
-             (rec)->session_val != (rec)->reset_val && \
-             (rec)->session_val != (rec)->tentative_val) \
-             free((rec)->session_val); \
-         (rec)->session_val = (newval); \
-     } while (0)
- #define SET_STRING_TENTATIVE_VAL(rec, newval) \
-     do { \
-         if ((rec)->tentative_val && \
-             (rec)->tentative_val != *(rec)->variable && \
-             (rec)->tentative_val != (rec)->reset_val && \
-             (rec)->tentative_val != (rec)->session_val) \
-             free((rec)->tentative_val); \
-         (rec)->tentative_val = (newval); \
-     } while (0)
-

  /*
   * Displayable names for context types (enum GucContext)
--- 175,180 ----
***************
*** 801,807 ****
              GUC_NO_RESET_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
          },
          &XactReadOnly,
!         false, NULL, NULL
      },
      {
          {"add_missing_from", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,
--- 763,769 ----
              GUC_NO_RESET_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
          },
          &XactReadOnly,
!         false, assign_transaction_read_only, NULL
      },
      {
          {"add_missing_from", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,
***************
*** 1766,1779 ****
   */
  static struct config_generic **guc_variables;

! /* Current number of variables contained in the vector
!  */
  static int num_guc_variables;

! /* Vector capacity
!  */
  static int size_guc_variables;

  static bool guc_dirty;            /* TRUE if need to do commit/abort work */

  static bool reporting_enabled;    /* TRUE to enable GUC_REPORT */
--- 1728,1740 ----
   */
  static struct config_generic **guc_variables;

! /* Current number of variables contained in the vector */
  static int num_guc_variables;

! /* Vector capacity */
  static int size_guc_variables;

+
  static bool guc_dirty;            /* TRUE if need to do commit/abort work */

  static bool reporting_enabled;    /* TRUE to enable GUC_REPORT */
***************
*** 1783,1796 ****

  static int    guc_var_compare(const void *a, const void *b);
  static int    guc_name_compare(const char *namea, const char *nameb);
  static void ReportGUCOption(struct config_generic * record);
  static char *_ShowOption(struct config_generic * record);

! struct config_generic** get_guc_variables()
  {
      return guc_variables;
  }

  /*
   * Build the sorted array.    This is split out so that it could be
   * re-executed after startup (eg, we could allow loadable modules to
--- 1744,1814 ----

  static int    guc_var_compare(const void *a, const void *b);
  static int    guc_name_compare(const char *namea, const char *nameb);
+ static void push_old_value(struct config_generic *gconf);
  static void ReportGUCOption(struct config_generic * record);
  static char *_ShowOption(struct config_generic * record);

!
! /*
!  * Support for assigning to a field of a string GUC item.  Free the prior
!  * value if it's not referenced anywhere else in the item (including stacked
!  * states).
!  */
! static void
! set_string_field(struct config_string *conf, char **field, char *newval)
! {
!     char    *oldval = *field;
!     GucStack *stack;
!
!     /* Do the assignment */
!     *field = newval;
!
!     /* Exit if any duplicate references, or if old value was NULL anyway */
!     if (oldval == NULL ||
!         oldval == *(conf->variable) ||
!         oldval == conf->reset_val ||
!         oldval == conf->tentative_val)
!         return;
!     for (stack = conf->gen.stack; stack; stack = stack->prev)
!     {
!         if (oldval == stack->tentative_val.stringval ||
!             oldval == stack->value.stringval)
!             return;
!     }
!
!     /* Not used anymore, so free it */
!     free(oldval);
! }
!
! /*
!  * Detect whether strval is referenced anywhere in a GUC string item
!  */
! static bool
! string_field_used(struct config_string *conf, char *strval)
! {
!     GucStack *stack;
!
!     if (strval == *(conf->variable) ||
!         strval == conf->reset_val ||
!         strval == conf->tentative_val)
!         return true;
!     for (stack = conf->gen.stack; stack; stack = stack->prev)
!     {
!         if (strval == stack->tentative_val.stringval ||
!             strval == stack->value.stringval)
!             return true;
!     }
!     return false;
! }
!
!
! struct config_generic **
! get_guc_variables(void)
  {
      return guc_variables;
  }

+
  /*
   * Build the sorted array.    This is split out so that it could be
   * re-executed after startup (eg, we could allow loadable modules to
***************
*** 2001,2014 ****
              return find_option(map_old_guc_names[i+1]);
      }

!     /* Check if the name is qualified, and if so, check if the qualifier
       * maps to a custom variable class.
       */
      dot = strchr(name, GUC_QUALIFIER_SEPARATOR);
      if(dot != NULL && is_custom_class(name, dot - name))
!         /*
!          * Add a placeholder variable for this name
!          */
          return (struct config_generic*)add_placeholder_variable(name);

      /* Unknown name */
--- 2019,2031 ----
              return find_option(map_old_guc_names[i+1]);
      }

!     /*
!      * Check if the name is qualified, and if so, check if the qualifier
       * maps to a custom variable class.
       */
      dot = strchr(name, GUC_QUALIFIER_SEPARATOR);
      if(dot != NULL && is_custom_class(name, dot - name))
!         /* Add a placeholder variable for this name */
          return (struct config_generic*)add_placeholder_variable(name);

      /* Unknown name */
***************
*** 2081,2089 ****

          gconf->status = 0;
          gconf->reset_source = PGC_S_DEFAULT;
-         gconf->session_source = PGC_S_DEFAULT;
          gconf->tentative_source = PGC_S_DEFAULT;
          gconf->source = PGC_S_DEFAULT;

          switch (gconf->vartype)
          {
--- 2098,2106 ----

          gconf->status = 0;
          gconf->reset_source = PGC_S_DEFAULT;
          gconf->tentative_source = PGC_S_DEFAULT;
          gconf->source = PGC_S_DEFAULT;
+         gconf->stack = NULL;

          switch (gconf->vartype)
          {
***************
*** 2097,2103 ****
                              elog(FATAL, "failed to initialize %s to %d",
                                   conf->gen.name, (int) conf->reset_val);
                      *conf->variable = conf->reset_val;
-                     conf->session_val = conf->reset_val;
                      break;
                  }
              case PGC_INT:
--- 2114,2119 ----
***************
*** 2119,2125 ****
                              elog(FATAL, "failed to initialize %s to %d",
                                   conf->gen.name, conf->reset_val);
                      *conf->variable = conf->reset_val;
-                     conf->session_val = conf->reset_val;
                      break;
                  }
              case PGC_REAL:
--- 2135,2140 ----
***************
*** 2135,2141 ****
                              elog(FATAL, "failed to initialize %s to %g",
                                   conf->gen.name, conf->reset_val);
                      *conf->variable = conf->reset_val;
-                     conf->session_val = conf->reset_val;
                      break;
                  }
              case PGC_STRING:
--- 2150,2155 ----
***************
*** 2150,2156 ****
                             conf->assign_hook == assign_log_statement);
                      *conf->variable = NULL;
                      conf->reset_val = NULL;
-                     conf->session_val = NULL;
                      conf->tentative_val = NULL;

                      if (conf->boot_val == NULL)
--- 2164,2169 ----
***************
*** 2190,2196 ****
                          }
                      }
                      *conf->variable = str;
-                     conf->session_val = str;
                      break;
                  }
          }
--- 2203,2208 ----
***************
*** 2254,2259 ****
--- 2266,2274 ----
          if (gconf->source <= PGC_S_OVERRIDE)
              continue;

+         /* Save old value to support transaction abort */
+         push_old_value(gconf);
+
          switch (gconf->vartype)
          {
              case PGC_BOOL:
***************
*** 2336,2343 ****
                          }
                      }

!                     SET_STRING_VARIABLE(conf, str);
!                     SET_STRING_TENTATIVE_VAL(conf, str);
                      conf->gen.source = conf->gen.reset_source;
                      conf->gen.tentative_source = conf->gen.reset_source;
                      conf->gen.status |= GUC_HAVE_TENTATIVE;
--- 2351,2358 ----
                          }
                      }

!                     set_string_field(conf, conf->variable, str);
!                     set_string_field(conf, &conf->tentative_val, str);
                      conf->gen.source = conf->gen.reset_source;
                      conf->gen.tentative_source = conf->gen.reset_source;
                      conf->gen.status |= GUC_HAVE_TENTATIVE;
***************
*** 2353,2363 ****


  /*
!  * Do GUC processing at transaction commit or abort.
   */
  void
! AtEOXact_GUC(bool isCommit)
  {
      int            i;

      /* Quick exit if nothing's changed in this transaction */
--- 2368,2460 ----


  /*
!  * push_old_value
!  *        Push previous state during first assignment to a GUC variable
!  *        within a particular transaction.
!  *
!  * We have to be willing to "back-fill" the state stack if the first
!  * assignment occurs within a subtransaction nested several levels deep.
!  * This ensures that if an intermediate transaction aborts, it will have
!  * the proper value available to restore the setting to.
!  */
! static void
! push_old_value(struct config_generic *gconf)
! {
!     int            my_level = GetCurrentTransactionNestLevel();
!     GucStack   *stack;
!
!     /* If we're not inside a transaction, do nothing */
!     if (my_level == 0)
!         return;
!
!     for (;;)
!     {
!         /* Done if we already pushed it at this nesting depth */
!         if (gconf->stack && gconf->stack->nest_level >= my_level)
!             return;
!
!         /*
!          * We keep all the stack entries in TopTransactionContext so as to
!          * avoid allocation problems when a subtransaction back-fills stack
!          * entries for upper transaction levels.
!          */
!         stack = (GucStack *) MemoryContextAlloc(TopTransactionContext,
!                                                 sizeof(GucStack));
!
!         stack->prev = gconf->stack;
!         stack->nest_level = stack->prev ? stack->prev->nest_level + 1 : 1;
!         stack->status = gconf->status;
!         stack->tentative_source = gconf->tentative_source;
!         stack->source = gconf->source;
!
!         switch (gconf->vartype)
!         {
!             case PGC_BOOL:
!                 stack->tentative_val.boolval =
!                     ((struct config_bool *) gconf)->tentative_val;
!                 stack->value.boolval =
!                     *((struct config_bool *) gconf)->variable;
!                 break;
!
!             case PGC_INT:
!                 stack->tentative_val.intval =
!                     ((struct config_int *) gconf)->tentative_val;
!                 stack->value.intval =
!                     *((struct config_int *) gconf)->variable;
!                 break;
!
!             case PGC_REAL:
!                 stack->tentative_val.realval =
!                     ((struct config_real *) gconf)->tentative_val;
!                 stack->value.realval =
!                     *((struct config_real *) gconf)->variable;
!                 break;
!
!             case PGC_STRING:
!                 stack->tentative_val.stringval =
!                     ((struct config_string *) gconf)->tentative_val;
!                 stack->value.stringval =
!                     *((struct config_string *) gconf)->variable;
!                 break;
!         }
!
!         gconf->stack = stack;
!
!         /* Set state to indicate nothing happened yet within this level */
!         gconf->status = GUC_HAVE_STACK;
!
!         /* Ensure we remember to pop at end of xact */
!         guc_dirty = true;
!     }
! }
!
! /*
!  * Do GUC processing at transaction or subtransaction commit or abort.
   */
  void
! AtEOXact_GUC(bool isCommit, bool isSubXact)
  {
+     int            my_level;
      int            i;

      /* Quick exit if nothing's changed in this transaction */
***************
*** 2371,2385 ****
          guc_string_workspace = NULL;
      }

      for (i = 0; i < num_guc_variables; i++)
      {
          struct config_generic *gconf = guc_variables[i];
          bool        changed;

!         /* Skip if nothing's happened to this var in this transaction */
!         if (gconf->status == 0)
              continue;

          changed = false;

          switch (gconf->vartype)
--- 2468,2523 ----
          guc_string_workspace = NULL;
      }

+     my_level = GetCurrentTransactionNestLevel();
+     Assert(isSubXact ? (my_level > 1) : (my_level == 1));
+
      for (i = 0; i < num_guc_variables; i++)
      {
          struct config_generic *gconf = guc_variables[i];
+         int            my_status = gconf->status;
+         GucStack   *stack = gconf->stack;
+         bool        useTentative;
          bool        changed;

!         /*
!          * Skip if nothing's happened to this var in this transaction
!          */
!         if (my_status == 0)
!         {
!             Assert(stack == NULL);
!             continue;
!         }
!         /* Assert that we stacked old value before changing it */
!         Assert(stack != NULL && (my_status & GUC_HAVE_STACK));
!         /* However, the last change may have been at an outer xact level */
!         if (stack->nest_level < my_level)
              continue;
+         Assert(stack->nest_level == my_level);

+         /*
+          * We will pop the stack entry.  Start by restoring outer xact status
+          * (since we may want to modify it below).  Be careful to use
+          * my_status to reference the inner xact status below this point...
+          */
+         gconf->status = stack->status;
+
+         /*
+          * We have two cases:
+          *
+          * If commit and HAVE_TENTATIVE, set actual value to tentative
+          * (this is to override a SET LOCAL if one occurred later than SET).
+          * We keep the tentative value and propagate HAVE_TENTATIVE to
+          * the parent status, allowing the SET's effect to percolate up.
+          * (But if we're exiting the outermost transaction, we'll drop the
+          * HAVE_TENTATIVE bit below.)
+          *
+          * Otherwise, we have a transaction that aborted or executed only
+          * SET LOCAL (or no SET at all).  In either case it should have no
+          * further effect, so restore both tentative and actual values from
+          * the stack entry.
+          */
+
+         useTentative = isCommit && (my_status & GUC_HAVE_TENTATIVE) != 0;
          changed = false;

          switch (gconf->vartype)
***************
*** 2387,2512 ****
              case PGC_BOOL:
                  {
                      struct config_bool *conf = (struct config_bool *) gconf;

!                     if (isCommit && (conf->gen.status & GUC_HAVE_TENTATIVE))
                      {
!                         conf->session_val = conf->tentative_val;
!                         conf->gen.session_source = conf->gen.tentative_source;
                      }

!                     if (*conf->variable != conf->session_val)
                      {
                          if (conf->assign_hook)
!                             if (!(*conf->assign_hook) (conf->session_val,
                                                         true, PGC_S_OVERRIDE))
                                  elog(LOG, "failed to commit %s",
                                       conf->gen.name);
!                         *conf->variable = conf->session_val;
                          changed = true;
                      }
!                     conf->gen.source = conf->gen.session_source;
!                     conf->gen.status = 0;
                      break;
                  }
              case PGC_INT:
                  {
                      struct config_int *conf = (struct config_int *) gconf;

!                     if (isCommit && (conf->gen.status & GUC_HAVE_TENTATIVE))
                      {
!                         conf->session_val = conf->tentative_val;
!                         conf->gen.session_source = conf->gen.tentative_source;
                      }

!                     if (*conf->variable != conf->session_val)
                      {
                          if (conf->assign_hook)
!                             if (!(*conf->assign_hook) (conf->session_val,
                                                         true, PGC_S_OVERRIDE))
                                  elog(LOG, "failed to commit %s",
                                       conf->gen.name);
!                         *conf->variable = conf->session_val;
                          changed = true;
                      }
!                     conf->gen.source = conf->gen.session_source;
!                     conf->gen.status = 0;
                      break;
                  }
              case PGC_REAL:
                  {
                      struct config_real *conf = (struct config_real *) gconf;

!                     if (isCommit && (conf->gen.status & GUC_HAVE_TENTATIVE))
                      {
!                         conf->session_val = conf->tentative_val;
!                         conf->gen.session_source = conf->gen.tentative_source;
                      }

!                     if (*conf->variable != conf->session_val)
                      {
                          if (conf->assign_hook)
!                             if (!(*conf->assign_hook) (conf->session_val,
                                                         true, PGC_S_OVERRIDE))
                                  elog(LOG, "failed to commit %s",
                                       conf->gen.name);
!                         *conf->variable = conf->session_val;
                          changed = true;
                      }
!                     conf->gen.source = conf->gen.session_source;
!                     conf->gen.status = 0;
                      break;
                  }
              case PGC_STRING:
                  {
                      struct config_string *conf = (struct config_string *) gconf;

!                     if (isCommit && (conf->gen.status & GUC_HAVE_TENTATIVE))
                      {
!                         SET_STRING_SESSION_VAL(conf, conf->tentative_val);
!                         conf->gen.session_source = conf->gen.tentative_source;
!                         conf->tentative_val = NULL;        /* transfer ownership */
                      }
                      else
-                         SET_STRING_TENTATIVE_VAL(conf, NULL);
-
-                     if (*conf->variable != conf->session_val)
                      {
!                         char       *str = conf->session_val;

                          if (conf->assign_hook)
                          {
                              const char *newstr;

!                             newstr = (*conf->assign_hook) (str, true,
                                                             PGC_S_OVERRIDE);
                              if (newstr == NULL)
                                  elog(LOG, "failed to commit %s",
                                       conf->gen.name);
!                             else if (newstr != str)
                              {
                                  /*
                                   * See notes in set_config_option about
                                   * casting
                                   */
!                                 str = (char *) newstr;
!                                 SET_STRING_SESSION_VAL(conf, str);
                              }
                          }

!                         SET_STRING_VARIABLE(conf, str);
                          changed = true;
                      }
!                     conf->gen.source = conf->gen.session_source;
!                     conf->gen.status = 0;
                      break;
                  }
          }

          if (changed && (gconf->flags & GUC_REPORT))
              ReportGUCOption(gconf);
      }

!     guc_dirty = false;
  }


--- 2525,2714 ----
              case PGC_BOOL:
                  {
                      struct config_bool *conf = (struct config_bool *) gconf;
+                     bool        newval;
+                     GucSource    newsource;

!                     if (useTentative)
                      {
!                         newval = conf->tentative_val;
!                         newsource = conf->gen.tentative_source;
!                         conf->gen.status |= GUC_HAVE_TENTATIVE;
!                     }
!                     else
!                     {
!                         newval = stack->value.boolval;
!                         newsource = stack->source;
!                         conf->tentative_val = stack->tentative_val.boolval;
!                         conf->gen.tentative_source = stack->tentative_source;
                      }

!                     if (*conf->variable != newval)
                      {
                          if (conf->assign_hook)
!                             if (!(*conf->assign_hook) (newval,
                                                         true, PGC_S_OVERRIDE))
                                  elog(LOG, "failed to commit %s",
                                       conf->gen.name);
!                         *conf->variable = newval;
                          changed = true;
                      }
!                     conf->gen.source = newsource;
                      break;
                  }
              case PGC_INT:
                  {
                      struct config_int *conf = (struct config_int *) gconf;
+                     int            newval;
+                     GucSource    newsource;

!                     if (useTentative)
                      {
!                         newval = conf->tentative_val;
!                         newsource = conf->gen.tentative_source;
!                         conf->gen.status |= GUC_HAVE_TENTATIVE;
!                     }
!                     else
!                     {
!                         newval = stack->value.intval;
!                         newsource = stack->source;
!                         conf->tentative_val = stack->tentative_val.intval;
!                         conf->gen.tentative_source = stack->tentative_source;
                      }

!                     if (*conf->variable != newval)
                      {
                          if (conf->assign_hook)
!                             if (!(*conf->assign_hook) (newval,
                                                         true, PGC_S_OVERRIDE))
                                  elog(LOG, "failed to commit %s",
                                       conf->gen.name);
!                         *conf->variable = newval;
                          changed = true;
                      }
!                     conf->gen.source = newsource;
                      break;
                  }
              case PGC_REAL:
                  {
                      struct config_real *conf = (struct config_real *) gconf;
+                     double        newval;
+                     GucSource    newsource;

!                     if (useTentative)
                      {
!                         newval = conf->tentative_val;
!                         newsource = conf->gen.tentative_source;
!                         conf->gen.status |= GUC_HAVE_TENTATIVE;
!                     }
!                     else
!                     {
!                         newval = stack->value.realval;
!                         newsource = stack->source;
!                         conf->tentative_val = stack->tentative_val.realval;
!                         conf->gen.tentative_source = stack->tentative_source;
                      }

!                     if (*conf->variable != newval)
                      {
                          if (conf->assign_hook)
!                             if (!(*conf->assign_hook) (newval,
                                                         true, PGC_S_OVERRIDE))
                                  elog(LOG, "failed to commit %s",
                                       conf->gen.name);
!                         *conf->variable = newval;
                          changed = true;
                      }
!                     conf->gen.source = newsource;
                      break;
                  }
              case PGC_STRING:
                  {
                      struct config_string *conf = (struct config_string *) gconf;
+                     char       *newval;
+                     GucSource    newsource;

!                     if (useTentative)
                      {
!                         newval = conf->tentative_val;
!                         newsource = conf->gen.tentative_source;
!                         conf->gen.status |= GUC_HAVE_TENTATIVE;
                      }
                      else
                      {
!                         newval = stack->value.stringval;
!                         newsource = stack->source;
!                         set_string_field(conf, &conf->tentative_val,
!                                          stack->tentative_val.stringval);
!                         conf->gen.tentative_source = stack->tentative_source;
!                     }

+                     if (*conf->variable != newval)
+                     {
                          if (conf->assign_hook)
                          {
                              const char *newstr;

!                             newstr = (*conf->assign_hook) (newval, true,
                                                             PGC_S_OVERRIDE);
                              if (newstr == NULL)
                                  elog(LOG, "failed to commit %s",
                                       conf->gen.name);
!                             else if (newstr != newval)
                              {
                                  /*
+                                  * If newval should now be freed, it'll be
+                                  * taken care of below.
+                                  *
                                   * See notes in set_config_option about
                                   * casting
                                   */
!                                 newval = (char *) newstr;
                              }
                          }

!                         set_string_field(conf, conf->variable, newval);
                          changed = true;
                      }
!                     conf->gen.source = newsource;
!                     /* Release stacked values if not used anymore */
!                     set_string_field(conf, &stack->value.stringval,
!                                      NULL);
!                     set_string_field(conf, &stack->tentative_val.stringval,
!                                      NULL);
!                     /* Don't store tentative value separately after commit */
!                     if (!isSubXact)
!                         set_string_field(conf, &conf->tentative_val, NULL);
                      break;
                  }
          }

+         /* Finish popping the state stack */
+         gconf->stack = stack->prev;
+         pfree(stack);
+
+         /*
+          * If we're now out of all xact levels, forget TENTATIVE status bit;
+          * there's nothing tentative about the value anymore.
+          */
+         if (!isSubXact)
+         {
+             Assert(gconf->stack == NULL);
+             gconf->status = 0;
+         }
+
+         /* Report new value if we changed it */
          if (changed && (gconf->flags & GUC_REPORT))
              ReportGUCOption(gconf);
      }

!     /*
!      * If we're now out of all xact levels, we can clear guc_dirty.
!      * (Note: we cannot reset guc_dirty when exiting a subtransaction,
!      * because we know that all outer transaction levels will have stacked
!      * values to deal with.)
!      */
!     if (!isSubXact)
!         guc_dirty = false;
  }


***************
*** 2810,2816 ****
      }

      /*
!      * Should we set reset/session values?    (If so, the behavior is not
       * transactional.)
       */
      makeDefault = changeVal && (source <= PGC_S_OVERRIDE) && (value != NULL);
--- 3012,3018 ----
      }

      /*
!      * Should we set reset/stacked values?    (If so, the behavior is not
       * transactional.)
       */
      makeDefault = changeVal && (source <= PGC_S_OVERRIDE) && (value != NULL);
***************
*** 2820,2826 ****
       * However, if changeVal is false then plow ahead anyway since we are
       * trying to find out if the value is potentially good, not actually
       * use it. Also keep going if makeDefault is true, since we may want
!      * to set the reset/session values even if we can't set the variable
       * itself.
       */
      if (record->source > source)
--- 3022,3028 ----
       * However, if changeVal is false then plow ahead anyway since we are
       * trying to find out if the value is potentially good, not actually
       * use it. Also keep going if makeDefault is true, since we may want
!      * to set the reset/stacked values even if we can't set the variable
       * itself.
       */
      if (record->source > source)
***************
*** 2901,2906 ****
--- 3103,3111 ----

                  if (changeVal || makeDefault)
                  {
+                     /* Save old value to support transaction abort */
+                     if (!makeDefault)
+                         push_old_value(&conf->gen);
                      if (changeVal)
                      {
                          *conf->variable = newval;
***************
*** 2908,2922 ****
                      }
                      if (makeDefault)
                      {
                          if (conf->gen.reset_source <= source)
                          {
                              conf->reset_val = newval;
                              conf->gen.reset_source = source;
                          }
!                         if (conf->gen.session_source <= source)
                          {
!                             conf->session_val = newval;
!                             conf->gen.session_source = source;
                          }
                      }
                      else if (isLocal)
--- 3113,3132 ----
                      }
                      if (makeDefault)
                      {
+                         GucStack *stack;
+
                          if (conf->gen.reset_source <= source)
                          {
                              conf->reset_val = newval;
                              conf->gen.reset_source = source;
                          }
!                         for (stack = conf->gen.stack; stack; stack = stack->prev)
                          {
!                             if (stack->source <= source)
!                             {
!                                 stack->value.boolval = newval;
!                                 stack->source = source;
!                             }
                          }
                      }
                      else if (isLocal)
***************
*** 3006,3011 ****
--- 3216,3224 ----

                  if (changeVal || makeDefault)
                  {
+                     /* Save old value to support transaction abort */
+                     if (!makeDefault)
+                         push_old_value(&conf->gen);
                      if (changeVal)
                      {
                          *conf->variable = newval;
***************
*** 3013,3027 ****
                      }
                      if (makeDefault)
                      {
                          if (conf->gen.reset_source <= source)
                          {
                              conf->reset_val = newval;
                              conf->gen.reset_source = source;
                          }
!                         if (conf->gen.session_source <= source)
                          {
!                             conf->session_val = newval;
!                             conf->gen.session_source = source;
                          }
                      }
                      else if (isLocal)
--- 3226,3245 ----
                      }
                      if (makeDefault)
                      {
+                         GucStack *stack;
+
                          if (conf->gen.reset_source <= source)
                          {
                              conf->reset_val = newval;
                              conf->gen.reset_source = source;
                          }
!                         for (stack = conf->gen.stack; stack; stack = stack->prev)
                          {
!                             if (stack->source <= source)
!                             {
!                                 stack->value.intval = newval;
!                                 stack->source = source;
!                             }
                          }
                      }
                      else if (isLocal)
***************
*** 3101,3106 ****
--- 3319,3327 ----

                  if (changeVal || makeDefault)
                  {
+                     /* Save old value to support transaction abort */
+                     if (!makeDefault)
+                         push_old_value(&conf->gen);
                      if (changeVal)
                      {
                          *conf->variable = newval;
***************
*** 3108,3122 ****
                      }
                      if (makeDefault)
                      {
                          if (conf->gen.reset_source <= source)
                          {
                              conf->reset_val = newval;
                              conf->gen.reset_source = source;
                          }
!                         if (conf->gen.session_source <= source)
                          {
!                             conf->session_val = newval;
!                             conf->gen.session_source = source;
                          }
                      }
                      else if (isLocal)
--- 3329,3348 ----
                      }
                      if (makeDefault)
                      {
+                         GucStack *stack;
+
                          if (conf->gen.reset_source <= source)
                          {
                              conf->reset_val = newval;
                              conf->gen.reset_source = source;
                          }
!                         for (stack = conf->gen.stack; stack; stack = stack->prev)
                          {
!                             if (stack->source <= source)
!                             {
!                                 stack->value.realval = newval;
!                                 stack->source = source;
!                             }
                          }
                      }
                      else if (isLocal)
***************
*** 3261,3287 ****

                  if (changeVal || makeDefault)
                  {
                      if (changeVal)
                      {
!                         SET_STRING_VARIABLE(conf, newval);
                          conf->gen.source = source;
                      }
                      if (makeDefault)
                      {
                          if (conf->gen.reset_source <= source)
                          {
!                             SET_STRING_RESET_VAL(conf, newval);
                              conf->gen.reset_source = source;
                          }
!                         if (conf->gen.session_source <= source)
                          {
!                             SET_STRING_SESSION_VAL(conf, newval);
!                             conf->gen.session_source = source;
                          }
                          /* Perhaps we didn't install newval anywhere */
!                         if (newval != *conf->variable &&
!                             newval != conf->session_val &&
!                             newval != conf->reset_val)
                              free(newval);
                      }
                      else if (isLocal)
--- 3487,3520 ----

                  if (changeVal || makeDefault)
                  {
+                     /* Save old value to support transaction abort */
+                     if (!makeDefault)
+                         push_old_value(&conf->gen);
                      if (changeVal)
                      {
!                         set_string_field(conf, conf->variable, newval);
                          conf->gen.source = source;
                      }
                      if (makeDefault)
                      {
+                         GucStack *stack;
+
                          if (conf->gen.reset_source <= source)
                          {
!                             set_string_field(conf, &conf->reset_val, newval);
                              conf->gen.reset_source = source;
                          }
!                         for (stack = conf->gen.stack; stack; stack = stack->prev)
                          {
!                             if (stack->source <= source)
!                             {
!                                 set_string_field(conf, &stack->value.stringval,
!                                                  newval);
!                                 stack->source = source;
!                             }
                          }
                          /* Perhaps we didn't install newval anywhere */
!                         if (!string_field_used(conf, newval))
                              free(newval);
                      }
                      else if (isLocal)
***************
*** 3291,3297 ****
                      }
                      else
                      {
!                         SET_STRING_TENTATIVE_VAL(conf, newval);
                          conf->gen.tentative_source = source;
                          conf->gen.status |= GUC_HAVE_TENTATIVE;
                          guc_dirty = true;
--- 3524,3530 ----
                      }
                      else
                      {
!                         set_string_field(conf, &conf->tentative_val, newval);
                          conf->gen.tentative_source = source;
                          conf->gen.status |= GUC_HAVE_TENTATIVE;
                          guc_dirty = true;
***************
*** 3608,3651 ****
      /* This better be a placeholder
       */
      if(((*res)->flags & GUC_CUSTOM_PLACEHOLDER) == 0)
-     {
          ereport(ERROR,
                  (errcode(ERRCODE_INTERNAL_ERROR),
                   errmsg("attempt to redefine parameter \"%s\"", name)));
!     }
!     pHolder = (struct config_string*)*res;

!     /* We have the same name, no sorting is necessary.
!      */
      *res = variable;

      value = *pHolder->variable;

!     /* Assign the variable stored in the placeholder to the real
!      * variable.
       */
      set_config_option(name, value,
                    pHolder->gen.context, pHolder->gen.source,
                    false, true);

!     /* Free up stuff occupied by the placeholder variable
       */
!     if(value != NULL)
!         free((void*)value);
!
!     if(pHolder->reset_val != NULL && pHolder->reset_val != value)
!         free(pHolder->reset_val);
!
!     if(pHolder->session_val != NULL
!     && pHolder->session_val != value
!     && pHolder->session_val != pHolder->reset_val)
!         free(pHolder->session_val);
!
!     if(pHolder->tentative_val != NULL
!     && pHolder->tentative_val != value
!     && pHolder->tentative_val != pHolder->reset_val
!     && pHolder->tentative_val != pHolder->session_val)
!         free(pHolder->tentative_val);

      free(pHolder);
  }
--- 3841,3876 ----
      /* This better be a placeholder
       */
      if(((*res)->flags & GUC_CUSTOM_PLACEHOLDER) == 0)
          ereport(ERROR,
                  (errcode(ERRCODE_INTERNAL_ERROR),
                   errmsg("attempt to redefine parameter \"%s\"", name)));
!
!     Assert((*res)->vartype == PGC_STRING);
!     pHolder = (struct config_string*) *res;

!     /* We have the same name, no sorting is necessary */
      *res = variable;

      value = *pHolder->variable;

!     /*
!      * Assign the string value stored in the placeholder to the real variable.
!      *
!      * XXX this is not really good enough --- it should be a nontransactional
!      * assignment, since we don't want it to roll back if the current xact
!      * fails later.
       */
      set_config_option(name, value,
                    pHolder->gen.context, pHolder->gen.source,
                    false, true);

!     /*
!      * Free up as much as we conveniently can of the placeholder structure
!      * (this neglects any stack items...)
       */
!     set_string_field(pHolder, pHolder->variable, NULL);
!     set_string_field(pHolder, &pHolder->reset_val, NULL);
!     set_string_field(pHolder, &pHolder->tentative_val, NULL);

      free(pHolder);
  }
***************
*** 3754,3760 ****
      define_custom_variable(&var->gen);
  }

! extern void EmittWarningsOnPlaceholders(const char* className)
  {
      struct config_generic** vars = guc_variables;
      struct config_generic** last = vars + num_guc_variables;
--- 3979,3985 ----
      define_custom_variable(&var->gen);
  }

! extern void EmitWarningsOnPlaceholders(const char* className)
  {
      struct config_generic** vars = guc_variables;
      struct config_generic** last = vars + num_guc_variables;
***************
*** 5133,5137 ****
--- 5358,5371 ----
      return true;
  }

+ static bool
+ assign_transaction_read_only(bool newval, bool doit, GucSource source)
+ {
+     if (doit && source >= PGC_S_INTERACTIVE && IsSubTransaction())
+         ereport(ERROR,
+                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                  errmsg("cannot set transaction read only mode inside a subtransaction")));
+     return true;
+ }

  #include "guc-file.c"
*** src/include/utils/guc.h.orig    Fri May 28 01:13:32 2004
--- src/include/utils/guc.h    Tue Jun 29 13:56:33 2004
***************
*** 175,188 ****
      GucStringAssignHook assign_hook,
      GucShowHook show_hook);

! extern void EmittWarningsOnPlaceholders(const char* className);

  extern const char *GetConfigOption(const char *name);
  extern const char *GetConfigOptionResetString(const char *name);
  extern void ProcessConfigFile(GucContext context);
  extern void InitializeGUCOptions(void);
  extern void ResetAllOptions(void);
! extern void AtEOXact_GUC(bool isCommit);
  extern void BeginReportingGUCOptions(void);
  extern void ParseLongOption(const char *string, char **name, char **value);
  extern bool set_config_option(const char *name, const char *value,
--- 175,188 ----
      GucStringAssignHook assign_hook,
      GucShowHook show_hook);

! extern void EmitWarningsOnPlaceholders(const char* className);

  extern const char *GetConfigOption(const char *name);
  extern const char *GetConfigOptionResetString(const char *name);
  extern void ProcessConfigFile(GucContext context);
  extern void InitializeGUCOptions(void);
  extern void ResetAllOptions(void);
! extern void AtEOXact_GUC(bool isCommit, bool isSubXact);
  extern void BeginReportingGUCOptions(void);
  extern void ParseLongOption(const char *string, char **name, char **value);
  extern bool set_config_option(const char *name, const char *value,
*** src/include/utils/guc_tables.h.orig    Wed May 26 12:50:34 2004
--- src/include/utils/guc_tables.h    Tue Jun 29 13:56:33 2004
***************
*** 11,18 ****
   *
   *-------------------------------------------------------------------------
   */
! #ifndef GUC_TABLES
! #define GUC_TABLES 1

  /*
   * Groupings to help organize all the run-time options for display
--- 11,37 ----
   *
   *-------------------------------------------------------------------------
   */
! #ifndef GUC_TABLES_H
! #define GUC_TABLES_H 1
!
! /*
!  * GUC supports these types of variables:
!  */
! enum config_type
! {
!     PGC_BOOL,
!     PGC_INT,
!     PGC_REAL,
!     PGC_STRING
! };
!
! union config_var_value
! {
!     bool        boolval;
!     int            intval;
!     double        realval;
!     char       *stringval;
! };

  /*
   * Groupings to help organize all the run-time options for display
***************
*** 56,70 ****
  };

  /*
!  * GUC supports these types of variables:
   */
! enum config_type
  {
!     PGC_BOOL,
!     PGC_INT,
!     PGC_REAL,
!     PGC_STRING
! };

  /*
   * Generic fields applicable to all types of variables
--- 75,93 ----
  };

  /*
!  * Stack entry for saving the state of a variable prior to the current
!  * transaction
   */
! typedef struct guc_stack
  {
!     struct guc_stack *prev;        /* previous stack item, if any */
!     int            nest_level;        /* nesting depth of cur transaction */
!     int            status;            /* previous status bits, see below */
!     GucSource    tentative_source;        /* source of the tentative_value */
!     GucSource    source;            /* source of the actual value */
!     union config_var_value tentative_val;    /* previous tentative val */
!     union config_var_value value;            /* previous actual value */
! } GucStack;

  /*
   * Generic fields applicable to all types of variables
***************
*** 86,94 ****
      enum config_type vartype;    /* type of variable (set only at startup) */
      int            status;            /* status bits, see below */
      GucSource    reset_source;    /* source of the reset_value */
-     GucSource    session_source; /* source of the session_value */
      GucSource    tentative_source;        /* source of the tentative_value */
      GucSource    source;            /* source of the current actual value */
  };

  /* bit values in flags field */
--- 109,117 ----
      enum config_type vartype;    /* type of variable (set only at startup) */
      int            status;            /* status bits, see below */
      GucSource    reset_source;    /* source of the reset_value */
      GucSource    tentative_source;        /* source of the tentative_value */
      GucSource    source;            /* source of the current actual value */
+     GucStack   *stack;            /* stacked outside-of-transaction states */
  };

  /* bit values in flags field */
***************
*** 104,109 ****
--- 127,133 ----
  /* bit values in status field */
  #define GUC_HAVE_TENTATIVE    0x0001        /* tentative value is defined */
  #define GUC_HAVE_LOCAL        0x0002        /* a SET LOCAL has been executed */
+ #define GUC_HAVE_STACK        0x0004        /* we have stacked prior value(s) */


  /* GUC records for specific variable types */
***************
*** 118,124 ****
      GucBoolAssignHook assign_hook;
      GucShowHook show_hook;
      /* variable fields, initialized at runtime: */
-     bool        session_val;
      bool        tentative_val;
  };

--- 142,147 ----
***************
*** 134,140 ****
      GucIntAssignHook assign_hook;
      GucShowHook show_hook;
      /* variable fields, initialized at runtime: */
-     int            session_val;
      int            tentative_val;
  };

--- 157,162 ----
***************
*** 150,156 ****
      GucRealAssignHook assign_hook;
      GucShowHook show_hook;
      /* variable fields, initialized at runtime: */
-     double        session_val;
      double        tentative_val;
  };

--- 172,177 ----
***************
*** 165,171 ****
      GucShowHook show_hook;
      /* variable fields, initialized at runtime: */
      char       *reset_val;
-     char       *session_val;
      char       *tentative_val;
  };

--- 186,191 ----
***************
*** 180,183 ****

  extern void build_guc_variables(void);

! #endif
--- 200,203 ----

  extern void build_guc_variables(void);

! #endif /* GUC_TABLES_H */

Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
On Sun, Jun 20, 2004 at 08:49:22PM -0400, Tom Lane wrote:

> There's a good deal more than that missing :-(.  Here are the modules or
> actions that are called in CommitTransaction and/or AbortTransaction
> that have not yet been touched by the patch:
>
> localbuf.c (refcounts need fixed same as bufmgr)

Here is a patch against the original versions of these files; cleaned up
bufmgr.c somewhat.  Adds the same logic to local buffers (moving the
BufferRefCount struct declaration to buf_internals.h so it's shared by
both bufmgr.c and localbuf.c).  Needs xact.c and xact.h patched as in
the second patch.

As with the bufmgr.c original patch, I don't really know how to test
that this actually works.  I fooled around with printing what it was
doing during a subtrans commit/abort, and it seems OK, but that's about
it.  In what situations can a transaction roll back with a nonzero
reference count in a local buffer?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)

Attachment

Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> As with the bufmgr.c original patch, I don't really know how to test
> that this actually works.  I fooled around with printing what it was
> doing during a subtrans commit/abort, and it seems OK, but that's about
> it.  In what situations can a transaction roll back with a nonzero
> reference count in a local buffer?

You need an active cursor, eg

    begin;
    declare c cursor for select * from tenk1;
    fetch 1 in c;
    ... now you've got an open buffer refcount to some page of tenk1

I forgot to mention to you that that code didn't work at all, btw.
I have fixed some of the problems in my local version but there's still
a fairly large issue, which is what exactly we think the semantics of a
cursor declared in a subtransaction ought to be.  With bufmgr set up to
consider open reference counts as a bug, we cannot hold such a cursor
open past subtrans commit.

One possible approach is to consider subxact commit the same as main
xact commit as far as cursors are concerned: materialize anything
declared WITH HOLD, close anything declared without.

The other theory we could adopt is that cursors stay open till main xact
commit; this would imply not releasing buffer refcounts at subxact
commit, plus any other resources needed by the cursor.  We're already
holding locks that way and it probably wouldn't be a big change to make
bufmgr work the same.  I'm not sure that there are any other resources
involved, other than the Portal memory which we already handle properly.

The first approach is a lower-risk path; I'm not sure if the second one
might have some hidden gotchas.  It seems like the second one would be
more flexible though.  Any opinions which to pursue?

Oh, there's another point: what happens if an outer xact level declares
a cursor, which is then FETCHed from by a subtransaction?  At minimum we
have the problem that this could change the set of buffer pins held,
which breaks the present bufmgr solution entirely.  It gets even more
interesting if you are of the opinion that subtransaction failure should
cause the effects of the FETCH to be undone --- we have no way to do
that at all, because there's no mechanism for saving/restoring the state
of an entire execution plan tree.  We might have to prohibit
subtransactions from touching outer-level cursors, at least for 7.5.
This would in turn make it a bit questionable whether there's any point
in letting cursors propagate up out of subtransactions...

            regards, tom lane

Re: nested xacts and phantom Xids

From
Tom Lane
Date:
BTW, here is what I'm working with just at the moment.

            regards, tom lane


Attachment

Re: nested xacts and phantom Xids

From
Alvaro Herrera
Date:
On Tue, Jun 29, 2004 at 06:59:20PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > As with the bufmgr.c original patch, I don't really know how to test
> > that this actually works.  [...]
>
> I forgot to mention to you that that code didn't work at all, btw.

Bad news, I guess.

> The other theory we could adopt is that cursors stay open till main xact
> commit; this would imply not releasing buffer refcounts at subxact
> commit, plus any other resources needed by the cursor.  We're already
> holding locks that way and it probably wouldn't be a big change to make
> bufmgr work the same.  I'm not sure that there are any other resources
> involved, other than the Portal memory which we already handle properly.

Well, AFAIR originally I had thought that refcounts should be held at
subtrans commit; you suggested that there was no reason for a subtrans
to keep a buffer refcount and that was it.  I think the open cursor is a
good reason why the count should be kept; it appears less useful if you
can't use the cursor anywhere out of the level that created it.

> Oh, there's another point: what happens if an outer xact level declares
> a cursor, which is then FETCHed from by a subtransaction?  At minimum we
> have the problem that this could change the set of buffer pins held,
> which breaks the present bufmgr solution entirely.  It gets even more
> interesting if you are of the opinion that subtransaction failure should
> cause the effects of the FETCH to be undone --- we have no way to do
> that at all, because there's no mechanism for saving/restoring the state
> of an entire execution plan tree.

Hmm ... yes, this could be very ugly indeed, but I haven't even looked
at the executor code so I can't comment.  Are executor nodes copyable?

Oh, and I've been playing with large objects and I've encountered bugs
elsewhere.  I'll look at it with the new patch you just posted.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Vivir y dejar de vivir son soluciones imaginarias.
La existencia está en otra parte" (Andre Breton)


Re: nested xacts and phantom Xids

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Hmm ... yes, this could be very ugly indeed, but I haven't even looked
> at the executor code so I can't comment.  Are executor nodes copyable?

Nope, and even if we had support for that the executor tree per se
is just the tip of the iceberg.  There's also indexscan status, SRF
function internal state, yadda yadda.  I think the odds of doing
something with all that stuff for 7.5 are exactly zero ... we'd better
define a stopgap behavior.

> Oh, and I've been playing with large objects and I've encountered bugs
> elsewhere.  I'll look at it with the new patch you just posted.

Wouldn't surprise me, we've not looked at that yet either.

I do feel that we have enough things working that we should commit to
nested transactions for 7.5.  There will be some things that we have to
restrict, such as cursors and perhaps large objects.  But it's surely
better than no subtransactions at all.

            regards, tom lane

Re: nested xacts and phantom Xids

From
Bruce Momjian
Date:
Added to TODO, just so we don't forget later:

    * Use a phantom command counter for nested subtransactions to reduce
          tuple overhead


---------------------------------------------------------------------------

Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > Hmm ... yes, this could be very ugly indeed, but I haven't even looked
> > at the executor code so I can't comment.  Are executor nodes copyable?
>
> Nope, and even if we had support for that the executor tree per se
> is just the tip of the iceberg.  There's also indexscan status, SRF
> function internal state, yadda yadda.  I think the odds of doing
> something with all that stuff for 7.5 are exactly zero ... we'd better
> define a stopgap behavior.
>
> > Oh, and I've been playing with large objects and I've encountered bugs
> > elsewhere.  I'll look at it with the new patch you just posted.
>
> Wouldn't surprise me, we've not looked at that yet either.
>
> I do feel that we have enough things working that we should commit to
> nested transactions for 7.5.  There will be some things that we have to
> restrict, such as cursors and perhaps large objects.  But it's surely
> better than no subtransactions at all.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073