Thread: nested xacts and phantom Xids
Ok people, Here I present the nested transactions patch and the phantom Xids patch that goes with it. Please review, test and comment. Missing: rollback of SET CONSTRAINTS and GUC vars. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Y una voz del caos me habló y me dijo "Sonríe y sé feliz, podría ser peor". Y sonreí. Y fui feliz. Y fue peor.
Attachment
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > Here I present the nested transactions patch and the phantom Xids patch > that goes with it. I looked at the phantom XIDs stuff a bit. I still have little confidence that the concept is correct :-( but here are some comments on the code level. > + * They are marked immediately in pg_subtrans as direct childs of the topmost > + * Xid (no matter how deep we are in the transaction tree), Why is that a good idea --- won't it cause problems when you commit/abort an intermediate level of subtransaction? > + * All this happens when HeapTupleHeaderSetXmax is called and the Xmin of the > + * tuple is one of the Xids of the current transaction. > + * > + * Note that HeapTupleHeaderGetXmax/GetXmin return the masqueraded Xmin and > + * Xmax, not the real one in the tuple header. I think that putting this sort of logic into Set/Get Xmin/Xmax is a horrid idea. They are supposed to be transparent fetch/store macros, not arbitrarily-complex filter functions. They're also supposed to be reasonably fast :-( > + * ... We know to do this when SetXmax is called > + * on a tuple that has the HEAP_MARKED_FOR_UPDATE bit set. Uglier and uglier. > *************** > *** 267,274 **** > return true; > > /* deleting subtransaction aborted */ > ! if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple))) > ! return true; > > Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple))); > > --- 268,276 ---- > return true; > > /* deleting subtransaction aborted */ > ! if (tuple->t_infomask & HEAP_XMIN_IS_PHANTOM) > ! if (TransactionPhantomIsCommittedPhantom(HeapTupleHeaderGetPhantomId(tuple))) > ! return true; > > Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple))); > Er, what happened to checking for the "deleting subtransaction aborted" case? On the whole, I think this was an interesting exercise but also an utter failure. We'd be far better off to eat the extra 4 bytes per tuple header and put back the separate Xmax field. The costs in both runtime and complexity/reliability of this patch are simply not justified by a 4-byte savings. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > Here I present the nested transactions patch and the phantom Xids patch > > that goes with it. > > I looked at the phantom XIDs stuff a bit. I still have little confidence > that the concept is correct :-( but here are some comments on the code > level. > > > + * They are marked immediately in pg_subtrans as direct childs of the topmost > > + * Xid (no matter how deep we are in the transaction tree), > > Why is that a good idea --- won't it cause problems when you > commit/abort an intermediate level of subtransaction? I don't think so. The phantom xid is used only by the outside transaction waiting for that tuple to be committe or aborted. The outside tranaction will sleep on the topmost xid completing, then check the phantom xid status for commit/abort. Within the transaction, I think he uses command counter to know the creation and destruction sub-xid. I think right now phantom xids are used only for making parts of a subtransaction committed or aborted and only in cases where the tuple is created and destroyed by parts of the same transaction tree. I don't feel too bad about the runtime cost if only subtransactions are paying that cost. I know we are really stretching the system here but I would like to try a little more rather than give up and taking a space hit for all tuples. --------------------------------------------------------------------------- > > + * All this happens when HeapTupleHeaderSetXmax is called and the Xmin of the > > + * tuple is one of the Xids of the current transaction. > > + * > > + * Note that HeapTupleHeaderGetXmax/GetXmin return the masqueraded Xmin and > > + * Xmax, not the real one in the tuple header. > > I think that putting this sort of logic into Set/Get Xmin/Xmax is a > horrid idea. They are supposed to be transparent fetch/store macros, > not arbitrarily-complex filter functions. They're also supposed to > be reasonably fast :-( > > > + * ... We know to do this when SetXmax is called > > + * on a tuple that has the HEAP_MARKED_FOR_UPDATE bit set. > > Uglier and uglier. > > > *************** > > *** 267,274 **** > > return true; > > > > /* deleting subtransaction aborted */ > > ! if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple))) > > ! return true; > > > > Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple))); > > > > --- 268,276 ---- > > return true; > > > > /* deleting subtransaction aborted */ > > ! if (tuple->t_infomask & HEAP_XMIN_IS_PHANTOM) > > ! if (TransactionPhantomIsCommittedPhantom(HeapTupleHeaderGetPhantomId(tuple))) > > ! return true; > > > > Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple))); > > > > Er, what happened to checking for the "deleting subtransaction aborted" > case? > > On the whole, I think this was an interesting exercise but also an utter > failure. We'd be far better off to eat the extra 4 bytes per tuple > header and put back the separate Xmax field. The costs in both runtime > and complexity/reliability of this patch are simply not justified by > a 4-byte savings. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian wrote: > Tom Lane wrote: > > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > > Here I present the nested transactions patch and the phantom Xids patch > > > that goes with it. > > > > I looked at the phantom XIDs stuff a bit. I still have little confidence > > that the concept is correct :-( but here are some comments on the code > > level. > > > > > + * They are marked immediately in pg_subtrans as direct childs of the topmost > > > + * Xid (no matter how deep we are in the transaction tree), > > > > Why is that a good idea --- won't it cause problems when you > > commit/abort an intermediate level of subtransaction? > > I don't think so. The phantom xid is used only by the outside > transaction waiting for that tuple to be committe or aborted. The > outside tranaction will sleep on the topmost xid completing, then check > the phantom xid status for commit/abort. Within the transaction, I think > he uses command counter to know the creation and destruction sub-xid. > > I think right now phantom xids are used only for making parts of a > subtransaction committed or aborted and only in cases where the tuple is > created and destroyed by parts of the same transaction tree. > > I don't feel too bad about the runtime cost if only subtransactions are > paying that cost. I know we are really stretching the system here but I > would like to try a little more rather than give up and taking a space > hit for all tuples. Let me add that outside transactions read those phantom xid only when they are doing dirty reads. What I don't understand is when do outside transactions see tuples created inside a transaction? INSERT into a table with a unique key? Once the main transaction commits, these phantom tuples should work just like ordinary tuples except they get their cmin overwritten when they are expired, I think. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I don't feel too bad about the runtime cost if only subtransactions are > paying that cost. That's exactly why I'm so exercised about what's been done to the HeapTupleSet/Get macros. That's significant cost that's paid even when you're not using *any* of this stuff. > I know we are really stretching the system here but I > would like to try a little more rather than give up and taking a space > hit for all tuples. I don't even have any confidence that there are no fundamental bugs in the phantom-xid concept :-(. I'd be willing to play along if an implementation that seemed acceptable speedwise were being offered, but this thing is not preferable to four-more-bytes even if it works. regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Let me add that outside transactions read those phantom xid only when > they are doing dirty reads. What I don't understand is when do outside > transactions see tuples created inside a transaction? INSERT into a > table with a unique key? > Once the main transaction commits, these phantom tuples should work just > like ordinary tuples except they get their cmin overwritten when they > are expired, I think. You're not doing anything to increase my confidence level in this concept. You invented it, and even you aren't sure how it works. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Let me add that outside transactions read those phantom xid only when > > they are doing dirty reads. What I don't understand is when do outside > > transactions see tuples created inside a transaction? INSERT into a > > table with a unique key? > > > Once the main transaction commits, these phantom tuples should work just > > like ordinary tuples except they get their cmin overwritten when they > > are expired, I think. > > You're not doing anything to increase my confidence level in this > concept. You invented it, and even you aren't sure how it works. And your point is? I am trying to help Alvaro. I came up with an idea, and some thought it would help solve the problem. What ideas do you have to help him? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > Here I present the nested transactions patch and the phantom Xids patch > that goes with it. I took a very preliminary look through the nested-xacts patch, too. > Missing: rollback of SET CONSTRAINTS and GUC vars. There's a good deal more than that missing :-(. Here are the modules or actions that are called in CommitTransaction and/or AbortTransaction that have not yet been touched by the patch: shared-inval message queue processing (inval.c) lo_commit localbuf.c (refcounts need fixed same as bufmgr) eoxactcallbacks API needs extension GUC namespace (temp namespace cleanup) files (close/drop temp files) reset current userid during xact abort password file updating SetReindexProcessing disable (can these be nested??) Of these the one that I think most urgently needs attention is inval.c; that is complicated code already and figuring out what should happen for subtrans commit/abort is not trivial. The others look relatively straightforward to fix, if tedious. Given patches for inval.c and guc.c, I would say that the patch is functionally close enough to done that we could commit to including it in 7.5 --- the other stuff could be wrapped up post-feature-freeze. BUT (you knew there was a but coming, right?) ... I am really really disturbed about the potential performance hit from this patch. I don't think we want to have a nested-transaction feature that imposes significant overhead on people who aren't actually using nested transactions. As I looked through the patch I found quite a few places that would impose such overhead. The main thing that's bothering me is that TransactionIdIsInProgress has been turned from a relatively cheap test (just scan the in-memory PGPROC array) into an exceedingly expensive one. It's gratutitously badly coded (for N open main transactions it does N searches of the subtrans tree, when one would do), but I don't really want it going to the subtrans tree at all during typical cases. We had talked about keeping a limited number of active subtrans IDs in the PGPROC array, and I think we're really going to need to do that to buy back performance here. (BTW, what is the intended behavior of TransactionIdIsInProgress for a committed or aborted subtransaction whose parent is still open? If it has to recognize aborted subtranses as still in progress then things get very messy, but I think it probably doesn't have to.) I'm quite unhappy with the wide use of SubTransXidsHaveCommonAncestor to replace TransactionIdEquals in tqual.c, too. This turns very cheap tests into very expensive ones --- whether or not you use subtransactions --- and unfortunately tqual.c is coded on the assumption that these tests are cheap. We can *not* afford to repeat SubTransXidsHaveCommonAncestor for every open main transaction every time we check a tuple. In many ways this is the same problem as with TransactionIdIsInProgress, and it probably needs a similar solution. Also I found it annoying that XactLockTableWait waits for a top xact even when target subxact has already aborted; possibly this is even a recipe for deadlock. Perhaps should iterate up the tree one level at a time, waiting for each? Would mean that subxacts have to acquire a lock on their xid that other people can wait for, but this is not necessarily bad. (In fact, if we do that then I don't think XactLockTableWait has to change at all.) So I think there are several must-fix performance issues as well before we can make a decision to accept this for 7.5. I'd feel better about this if we had another month, but with ten days before feature freeze it's gonna be touch and go. regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: > What ideas do you have to help him? Eat the four-byte overhead. Quite frankly, given the other functionality and performance issues he has to solve in the next ten days (see my other message just now), I think he'd be a fool to spend one more minute on phantom XIDs. Maybe for 7.6 someone will have the time to make the idea work. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > What ideas do you have to help him? > > Eat the four-byte overhead. > > Quite frankly, given the other functionality and performance issues > he has to solve in the next ten days (see my other message just now), > I think he'd be a fool to spend one more minute on phantom XIDs. > Maybe for 7.6 someone will have the time to make the idea work. Well, the problem with eating the 4-byte overhead is that it has a performance/storage impact for all installations, not just those that use nested transactions. You were discussing a performance impact for nested transactions, but I assume that is an impact when some on the server are using nested transactions and others are not, while this hit would be for all sites, even when no one is using nested transactions. However, given the list you just supplied, I agree Alvaro should focus on these items and add the 4-bytes to store needed values rather than continue to hack on phantom xids. We can certainly revisit that later, even in 7.6, or never. We can put Manfred on it. He did the compaction in the first place. :-) One point is that if we want to re-add those 4 bytes, we have to get community buy-in for it because it is a step backward for those who are not going to use nested transactions. (I don't want to go the compile-time switch route.) As far as cleaning up some of the items after feature freeze, well, I would like to avoid that, but it seems safe to do because the changes should be very localized, and because win32 will be cleaning up things during feature freeze too, I am sure. However, going into feature freeze knowing a features isn't 100% functional is something we try to avoid, so this would have to be a special exception based on the fact that the features is very difficult, and that Alvaro is working full-time on this. It does take us down the slippery slope of having the release process dictated by completion of nested transactions. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Sun, Jun 20, 2004 at 04:37:16PM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > Here I present the nested transactions patch and the phantom Xids patch > > that goes with it. > > I looked at the phantom XIDs stuff a bit. I still have little confidence > that the concept is correct :-( but here are some comments on the code > level. Ok. I for one think this got much more complex than I had originally thought it would be. I agree the changes to Set/Get Xmin/Xmax are way beyond what one would want, but the alternative would be to spread the complexity into their callers and I think that would be much worse. I don't have a lot of confidence in this either. The patch will be available in archives if anybody wants to implement this in a cleaner and safer way; I'll continue working on the rest of the things you pointed out in the subtransactions patch. Sadly, something broke in a really strange way between my last cvs update and today's, because I can't get the patch to initdb. It compiles without warnings (and I did make distclean), but there's a weird bug I'll have to pursue. Regarding the invalidation messages, what I'm currently looking at is to add a TransactionId to each message, which will be CurrentTransactionId for each new message. When a subxact commits, all its messages are relabeled to its parent. When a subxact aborts, all messages labeled with its TransactionId are locally processed and discarded. This is tricky because chunks are prepended to the queue, but it also means we can stop processing as soon as a message belongs to another transaction. I think GUC vars are easier: when a variable is about to change value inside a subtransaction, save the previous value in a list from which it will be restored if the subtransaction later aborts. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Hay dos momentos en la vida de un hombre en los que no debería especular: cuando puede permitírselo y cuando no puede" (Mark Twain)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > Regarding the invalidation messages, what I'm currently looking at is to > add a TransactionId to each message, which will be CurrentTransactionId > for each new message. When a subxact commits, all its messages are > relabeled to its parent. When a subxact aborts, all messages labeled > with its TransactionId are locally processed and discarded. Sounds plausible offhand. > This is tricky because chunks are prepended to the queue, but it also > means we can stop processing as soon as a message belongs to another > transaction. AFAIR there isn't any essential semantics to the ordering of the queue entries, so you can probably get away with reordering if that makes life any easier. Also, rather than labeling each entry individually, it might be better to keep a separate list for each level of transaction. Then instead of relabeling, you'd just concat the subtrans list to its parent's. Seems like this should be faster and less storage. regards, tom lane
On Mon, Jun 21, 2004 at 10:28:59PM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > This is tricky because chunks are prepended to the queue, but it also > > means we can stop processing as soon as a message belongs to another > > transaction. > > AFAIR there isn't any essential semantics to the ordering of the queue > entries, so you can probably get away with reordering if that makes life > any easier. > > Also, rather than labeling each entry individually, it might be better > to keep a separate list for each level of transaction. Then instead of > relabeling, you'd just concat the subtrans list to its parent's. Seems > like this should be faster and less storage. Right, but it makes harder to detect when a duplicate relcache entry is going to be inserted. Judging from the commentary in the file this is an issue. Another idea I had was: 1. at subtransaction start, add a special "boundary" message carrying the new Xid. 2. at subtransaction abort, process the first chunk backwards until the boundary message with the corresponding Xid is found; if it's not, jump to the next and repeat. At subtransaction commit nothing special needs to be done. This avoids having to label each message and to change the message appending scheme. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "The eagle never lost so much time, as when he submitted to learn of the crow." (William Blake)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Mon, Jun 21, 2004 at 10:28:59PM -0400, Tom Lane wrote: >> Also, rather than labeling each entry individually, it might be better >> to keep a separate list for each level of transaction. Then instead of >> relabeling, you'd just concat the subtrans list to its parent's. Seems >> like this should be faster and less storage. > Right, but it makes harder to detect when a duplicate relcache entry is > going to be inserted. Judging from the commentary in the file this is > an issue. It's only a minor efficiency hack; don't get too tense about avoiding dups. (We don't bother to check for dup catcache-inval entries at all...) The only thing I'd suggest you need to preserve is the business about invalidating catcache before relcache; again that's just an efficiency hack, but it seems simple enough to be worth doing. regards, tom lane
On Sun, Jun 20, 2004 at 08:49:22PM -0400, Tom Lane wrote: > Of these the one that I think most urgently needs attention is inval.c; > that is complicated code already and figuring out what should happen > for subtrans commit/abort is not trivial. The others look relatively > straightforward to fix, if tedious. A patch for this is attached. It _seems_ to work ok; conceptually it seems ok too. I have done some testing which tends to indicate that it is right, but I'm not sure of all the implications this code has so I don't know how to test it exhaustively. Of course, regression tests pass, but this doesn't actually prove anything. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "¿Que diferencia tiene para los muertos, los huérfanos, y aquellos que han perdido su hogar, si la loca destrucción ha sido realizada bajo el nombre del totalitarismo o del santo nombre de la libertad y la democracia?" (Gandhi)
Attachment
Alvaro Herrera wrote: > On Sun, Jun 20, 2004 at 04:37:16PM -0400, Tom Lane wrote: > > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > > Here I present the nested transactions patch and the phantom Xids patch > > > that goes with it. > > > > I looked at the phantom XIDs stuff a bit. I still have little confidence > > that the concept is correct :-( but here are some comments on the code > > level. > > Ok. I for one think this got much more complex than I had originally > thought it would be. I agree the changes to Set/Get Xmin/Xmax are way > beyond what one would want, but the alternative would be to spread the > complexity into their callers and I think that would be much worse. > > I don't have a lot of confidence in this either. The patch will be > available in archives if anybody wants to implement this in a cleaner > and safer way; I'll continue working on the rest of the things you > pointed out in the subtransactions patch. I am sorry to have given Alvaro another idea that didn't work. However, thinking of options, I wonder if instead of phantom xids, we should do phantom cids. Because only the local backend looks at the command counter (cid). I think it might be alot cleaner. The phantom cid would have a tuple bit set indicating that instead of being a cid, it is an index into an array of cmin/cmax pairs. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
If we add nested transactions for 7.5, are we going to need savepoints too in the same release? If we don't, are we going to get a lot of complaints? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Wed, Jun 23, 2004 at 08:58:15AM -0400, Bruce Momjian wrote: > If we add nested transactions for 7.5, are we going to need savepoints > too in the same release? If we don't, are we going to get a lot of > complaints? It'd be good to have savepoints right now. I'm not sure it'd be good to expose the nested transactions implementation if we are going to offer savepoints later, because it means we will have to keep nested transactions forever. Maybe it is a good idea to hide the implementation details and offer only the standard SAVEPOINT feature. I'm not sure how hard it is to change the syntax; I don't think it'd be a big deal. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "La conclusión que podemos sacar de esos estudios es que no podemos sacar ninguna conclusión de ellos" (Tanenbaum)
On Wed, Jun 23, 2004 at 08:57:11AM -0400, Bruce Momjian wrote: > I am sorry to have given Alvaro another idea that didn't work. It allowed me to learn a lot of cool tricks, so it wasn't wasted work. The only think I'm sorry about is that I should have used the time for something more useful, like tackling the remaining problems in the nested xacts implementation proper. > However, thinking of options, I wonder if instead of phantom xids, we > should do phantom cids. Because only the local backend looks at the > command counter (cid). I think it might be alot cleaner. The phantom > cid would have a tuple bit set indicating that instead of being a cid, > it is an index into an array of cmin/cmax pairs. Yeah, maybe this can work. I'm not going to try however, at least not now. If somebody else wants to try, be my guest. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "El conflicto es el camino real hacia la unión"
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > It'd be good to have savepoints right now. I'm not sure it'd be good to > expose the nested transactions implementation if we are going to offer > savepoints later, because it means we will have to keep nested > transactions forever. Nested transactions are *good*. We should not hide them. I don't object to offering spec-compatible savepoints, but the plain fact of the matter is that that's a dumbed-down API. We should not feel constrained to offer only that. regards, tom lane
On Wed, 2004-06-23 at 13:57, Bruce Momjian wrote: > I am sorry to have given Alvaro another idea that didn't work. No way! Keep having the ideas, please. I've done some more digging in dead time on all of this and I think we're on the right course in general by implementing all of this. ...well done to Alvaro for being able to make the ideas reality. Best regards, Simon Riggs
On Sun, Jun 20, 2004 at 08:49:22PM -0400, Tom Lane wrote: Regarding GUC, a WIP report: > Given patches for inval.c and guc.c, I would say that the patch is > functionally close enough to done that we could commit to including > it in 7.5 --- the other stuff could be wrapped up post-feature-freeze. I figured I could save the values whenever they are going to change, and restore them if the subtransaction aborts. This seems to work fine (lightly tested). I still have to figure out how to handle allocation for string vars, but I thought I'd post the patch for others to see. Please let me know if it's too ugly. (This patch misses the pieces in xact.c and xact.h but I'm sure the concept is clear.) I'll post a full patch once the missing deferred trigger stuff works. With the patches I posted to inval.c I think this fulfills the requirements, barring the performance issues raised. Comments? -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "No single strategy is always right (Unless the boss says so)" (Larry Wall)
Attachment
I said > I'll post a full patch once the missing deferred trigger stuff works. > With the patches I posted to inval.c I think this fulfills the > requirements, barring the performance issues raised. Ok, here is a full patch that includes: - all what was in the previous patch - GUC vars are rolled back on subxact abort - the inval msg lists are cleaned up of items created in the aborting subxact - SET CONSTRAINTS is rolled back too Missing: - all the other items Tom mentioned in his last mail, including performance enhancements. - discussion whether we want a different syntax for subxacts, like SUBBEGIN/SUBCOMMIT/SUBABORT instead of BEGIN/COMMIT/ABORT. Please comment on this point. - support for SAVEPOINT syntax. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "La grandeza es una experiencia transitoria. Nunca es consistente. Depende en gran parte de la imaginación humana creadora de mitos" (Irulan)
Attachment
Do we really need SubtransCutoffXid? AFAICS the reason for having it is only this claim in RecordTransactionCommit: * We can't mark committed subtransactions as fully committed, * because concurrent transactions would see them as committed * and not as in-progress. Leave them as "subcommitted" until * the parent transaction is below OldestXmin, per VACUUM. but I think this is dead wrong. As long as we mark the parent committed first, there is no race condition. tqual tests that are using snapshots will need to recognize that the subtransaction is a member of one of the snapshotted main XIDs, and those that are not will think committed is committed. So I want to mark subtransactions fully committed in RecordTransactionCommit, and lose SubtransCutoffXid. Comments? BTW, it would help to know what parts of the patch you intend to work on over the next couple of days. I'm reviewing and editorializing right now with intent to commit soon, so it would be good if we can avoid tromping on each others' feet. regards, tom lane
On Sat, Jun 26, 2004 at 07:56:09PM -0400, Tom Lane wrote: > Do we really need SubtransCutoffXid? AFAICS the reason for having it is > only this claim in RecordTransactionCommit: > > * We can't mark committed subtransactions as fully committed, > * because concurrent transactions would see them as committed > * and not as in-progress. Leave them as "subcommitted" until > * the parent transaction is below OldestXmin, per VACUUM. Right, that's the only point where it's used. I don't know clearly if some kind of mechanism will be needed to handle SUBTRANS COMMIT states in pg_clog that were left behind by a crashed subtransaction though. > but I think this is dead wrong. As long as we mark the parent committed > first, there is no race condition. tqual tests that are using snapshots > will need to recognize that the subtransaction is a member of one of the > snapshotted main XIDs, and those that are not will think committed is > committed. So I want to mark subtransactions fully committed in > RecordTransactionCommit, and lose SubtransCutoffXid. Comments? Yes, sounds good. > BTW, it would help to know what parts of the patch you intend to work on > over the next couple of days. I'm reviewing and editorializing right > now with intent to commit soon, so it would be good if we can avoid > tromping on each others' feet. This is really excellent news. I'll work on adding the Xid-cache to PGPROC and using that in TransactionIdIsInProgress and the tqual routines. If you want to work on that let me know and I'll handle things like the password file, local bufmgr refcount, etc. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) FOO MANE PADME HUM
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > I'll work on adding the Xid-cache to PGPROC and using that in > TransactionIdIsInProgress and the tqual routines. If you want to work > on that let me know and I'll handle things like the password file, local > bufmgr refcount, etc. Either one is okay, though doing the latter (ie modules you didn't touch yet) would be probably a bit easier to merge. regards, tom lane
On Sat, Jun 26, 2004 at 07:56:09PM -0400, Tom Lane wrote: > BTW, it would help to know what parts of the patch you intend to work on > over the next couple of days. I'm reviewing and editorializing right > now with intent to commit soon, so it would be good if we can avoid > tromping on each others' feet. Oops, I forgot that I had inadvertently left a kludge in commands/trigger.c. Please use this patch for this file instead. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Jude: I wish humans laid eggs Ringlord: Why would you want humans to lay eggs? Jude: So I can eat them
Attachment
On Sun, Jun 27, 2004 at 12:49:10AM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > I'll work on adding the Xid-cache to PGPROC and using that in > > TransactionIdIsInProgress and the tqual routines. If you want to work > > on that let me know and I'll handle things like the password file, local > > bufmgr refcount, etc. > > Either one is okay, though doing the latter (ie modules you didn't touch > yet) would be probably a bit easier to merge. Will do. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Un poeta es un mundo encerrado en un hombre" (Victor Hugo)
On Tue, Jun 29, 2004 at 03:22:52PM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > - GUC vars are rolled back on subxact abort > > This did not work very well, but here is a revised GUC patch that I think > does work. It requires xact.c to export a function to report the > current nesting depth, and requires AtEOXact_GUC to be called in all > four cleanup paths (main and subxact commit and abort). Very cool, thank you. I had thought about doing something like this but in the end I got scared away of changing too much code here. > BTW, why do you have assign_transaction_read_only() in your patch? It > seems to me to be useful to create a readonly subxact of a read-write > outer transaction. Or is that just not-done-yet? Nah, it's a leftover from back when there wasn't any way to roll back GUC vars. I thought it should be handled similarly to the way the isolation level should be handled. With your patch I think it can be ripped away entirely. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Officer Krupke, what are we to do? Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > - GUC vars are rolled back on subxact abort This did not work very well, but here is a revised GUC patch that I think does work. It requires xact.c to export a function to report the current nesting depth, and requires AtEOXact_GUC to be called in all four cleanup paths (main and subxact commit and abort). BTW, why do you have assign_transaction_read_only() in your patch? It seems to me to be useful to create a readonly subxact of a read-write outer transaction. Or is that just not-done-yet? regards, tom lane *** src/backend/utils/misc/README.orig Mon Jan 19 14:04:40 2004 --- src/backend/utils/misc/README Tue Jun 29 15:14:44 2004 *************** *** 68,116 **** would be effective had there never been any SET commands in the current session. ! To handle these cases we must keep track of as many as four distinct ! values for each variable. They are: * actual variable contents always the current effective value * reset_value the value to use for RESET - * session_value the "committed" setting for the session - * tentative_value the uncommitted result of SET ! During initialization we set the first three of these (actual, reset_value, ! and session_value) based on whichever non-interactive source has the ! highest priority. All three will have the same value. A SET LOCAL command sets the actual variable (and nothing else). At ! transaction end, the session_value is used to restore the actual variable ! to its pre-transaction value. A SET (or SET SESSION) command sets the actual variable, and if no error, then sets the tentative_value. If the transaction commits, the ! tentative_value is assigned to the session_value and the actual variable ! (which could by now be different, if the SET was followed by SET LOCAL). ! If the transaction aborts, the tentative_value is discarded and the ! actual variable is restored from the session_value. RESET is executed like a SET, but using the reset_value as the desired new value. (We do not provide a RESET LOCAL command, but SET LOCAL TO DEFAULT has the same behavior that RESET LOCAL would.) The source associated with ! the reset_value also becomes associated with the actual and session values. If SIGHUP is received, the GUC code rereads the postgresql.conf configuration file (this does not happen in the signal handler, but at next return to main loop; note that it can be executed while within a transaction). New values from postgresql.conf are assigned to actual ! variable, reset_value, and session_value, but only if each of these has a ! current source priority <= PGC_S_FILE. (It is thus possible for ! reset_value to track the config-file setting even if there is currently ! a different interactive value of the actual variable.) Note that tentative_value is unused and undefined except between a SET command and the end of the transaction. Also notice that we must track ! the source associated with each of the four values. The assign_hook and show_hook routines work only with the actual variable, and are not directly aware of the additional values maintained by GUC. --- 68,133 ---- would be effective had there never been any SET commands in the current session. ! To handle these cases we must keep track of many distinct values for each ! variable. The primary values are: * actual variable contents always the current effective value * reset_value the value to use for RESET * tentative_value the uncommitted result of SET ! The reason we need a tentative_value separate from the actual value is ! that when a transaction does SET followed by SET LOCAL, the actual value ! will now be the LOCAL value, but we want to remember the prior SET so that ! that value is restored at transaction commit. ! ! In addition, for each level of transaction (possibly nested) we have to ! remember the transaction-entry-time actual and tentative values, in case ! we need to restore them at transaction end. (The RESET value is essentially ! non-transactional, so it doesn't have to be stacked.) For efficiency these ! stack entries are not constructed until/unless the variable is actually SET ! within a particular transaction. ! ! During initialization we set the actual value and reset_value based on ! whichever non-interactive source has the highest priority. They will ! have the same value. The tentative_value is not meaningful at this point. ! ! A SET command starts by stacking the existing actual and tentative values ! if this hasn't already been done within the current transaction. Then: A SET LOCAL command sets the actual variable (and nothing else). At ! transaction end, the stacked values are used to restore the GUC entry ! to its pre-transaction state. A SET (or SET SESSION) command sets the actual variable, and if no error, then sets the tentative_value. If the transaction commits, the ! tentative_value is assigned again to the actual variable (which could by ! now be different, if the SET was followed by SET LOCAL). If the ! transaction aborts, the stacked values are used to restore the GUC entry ! to its pre-transaction state. ! ! In the case of SET within nested subtransactions, at each commit the ! tentative_value propagates out to the next transaction level. It will ! be thrown away at abort of any level, or after exiting the top transaction. RESET is executed like a SET, but using the reset_value as the desired new value. (We do not provide a RESET LOCAL command, but SET LOCAL TO DEFAULT has the same behavior that RESET LOCAL would.) The source associated with ! the reset_value also becomes associated with the actual and tentative values. If SIGHUP is received, the GUC code rereads the postgresql.conf configuration file (this does not happen in the signal handler, but at next return to main loop; note that it can be executed while within a transaction). New values from postgresql.conf are assigned to actual ! variable, reset_value, and stacked actual values, but only if each of ! these has a current source priority <= PGC_S_FILE. (It is thus possible ! for reset_value to track the config-file setting even if there is ! currently a different interactive value of the actual variable.) Note that tentative_value is unused and undefined except between a SET command and the end of the transaction. Also notice that we must track ! the source associated with each one of the values. The assign_hook and show_hook routines work only with the actual variable, and are not directly aware of the additional values maintained by GUC. *************** *** 129,137 **** context anyway, and strdup gives us more control over handling out-of-memory failures. ! We allow a variable's actual value, reset_val, session_val, and ! tentative_val to point at the same storage. This makes it slightly harder ! to free space (must test that the value to be freed isn't equal to any of ! the other three pointers). The main advantage is that we never need to ! strdup during transaction commit/abort, so cannot cause an out-of-memory ! failure there. --- 146,154 ---- context anyway, and strdup gives us more control over handling out-of-memory failures. ! We allow a string variable's actual value, reset_val, tentative_val, and ! stacked copies of same to point at the same storage. This makes it ! slightly harder to free space (must test whether a value to be freed isn't ! equal to any of the other pointers in the GUC entry or associated stack ! items). The main advantage is that we never need to strdup during ! transaction commit/abort, so cannot cause an out-of-memory failure there. *** src/backend/utils/misc/guc.c.orig Sat Jun 12 14:22:41 2004 --- src/backend/utils/misc/guc.c Tue Jun 29 14:55:50 2004 *************** *** 16,26 **** */ #include "postgres.h" ! #include <errno.h> #include <float.h> #include <limits.h> #include <unistd.h> - #include <ctype.h> #include "utils/guc.h" #include "utils/guc_tables.h" --- 16,25 ---- */ #include "postgres.h" ! #include <ctype.h> #include <float.h> #include <limits.h> #include <unistd.h> #include "utils/guc.h" #include "utils/guc_tables.h" *************** *** 54,59 **** --- 53,59 ---- #include "tcop/tcopprot.h" #include "utils/array.h" #include "utils/builtins.h" + #include "utils/memutils.h" #include "utils/pg_locale.h" #include "pgstat.h" *************** *** 105,110 **** --- 105,111 ---- GucSource source); static bool assign_stage_log_stats(bool newval, bool doit, GucSource source); static bool assign_log_stats(bool newval, bool doit, GucSource source); + static bool assign_transaction_read_only(bool newval, bool doit, GucSource source); /* *************** *** 174,218 **** static int block_size; static bool integer_datetimes; - /* Macros for freeing malloc'd pointers only if appropriate to do so */ - /* Some of these tests are probably redundant, but be safe ... */ - #define SET_STRING_VARIABLE(rec, newval) \ - do { \ - if (*(rec)->variable && \ - *(rec)->variable != (rec)->reset_val && \ - *(rec)->variable != (rec)->session_val && \ - *(rec)->variable != (rec)->tentative_val) \ - free(*(rec)->variable); \ - *(rec)->variable = (newval); \ - } while (0) - #define SET_STRING_RESET_VAL(rec, newval) \ - do { \ - if ((rec)->reset_val && \ - (rec)->reset_val != *(rec)->variable && \ - (rec)->reset_val != (rec)->session_val && \ - (rec)->reset_val != (rec)->tentative_val) \ - free((rec)->reset_val); \ - (rec)->reset_val = (newval); \ - } while (0) - #define SET_STRING_SESSION_VAL(rec, newval) \ - do { \ - if ((rec)->session_val && \ - (rec)->session_val != *(rec)->variable && \ - (rec)->session_val != (rec)->reset_val && \ - (rec)->session_val != (rec)->tentative_val) \ - free((rec)->session_val); \ - (rec)->session_val = (newval); \ - } while (0) - #define SET_STRING_TENTATIVE_VAL(rec, newval) \ - do { \ - if ((rec)->tentative_val && \ - (rec)->tentative_val != *(rec)->variable && \ - (rec)->tentative_val != (rec)->reset_val && \ - (rec)->tentative_val != (rec)->session_val) \ - free((rec)->tentative_val); \ - (rec)->tentative_val = (newval); \ - } while (0) - /* * Displayable names for context types (enum GucContext) --- 175,180 ---- *************** *** 801,807 **** GUC_NO_RESET_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE }, &XactReadOnly, ! false, NULL, NULL }, { {"add_missing_from", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS, --- 763,769 ---- GUC_NO_RESET_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE }, &XactReadOnly, ! false, assign_transaction_read_only, NULL }, { {"add_missing_from", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS, *************** *** 1766,1779 **** */ static struct config_generic **guc_variables; ! /* Current number of variables contained in the vector ! */ static int num_guc_variables; ! /* Vector capacity ! */ static int size_guc_variables; static bool guc_dirty; /* TRUE if need to do commit/abort work */ static bool reporting_enabled; /* TRUE to enable GUC_REPORT */ --- 1728,1740 ---- */ static struct config_generic **guc_variables; ! /* Current number of variables contained in the vector */ static int num_guc_variables; ! /* Vector capacity */ static int size_guc_variables; + static bool guc_dirty; /* TRUE if need to do commit/abort work */ static bool reporting_enabled; /* TRUE to enable GUC_REPORT */ *************** *** 1783,1796 **** static int guc_var_compare(const void *a, const void *b); static int guc_name_compare(const char *namea, const char *nameb); static void ReportGUCOption(struct config_generic * record); static char *_ShowOption(struct config_generic * record); ! struct config_generic** get_guc_variables() { return guc_variables; } /* * Build the sorted array. This is split out so that it could be * re-executed after startup (eg, we could allow loadable modules to --- 1744,1814 ---- static int guc_var_compare(const void *a, const void *b); static int guc_name_compare(const char *namea, const char *nameb); + static void push_old_value(struct config_generic *gconf); static void ReportGUCOption(struct config_generic * record); static char *_ShowOption(struct config_generic * record); ! ! /* ! * Support for assigning to a field of a string GUC item. Free the prior ! * value if it's not referenced anywhere else in the item (including stacked ! * states). ! */ ! static void ! set_string_field(struct config_string *conf, char **field, char *newval) ! { ! char *oldval = *field; ! GucStack *stack; ! ! /* Do the assignment */ ! *field = newval; ! ! /* Exit if any duplicate references, or if old value was NULL anyway */ ! if (oldval == NULL || ! oldval == *(conf->variable) || ! oldval == conf->reset_val || ! oldval == conf->tentative_val) ! return; ! for (stack = conf->gen.stack; stack; stack = stack->prev) ! { ! if (oldval == stack->tentative_val.stringval || ! oldval == stack->value.stringval) ! return; ! } ! ! /* Not used anymore, so free it */ ! free(oldval); ! } ! ! /* ! * Detect whether strval is referenced anywhere in a GUC string item ! */ ! static bool ! string_field_used(struct config_string *conf, char *strval) ! { ! GucStack *stack; ! ! if (strval == *(conf->variable) || ! strval == conf->reset_val || ! strval == conf->tentative_val) ! return true; ! for (stack = conf->gen.stack; stack; stack = stack->prev) ! { ! if (strval == stack->tentative_val.stringval || ! strval == stack->value.stringval) ! return true; ! } ! return false; ! } ! ! ! struct config_generic ** ! get_guc_variables(void) { return guc_variables; } + /* * Build the sorted array. This is split out so that it could be * re-executed after startup (eg, we could allow loadable modules to *************** *** 2001,2014 **** return find_option(map_old_guc_names[i+1]); } ! /* Check if the name is qualified, and if so, check if the qualifier * maps to a custom variable class. */ dot = strchr(name, GUC_QUALIFIER_SEPARATOR); if(dot != NULL && is_custom_class(name, dot - name)) ! /* ! * Add a placeholder variable for this name ! */ return (struct config_generic*)add_placeholder_variable(name); /* Unknown name */ --- 2019,2031 ---- return find_option(map_old_guc_names[i+1]); } ! /* ! * Check if the name is qualified, and if so, check if the qualifier * maps to a custom variable class. */ dot = strchr(name, GUC_QUALIFIER_SEPARATOR); if(dot != NULL && is_custom_class(name, dot - name)) ! /* Add a placeholder variable for this name */ return (struct config_generic*)add_placeholder_variable(name); /* Unknown name */ *************** *** 2081,2089 **** gconf->status = 0; gconf->reset_source = PGC_S_DEFAULT; - gconf->session_source = PGC_S_DEFAULT; gconf->tentative_source = PGC_S_DEFAULT; gconf->source = PGC_S_DEFAULT; switch (gconf->vartype) { --- 2098,2106 ---- gconf->status = 0; gconf->reset_source = PGC_S_DEFAULT; gconf->tentative_source = PGC_S_DEFAULT; gconf->source = PGC_S_DEFAULT; + gconf->stack = NULL; switch (gconf->vartype) { *************** *** 2097,2103 **** elog(FATAL, "failed to initialize %s to %d", conf->gen.name, (int) conf->reset_val); *conf->variable = conf->reset_val; - conf->session_val = conf->reset_val; break; } case PGC_INT: --- 2114,2119 ---- *************** *** 2119,2125 **** elog(FATAL, "failed to initialize %s to %d", conf->gen.name, conf->reset_val); *conf->variable = conf->reset_val; - conf->session_val = conf->reset_val; break; } case PGC_REAL: --- 2135,2140 ---- *************** *** 2135,2141 **** elog(FATAL, "failed to initialize %s to %g", conf->gen.name, conf->reset_val); *conf->variable = conf->reset_val; - conf->session_val = conf->reset_val; break; } case PGC_STRING: --- 2150,2155 ---- *************** *** 2150,2156 **** conf->assign_hook == assign_log_statement); *conf->variable = NULL; conf->reset_val = NULL; - conf->session_val = NULL; conf->tentative_val = NULL; if (conf->boot_val == NULL) --- 2164,2169 ---- *************** *** 2190,2196 **** } } *conf->variable = str; - conf->session_val = str; break; } } --- 2203,2208 ---- *************** *** 2254,2259 **** --- 2266,2274 ---- if (gconf->source <= PGC_S_OVERRIDE) continue; + /* Save old value to support transaction abort */ + push_old_value(gconf); + switch (gconf->vartype) { case PGC_BOOL: *************** *** 2336,2343 **** } } ! SET_STRING_VARIABLE(conf, str); ! SET_STRING_TENTATIVE_VAL(conf, str); conf->gen.source = conf->gen.reset_source; conf->gen.tentative_source = conf->gen.reset_source; conf->gen.status |= GUC_HAVE_TENTATIVE; --- 2351,2358 ---- } } ! set_string_field(conf, conf->variable, str); ! set_string_field(conf, &conf->tentative_val, str); conf->gen.source = conf->gen.reset_source; conf->gen.tentative_source = conf->gen.reset_source; conf->gen.status |= GUC_HAVE_TENTATIVE; *************** *** 2353,2363 **** /* ! * Do GUC processing at transaction commit or abort. */ void ! AtEOXact_GUC(bool isCommit) { int i; /* Quick exit if nothing's changed in this transaction */ --- 2368,2460 ---- /* ! * push_old_value ! * Push previous state during first assignment to a GUC variable ! * within a particular transaction. ! * ! * We have to be willing to "back-fill" the state stack if the first ! * assignment occurs within a subtransaction nested several levels deep. ! * This ensures that if an intermediate transaction aborts, it will have ! * the proper value available to restore the setting to. ! */ ! static void ! push_old_value(struct config_generic *gconf) ! { ! int my_level = GetCurrentTransactionNestLevel(); ! GucStack *stack; ! ! /* If we're not inside a transaction, do nothing */ ! if (my_level == 0) ! return; ! ! for (;;) ! { ! /* Done if we already pushed it at this nesting depth */ ! if (gconf->stack && gconf->stack->nest_level >= my_level) ! return; ! ! /* ! * We keep all the stack entries in TopTransactionContext so as to ! * avoid allocation problems when a subtransaction back-fills stack ! * entries for upper transaction levels. ! */ ! stack = (GucStack *) MemoryContextAlloc(TopTransactionContext, ! sizeof(GucStack)); ! ! stack->prev = gconf->stack; ! stack->nest_level = stack->prev ? stack->prev->nest_level + 1 : 1; ! stack->status = gconf->status; ! stack->tentative_source = gconf->tentative_source; ! stack->source = gconf->source; ! ! switch (gconf->vartype) ! { ! case PGC_BOOL: ! stack->tentative_val.boolval = ! ((struct config_bool *) gconf)->tentative_val; ! stack->value.boolval = ! *((struct config_bool *) gconf)->variable; ! break; ! ! case PGC_INT: ! stack->tentative_val.intval = ! ((struct config_int *) gconf)->tentative_val; ! stack->value.intval = ! *((struct config_int *) gconf)->variable; ! break; ! ! case PGC_REAL: ! stack->tentative_val.realval = ! ((struct config_real *) gconf)->tentative_val; ! stack->value.realval = ! *((struct config_real *) gconf)->variable; ! break; ! ! case PGC_STRING: ! stack->tentative_val.stringval = ! ((struct config_string *) gconf)->tentative_val; ! stack->value.stringval = ! *((struct config_string *) gconf)->variable; ! break; ! } ! ! gconf->stack = stack; ! ! /* Set state to indicate nothing happened yet within this level */ ! gconf->status = GUC_HAVE_STACK; ! ! /* Ensure we remember to pop at end of xact */ ! guc_dirty = true; ! } ! } ! ! /* ! * Do GUC processing at transaction or subtransaction commit or abort. */ void ! AtEOXact_GUC(bool isCommit, bool isSubXact) { + int my_level; int i; /* Quick exit if nothing's changed in this transaction */ *************** *** 2371,2385 **** guc_string_workspace = NULL; } for (i = 0; i < num_guc_variables; i++) { struct config_generic *gconf = guc_variables[i]; bool changed; ! /* Skip if nothing's happened to this var in this transaction */ ! if (gconf->status == 0) continue; changed = false; switch (gconf->vartype) --- 2468,2523 ---- guc_string_workspace = NULL; } + my_level = GetCurrentTransactionNestLevel(); + Assert(isSubXact ? (my_level > 1) : (my_level == 1)); + for (i = 0; i < num_guc_variables; i++) { struct config_generic *gconf = guc_variables[i]; + int my_status = gconf->status; + GucStack *stack = gconf->stack; + bool useTentative; bool changed; ! /* ! * Skip if nothing's happened to this var in this transaction ! */ ! if (my_status == 0) ! { ! Assert(stack == NULL); ! continue; ! } ! /* Assert that we stacked old value before changing it */ ! Assert(stack != NULL && (my_status & GUC_HAVE_STACK)); ! /* However, the last change may have been at an outer xact level */ ! if (stack->nest_level < my_level) continue; + Assert(stack->nest_level == my_level); + /* + * We will pop the stack entry. Start by restoring outer xact status + * (since we may want to modify it below). Be careful to use + * my_status to reference the inner xact status below this point... + */ + gconf->status = stack->status; + + /* + * We have two cases: + * + * If commit and HAVE_TENTATIVE, set actual value to tentative + * (this is to override a SET LOCAL if one occurred later than SET). + * We keep the tentative value and propagate HAVE_TENTATIVE to + * the parent status, allowing the SET's effect to percolate up. + * (But if we're exiting the outermost transaction, we'll drop the + * HAVE_TENTATIVE bit below.) + * + * Otherwise, we have a transaction that aborted or executed only + * SET LOCAL (or no SET at all). In either case it should have no + * further effect, so restore both tentative and actual values from + * the stack entry. + */ + + useTentative = isCommit && (my_status & GUC_HAVE_TENTATIVE) != 0; changed = false; switch (gconf->vartype) *************** *** 2387,2512 **** case PGC_BOOL: { struct config_bool *conf = (struct config_bool *) gconf; ! if (isCommit && (conf->gen.status & GUC_HAVE_TENTATIVE)) { ! conf->session_val = conf->tentative_val; ! conf->gen.session_source = conf->gen.tentative_source; } ! if (*conf->variable != conf->session_val) { if (conf->assign_hook) ! if (!(*conf->assign_hook) (conf->session_val, true, PGC_S_OVERRIDE)) elog(LOG, "failed to commit %s", conf->gen.name); ! *conf->variable = conf->session_val; changed = true; } ! conf->gen.source = conf->gen.session_source; ! conf->gen.status = 0; break; } case PGC_INT: { struct config_int *conf = (struct config_int *) gconf; ! if (isCommit && (conf->gen.status & GUC_HAVE_TENTATIVE)) { ! conf->session_val = conf->tentative_val; ! conf->gen.session_source = conf->gen.tentative_source; } ! if (*conf->variable != conf->session_val) { if (conf->assign_hook) ! if (!(*conf->assign_hook) (conf->session_val, true, PGC_S_OVERRIDE)) elog(LOG, "failed to commit %s", conf->gen.name); ! *conf->variable = conf->session_val; changed = true; } ! conf->gen.source = conf->gen.session_source; ! conf->gen.status = 0; break; } case PGC_REAL: { struct config_real *conf = (struct config_real *) gconf; ! if (isCommit && (conf->gen.status & GUC_HAVE_TENTATIVE)) { ! conf->session_val = conf->tentative_val; ! conf->gen.session_source = conf->gen.tentative_source; } ! if (*conf->variable != conf->session_val) { if (conf->assign_hook) ! if (!(*conf->assign_hook) (conf->session_val, true, PGC_S_OVERRIDE)) elog(LOG, "failed to commit %s", conf->gen.name); ! *conf->variable = conf->session_val; changed = true; } ! conf->gen.source = conf->gen.session_source; ! conf->gen.status = 0; break; } case PGC_STRING: { struct config_string *conf = (struct config_string *) gconf; ! if (isCommit && (conf->gen.status & GUC_HAVE_TENTATIVE)) { ! SET_STRING_SESSION_VAL(conf, conf->tentative_val); ! conf->gen.session_source = conf->gen.tentative_source; ! conf->tentative_val = NULL; /* transfer ownership */ } else - SET_STRING_TENTATIVE_VAL(conf, NULL); - - if (*conf->variable != conf->session_val) { ! char *str = conf->session_val; if (conf->assign_hook) { const char *newstr; ! newstr = (*conf->assign_hook) (str, true, PGC_S_OVERRIDE); if (newstr == NULL) elog(LOG, "failed to commit %s", conf->gen.name); ! else if (newstr != str) { /* * See notes in set_config_option about * casting */ ! str = (char *) newstr; ! SET_STRING_SESSION_VAL(conf, str); } } ! SET_STRING_VARIABLE(conf, str); changed = true; } ! conf->gen.source = conf->gen.session_source; ! conf->gen.status = 0; break; } } if (changed && (gconf->flags & GUC_REPORT)) ReportGUCOption(gconf); } ! guc_dirty = false; } --- 2525,2714 ---- case PGC_BOOL: { struct config_bool *conf = (struct config_bool *) gconf; + bool newval; + GucSource newsource; ! if (useTentative) { ! newval = conf->tentative_val; ! newsource = conf->gen.tentative_source; ! conf->gen.status |= GUC_HAVE_TENTATIVE; ! } ! else ! { ! newval = stack->value.boolval; ! newsource = stack->source; ! conf->tentative_val = stack->tentative_val.boolval; ! conf->gen.tentative_source = stack->tentative_source; } ! if (*conf->variable != newval) { if (conf->assign_hook) ! if (!(*conf->assign_hook) (newval, true, PGC_S_OVERRIDE)) elog(LOG, "failed to commit %s", conf->gen.name); ! *conf->variable = newval; changed = true; } ! conf->gen.source = newsource; break; } case PGC_INT: { struct config_int *conf = (struct config_int *) gconf; + int newval; + GucSource newsource; ! if (useTentative) { ! newval = conf->tentative_val; ! newsource = conf->gen.tentative_source; ! conf->gen.status |= GUC_HAVE_TENTATIVE; ! } ! else ! { ! newval = stack->value.intval; ! newsource = stack->source; ! conf->tentative_val = stack->tentative_val.intval; ! conf->gen.tentative_source = stack->tentative_source; } ! if (*conf->variable != newval) { if (conf->assign_hook) ! if (!(*conf->assign_hook) (newval, true, PGC_S_OVERRIDE)) elog(LOG, "failed to commit %s", conf->gen.name); ! *conf->variable = newval; changed = true; } ! conf->gen.source = newsource; break; } case PGC_REAL: { struct config_real *conf = (struct config_real *) gconf; + double newval; + GucSource newsource; ! if (useTentative) { ! newval = conf->tentative_val; ! newsource = conf->gen.tentative_source; ! conf->gen.status |= GUC_HAVE_TENTATIVE; ! } ! else ! { ! newval = stack->value.realval; ! newsource = stack->source; ! conf->tentative_val = stack->tentative_val.realval; ! conf->gen.tentative_source = stack->tentative_source; } ! if (*conf->variable != newval) { if (conf->assign_hook) ! if (!(*conf->assign_hook) (newval, true, PGC_S_OVERRIDE)) elog(LOG, "failed to commit %s", conf->gen.name); ! *conf->variable = newval; changed = true; } ! conf->gen.source = newsource; break; } case PGC_STRING: { struct config_string *conf = (struct config_string *) gconf; + char *newval; + GucSource newsource; ! if (useTentative) { ! newval = conf->tentative_val; ! newsource = conf->gen.tentative_source; ! conf->gen.status |= GUC_HAVE_TENTATIVE; } else { ! newval = stack->value.stringval; ! newsource = stack->source; ! set_string_field(conf, &conf->tentative_val, ! stack->tentative_val.stringval); ! conf->gen.tentative_source = stack->tentative_source; ! } + if (*conf->variable != newval) + { if (conf->assign_hook) { const char *newstr; ! newstr = (*conf->assign_hook) (newval, true, PGC_S_OVERRIDE); if (newstr == NULL) elog(LOG, "failed to commit %s", conf->gen.name); ! else if (newstr != newval) { /* + * If newval should now be freed, it'll be + * taken care of below. + * * See notes in set_config_option about * casting */ ! newval = (char *) newstr; } } ! set_string_field(conf, conf->variable, newval); changed = true; } ! conf->gen.source = newsource; ! /* Release stacked values if not used anymore */ ! set_string_field(conf, &stack->value.stringval, ! NULL); ! set_string_field(conf, &stack->tentative_val.stringval, ! NULL); ! /* Don't store tentative value separately after commit */ ! if (!isSubXact) ! set_string_field(conf, &conf->tentative_val, NULL); break; } } + /* Finish popping the state stack */ + gconf->stack = stack->prev; + pfree(stack); + + /* + * If we're now out of all xact levels, forget TENTATIVE status bit; + * there's nothing tentative about the value anymore. + */ + if (!isSubXact) + { + Assert(gconf->stack == NULL); + gconf->status = 0; + } + + /* Report new value if we changed it */ if (changed && (gconf->flags & GUC_REPORT)) ReportGUCOption(gconf); } ! /* ! * If we're now out of all xact levels, we can clear guc_dirty. ! * (Note: we cannot reset guc_dirty when exiting a subtransaction, ! * because we know that all outer transaction levels will have stacked ! * values to deal with.) ! */ ! if (!isSubXact) ! guc_dirty = false; } *************** *** 2810,2816 **** } /* ! * Should we set reset/session values? (If so, the behavior is not * transactional.) */ makeDefault = changeVal && (source <= PGC_S_OVERRIDE) && (value != NULL); --- 3012,3018 ---- } /* ! * Should we set reset/stacked values? (If so, the behavior is not * transactional.) */ makeDefault = changeVal && (source <= PGC_S_OVERRIDE) && (value != NULL); *************** *** 2820,2826 **** * However, if changeVal is false then plow ahead anyway since we are * trying to find out if the value is potentially good, not actually * use it. Also keep going if makeDefault is true, since we may want ! * to set the reset/session values even if we can't set the variable * itself. */ if (record->source > source) --- 3022,3028 ---- * However, if changeVal is false then plow ahead anyway since we are * trying to find out if the value is potentially good, not actually * use it. Also keep going if makeDefault is true, since we may want ! * to set the reset/stacked values even if we can't set the variable * itself. */ if (record->source > source) *************** *** 2901,2906 **** --- 3103,3111 ---- if (changeVal || makeDefault) { + /* Save old value to support transaction abort */ + if (!makeDefault) + push_old_value(&conf->gen); if (changeVal) { *conf->variable = newval; *************** *** 2908,2922 **** } if (makeDefault) { if (conf->gen.reset_source <= source) { conf->reset_val = newval; conf->gen.reset_source = source; } ! if (conf->gen.session_source <= source) { ! conf->session_val = newval; ! conf->gen.session_source = source; } } else if (isLocal) --- 3113,3132 ---- } if (makeDefault) { + GucStack *stack; + if (conf->gen.reset_source <= source) { conf->reset_val = newval; conf->gen.reset_source = source; } ! for (stack = conf->gen.stack; stack; stack = stack->prev) { ! if (stack->source <= source) ! { ! stack->value.boolval = newval; ! stack->source = source; ! } } } else if (isLocal) *************** *** 3006,3011 **** --- 3216,3224 ---- if (changeVal || makeDefault) { + /* Save old value to support transaction abort */ + if (!makeDefault) + push_old_value(&conf->gen); if (changeVal) { *conf->variable = newval; *************** *** 3013,3027 **** } if (makeDefault) { if (conf->gen.reset_source <= source) { conf->reset_val = newval; conf->gen.reset_source = source; } ! if (conf->gen.session_source <= source) { ! conf->session_val = newval; ! conf->gen.session_source = source; } } else if (isLocal) --- 3226,3245 ---- } if (makeDefault) { + GucStack *stack; + if (conf->gen.reset_source <= source) { conf->reset_val = newval; conf->gen.reset_source = source; } ! for (stack = conf->gen.stack; stack; stack = stack->prev) { ! if (stack->source <= source) ! { ! stack->value.intval = newval; ! stack->source = source; ! } } } else if (isLocal) *************** *** 3101,3106 **** --- 3319,3327 ---- if (changeVal || makeDefault) { + /* Save old value to support transaction abort */ + if (!makeDefault) + push_old_value(&conf->gen); if (changeVal) { *conf->variable = newval; *************** *** 3108,3122 **** } if (makeDefault) { if (conf->gen.reset_source <= source) { conf->reset_val = newval; conf->gen.reset_source = source; } ! if (conf->gen.session_source <= source) { ! conf->session_val = newval; ! conf->gen.session_source = source; } } else if (isLocal) --- 3329,3348 ---- } if (makeDefault) { + GucStack *stack; + if (conf->gen.reset_source <= source) { conf->reset_val = newval; conf->gen.reset_source = source; } ! for (stack = conf->gen.stack; stack; stack = stack->prev) { ! if (stack->source <= source) ! { ! stack->value.realval = newval; ! stack->source = source; ! } } } else if (isLocal) *************** *** 3261,3287 **** if (changeVal || makeDefault) { if (changeVal) { ! SET_STRING_VARIABLE(conf, newval); conf->gen.source = source; } if (makeDefault) { if (conf->gen.reset_source <= source) { ! SET_STRING_RESET_VAL(conf, newval); conf->gen.reset_source = source; } ! if (conf->gen.session_source <= source) { ! SET_STRING_SESSION_VAL(conf, newval); ! conf->gen.session_source = source; } /* Perhaps we didn't install newval anywhere */ ! if (newval != *conf->variable && ! newval != conf->session_val && ! newval != conf->reset_val) free(newval); } else if (isLocal) --- 3487,3520 ---- if (changeVal || makeDefault) { + /* Save old value to support transaction abort */ + if (!makeDefault) + push_old_value(&conf->gen); if (changeVal) { ! set_string_field(conf, conf->variable, newval); conf->gen.source = source; } if (makeDefault) { + GucStack *stack; + if (conf->gen.reset_source <= source) { ! set_string_field(conf, &conf->reset_val, newval); conf->gen.reset_source = source; } ! for (stack = conf->gen.stack; stack; stack = stack->prev) { ! if (stack->source <= source) ! { ! set_string_field(conf, &stack->value.stringval, ! newval); ! stack->source = source; ! } } /* Perhaps we didn't install newval anywhere */ ! if (!string_field_used(conf, newval)) free(newval); } else if (isLocal) *************** *** 3291,3297 **** } else { ! SET_STRING_TENTATIVE_VAL(conf, newval); conf->gen.tentative_source = source; conf->gen.status |= GUC_HAVE_TENTATIVE; guc_dirty = true; --- 3524,3530 ---- } else { ! set_string_field(conf, &conf->tentative_val, newval); conf->gen.tentative_source = source; conf->gen.status |= GUC_HAVE_TENTATIVE; guc_dirty = true; *************** *** 3608,3651 **** /* This better be a placeholder */ if(((*res)->flags & GUC_CUSTOM_PLACEHOLDER) == 0) - { ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), errmsg("attempt to redefine parameter \"%s\"", name))); ! } ! pHolder = (struct config_string*)*res; ! /* We have the same name, no sorting is necessary. ! */ *res = variable; value = *pHolder->variable; ! /* Assign the variable stored in the placeholder to the real ! * variable. */ set_config_option(name, value, pHolder->gen.context, pHolder->gen.source, false, true); ! /* Free up stuff occupied by the placeholder variable */ ! if(value != NULL) ! free((void*)value); ! ! if(pHolder->reset_val != NULL && pHolder->reset_val != value) ! free(pHolder->reset_val); ! ! if(pHolder->session_val != NULL ! && pHolder->session_val != value ! && pHolder->session_val != pHolder->reset_val) ! free(pHolder->session_val); ! ! if(pHolder->tentative_val != NULL ! && pHolder->tentative_val != value ! && pHolder->tentative_val != pHolder->reset_val ! && pHolder->tentative_val != pHolder->session_val) ! free(pHolder->tentative_val); free(pHolder); } --- 3841,3876 ---- /* This better be a placeholder */ if(((*res)->flags & GUC_CUSTOM_PLACEHOLDER) == 0) ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), errmsg("attempt to redefine parameter \"%s\"", name))); ! ! Assert((*res)->vartype == PGC_STRING); ! pHolder = (struct config_string*) *res; ! /* We have the same name, no sorting is necessary */ *res = variable; value = *pHolder->variable; ! /* ! * Assign the string value stored in the placeholder to the real variable. ! * ! * XXX this is not really good enough --- it should be a nontransactional ! * assignment, since we don't want it to roll back if the current xact ! * fails later. */ set_config_option(name, value, pHolder->gen.context, pHolder->gen.source, false, true); ! /* ! * Free up as much as we conveniently can of the placeholder structure ! * (this neglects any stack items...) */ ! set_string_field(pHolder, pHolder->variable, NULL); ! set_string_field(pHolder, &pHolder->reset_val, NULL); ! set_string_field(pHolder, &pHolder->tentative_val, NULL); free(pHolder); } *************** *** 3754,3760 **** define_custom_variable(&var->gen); } ! extern void EmittWarningsOnPlaceholders(const char* className) { struct config_generic** vars = guc_variables; struct config_generic** last = vars + num_guc_variables; --- 3979,3985 ---- define_custom_variable(&var->gen); } ! extern void EmitWarningsOnPlaceholders(const char* className) { struct config_generic** vars = guc_variables; struct config_generic** last = vars + num_guc_variables; *************** *** 5133,5137 **** --- 5358,5371 ---- return true; } + static bool + assign_transaction_read_only(bool newval, bool doit, GucSource source) + { + if (doit && source >= PGC_S_INTERACTIVE && IsSubTransaction()) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("cannot set transaction read only mode inside a subtransaction"))); + return true; + } #include "guc-file.c" *** src/include/utils/guc.h.orig Fri May 28 01:13:32 2004 --- src/include/utils/guc.h Tue Jun 29 13:56:33 2004 *************** *** 175,188 **** GucStringAssignHook assign_hook, GucShowHook show_hook); ! extern void EmittWarningsOnPlaceholders(const char* className); extern const char *GetConfigOption(const char *name); extern const char *GetConfigOptionResetString(const char *name); extern void ProcessConfigFile(GucContext context); extern void InitializeGUCOptions(void); extern void ResetAllOptions(void); ! extern void AtEOXact_GUC(bool isCommit); extern void BeginReportingGUCOptions(void); extern void ParseLongOption(const char *string, char **name, char **value); extern bool set_config_option(const char *name, const char *value, --- 175,188 ---- GucStringAssignHook assign_hook, GucShowHook show_hook); ! extern void EmitWarningsOnPlaceholders(const char* className); extern const char *GetConfigOption(const char *name); extern const char *GetConfigOptionResetString(const char *name); extern void ProcessConfigFile(GucContext context); extern void InitializeGUCOptions(void); extern void ResetAllOptions(void); ! extern void AtEOXact_GUC(bool isCommit, bool isSubXact); extern void BeginReportingGUCOptions(void); extern void ParseLongOption(const char *string, char **name, char **value); extern bool set_config_option(const char *name, const char *value, *** src/include/utils/guc_tables.h.orig Wed May 26 12:50:34 2004 --- src/include/utils/guc_tables.h Tue Jun 29 13:56:33 2004 *************** *** 11,18 **** * *------------------------------------------------------------------------- */ ! #ifndef GUC_TABLES ! #define GUC_TABLES 1 /* * Groupings to help organize all the run-time options for display --- 11,37 ---- * *------------------------------------------------------------------------- */ ! #ifndef GUC_TABLES_H ! #define GUC_TABLES_H 1 ! ! /* ! * GUC supports these types of variables: ! */ ! enum config_type ! { ! PGC_BOOL, ! PGC_INT, ! PGC_REAL, ! PGC_STRING ! }; ! ! union config_var_value ! { ! bool boolval; ! int intval; ! double realval; ! char *stringval; ! }; /* * Groupings to help organize all the run-time options for display *************** *** 56,70 **** }; /* ! * GUC supports these types of variables: */ ! enum config_type { ! PGC_BOOL, ! PGC_INT, ! PGC_REAL, ! PGC_STRING ! }; /* * Generic fields applicable to all types of variables --- 75,93 ---- }; /* ! * Stack entry for saving the state of a variable prior to the current ! * transaction */ ! typedef struct guc_stack { ! struct guc_stack *prev; /* previous stack item, if any */ ! int nest_level; /* nesting depth of cur transaction */ ! int status; /* previous status bits, see below */ ! GucSource tentative_source; /* source of the tentative_value */ ! GucSource source; /* source of the actual value */ ! union config_var_value tentative_val; /* previous tentative val */ ! union config_var_value value; /* previous actual value */ ! } GucStack; /* * Generic fields applicable to all types of variables *************** *** 86,94 **** enum config_type vartype; /* type of variable (set only at startup) */ int status; /* status bits, see below */ GucSource reset_source; /* source of the reset_value */ - GucSource session_source; /* source of the session_value */ GucSource tentative_source; /* source of the tentative_value */ GucSource source; /* source of the current actual value */ }; /* bit values in flags field */ --- 109,117 ---- enum config_type vartype; /* type of variable (set only at startup) */ int status; /* status bits, see below */ GucSource reset_source; /* source of the reset_value */ GucSource tentative_source; /* source of the tentative_value */ GucSource source; /* source of the current actual value */ + GucStack *stack; /* stacked outside-of-transaction states */ }; /* bit values in flags field */ *************** *** 104,109 **** --- 127,133 ---- /* bit values in status field */ #define GUC_HAVE_TENTATIVE 0x0001 /* tentative value is defined */ #define GUC_HAVE_LOCAL 0x0002 /* a SET LOCAL has been executed */ + #define GUC_HAVE_STACK 0x0004 /* we have stacked prior value(s) */ /* GUC records for specific variable types */ *************** *** 118,124 **** GucBoolAssignHook assign_hook; GucShowHook show_hook; /* variable fields, initialized at runtime: */ - bool session_val; bool tentative_val; }; --- 142,147 ---- *************** *** 134,140 **** GucIntAssignHook assign_hook; GucShowHook show_hook; /* variable fields, initialized at runtime: */ - int session_val; int tentative_val; }; --- 157,162 ---- *************** *** 150,156 **** GucRealAssignHook assign_hook; GucShowHook show_hook; /* variable fields, initialized at runtime: */ - double session_val; double tentative_val; }; --- 172,177 ---- *************** *** 165,171 **** GucShowHook show_hook; /* variable fields, initialized at runtime: */ char *reset_val; - char *session_val; char *tentative_val; }; --- 186,191 ---- *************** *** 180,183 **** extern void build_guc_variables(void); ! #endif --- 200,203 ---- extern void build_guc_variables(void); ! #endif /* GUC_TABLES_H */
On Sun, Jun 20, 2004 at 08:49:22PM -0400, Tom Lane wrote: > There's a good deal more than that missing :-(. Here are the modules or > actions that are called in CommitTransaction and/or AbortTransaction > that have not yet been touched by the patch: > > localbuf.c (refcounts need fixed same as bufmgr) Here is a patch against the original versions of these files; cleaned up bufmgr.c somewhat. Adds the same logic to local buffers (moving the BufferRefCount struct declaration to buf_internals.h so it's shared by both bufmgr.c and localbuf.c). Needs xact.c and xact.h patched as in the second patch. As with the bufmgr.c original patch, I don't really know how to test that this actually works. I fooled around with printing what it was doing during a subtrans commit/abort, and it seems OK, but that's about it. In what situations can a transaction roll back with a nonzero reference count in a local buffer? -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "I dream about dreams about dreams", sang the nightingale under the pale moon (Sandman)
Attachment
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > As with the bufmgr.c original patch, I don't really know how to test > that this actually works. I fooled around with printing what it was > doing during a subtrans commit/abort, and it seems OK, but that's about > it. In what situations can a transaction roll back with a nonzero > reference count in a local buffer? You need an active cursor, eg begin; declare c cursor for select * from tenk1; fetch 1 in c; ... now you've got an open buffer refcount to some page of tenk1 I forgot to mention to you that that code didn't work at all, btw. I have fixed some of the problems in my local version but there's still a fairly large issue, which is what exactly we think the semantics of a cursor declared in a subtransaction ought to be. With bufmgr set up to consider open reference counts as a bug, we cannot hold such a cursor open past subtrans commit. One possible approach is to consider subxact commit the same as main xact commit as far as cursors are concerned: materialize anything declared WITH HOLD, close anything declared without. The other theory we could adopt is that cursors stay open till main xact commit; this would imply not releasing buffer refcounts at subxact commit, plus any other resources needed by the cursor. We're already holding locks that way and it probably wouldn't be a big change to make bufmgr work the same. I'm not sure that there are any other resources involved, other than the Portal memory which we already handle properly. The first approach is a lower-risk path; I'm not sure if the second one might have some hidden gotchas. It seems like the second one would be more flexible though. Any opinions which to pursue? Oh, there's another point: what happens if an outer xact level declares a cursor, which is then FETCHed from by a subtransaction? At minimum we have the problem that this could change the set of buffer pins held, which breaks the present bufmgr solution entirely. It gets even more interesting if you are of the opinion that subtransaction failure should cause the effects of the FETCH to be undone --- we have no way to do that at all, because there's no mechanism for saving/restoring the state of an entire execution plan tree. We might have to prohibit subtransactions from touching outer-level cursors, at least for 7.5. This would in turn make it a bit questionable whether there's any point in letting cursors propagate up out of subtransactions... regards, tom lane
BTW, here is what I'm working with just at the moment. regards, tom lane
Attachment
On Tue, Jun 29, 2004 at 06:59:20PM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > As with the bufmgr.c original patch, I don't really know how to test > > that this actually works. [...] > > I forgot to mention to you that that code didn't work at all, btw. Bad news, I guess. > The other theory we could adopt is that cursors stay open till main xact > commit; this would imply not releasing buffer refcounts at subxact > commit, plus any other resources needed by the cursor. We're already > holding locks that way and it probably wouldn't be a big change to make > bufmgr work the same. I'm not sure that there are any other resources > involved, other than the Portal memory which we already handle properly. Well, AFAIR originally I had thought that refcounts should be held at subtrans commit; you suggested that there was no reason for a subtrans to keep a buffer refcount and that was it. I think the open cursor is a good reason why the count should be kept; it appears less useful if you can't use the cursor anywhere out of the level that created it. > Oh, there's another point: what happens if an outer xact level declares > a cursor, which is then FETCHed from by a subtransaction? At minimum we > have the problem that this could change the set of buffer pins held, > which breaks the present bufmgr solution entirely. It gets even more > interesting if you are of the opinion that subtransaction failure should > cause the effects of the FETCH to be undone --- we have no way to do > that at all, because there's no mechanism for saving/restoring the state > of an entire execution plan tree. Hmm ... yes, this could be very ugly indeed, but I haven't even looked at the executor code so I can't comment. Are executor nodes copyable? Oh, and I've been playing with large objects and I've encountered bugs elsewhere. I'll look at it with the new patch you just posted. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Vivir y dejar de vivir son soluciones imaginarias. La existencia está en otra parte" (Andre Breton)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > Hmm ... yes, this could be very ugly indeed, but I haven't even looked > at the executor code so I can't comment. Are executor nodes copyable? Nope, and even if we had support for that the executor tree per se is just the tip of the iceberg. There's also indexscan status, SRF function internal state, yadda yadda. I think the odds of doing something with all that stuff for 7.5 are exactly zero ... we'd better define a stopgap behavior. > Oh, and I've been playing with large objects and I've encountered bugs > elsewhere. I'll look at it with the new patch you just posted. Wouldn't surprise me, we've not looked at that yet either. I do feel that we have enough things working that we should commit to nested transactions for 7.5. There will be some things that we have to restrict, such as cursors and perhaps large objects. But it's surely better than no subtransactions at all. regards, tom lane
Added to TODO, just so we don't forget later: * Use a phantom command counter for nested subtransactions to reduce tuple overhead --------------------------------------------------------------------------- Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > Hmm ... yes, this could be very ugly indeed, but I haven't even looked > > at the executor code so I can't comment. Are executor nodes copyable? > > Nope, and even if we had support for that the executor tree per se > is just the tip of the iceberg. There's also indexscan status, SRF > function internal state, yadda yadda. I think the odds of doing > something with all that stuff for 7.5 are exactly zero ... we'd better > define a stopgap behavior. > > > Oh, and I've been playing with large objects and I've encountered bugs > > elsewhere. I'll look at it with the new patch you just posted. > > Wouldn't surprise me, we've not looked at that yet either. > > I do feel that we have enough things working that we should commit to > nested transactions for 7.5. There will be some things that we have to > restrict, such as cursors and perhaps large objects. But it's surely > better than no subtransactions at all. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073