Thread: [WIP PATCH] Lazily assign xids for toplevel Transactions
I'm resending this without the actual patch attached, since my first mail didn't get through. The patch can be found here: http://soc.phlo.org/lazyxidassign.patch

Hi

Lately, there was some interest in assigning XIDs to toplevel transactions only when the transaction actually needs one (that is, when it writes something to disk), like we currently do for subtransactions. This is also a crucial part of my work on read-only transactions on PITR slaves - since there is no way to assign "real" XIDs on a PITR slave.

I've spent the last few days factoring out that work, and turning it into a general solution. The result is this patch, which basically does the following

.) It defines a special TemporaryTransactionId that is used as an xact's xid until the xact calls GetCurrentTransactionId / GetTopTransactionId.
.) It introduces a new macro TransactionIdIsPermanent, which tells if an XID is valid, and not equal to TemporaryTransactionId.
.) It lets GetTopTransactionId assign an XID on-demand, similar to how GetCurrentTransactionId handles that for subtransactions.
.) Each transaction gets an "rid" (ResourceOwnerId) assigned when it starts, and obtains a lock on that rid, similar to how the xid is locked. This can be used to wait for a transaction's toplevel resource owner to release all its locks, and serves as a unique identifier for a running transaction. This is needed for concurrent index builds to wait until all transactions holding a ShareLock on the target relation have ended.

The patch passes the regression tests, though there are currently two issues that need to be resolved.

1) The second waiting phase of concurrent index builds fails to wait for xacts that hadn't been assigned an XID when the reference snapshot was taken. The "rid" doesn't help here, because it's not currently stored in the snapshot.
2) I'm not entirely sure yet how to handle the flags MyXactMadeTempRelUpdates and MyXactMadeXLogEntry, and the MyRecPtr variable. Those seem to be made partly redundant by this patch - checking if an xact has a permanent xid assigned already tells if the transaction made any writes.

(1) could be easily solved by querying the list of currently active RIDs after taking the reference snapshot. But since AFAIK HOT introduces a new method for guaranteeing that a transaction won't use an index that might not contain all tuples that xact is interested in, I wanted to get feedback on how HOT currently handles this.

It's probably not the best time to come up with new patches, since everybody seems to be busy working on getting 8.3 out. But this patch is a quite natural fallout of my work on read-only queries on PITR slaves, and I'd be very interested to know whether the general direction this patch takes is deemed acceptable.

greetings, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes:
> I've spent the last few days factoring out that work, and turning it into
> a general solution. The result is this patch, which basically does the following

> .) It defines a special TemporaryTransactionId that is used as an xact's xid
>    until the xact calls GetCurrentTransactionId / GetTopTransactionId.

[ squint... ] Why do you need that? The behavior we use with subtransactions is just to leave the XID as InvalidOid until there's a reason to assign it, and I think we should do the same with top-level XIDs. Having an extra TemporaryTransactionId seems ugly, mainly because it's not clear how XID comparison should handle it.

To leave the XID at 0, you will need to teach GetSnapshotData and maybe some other places that a proc could be advertising a nonzero xmin even when its XID is still 0. This does not seem like a big problem though.

> .) Each transaction gets an "rid" (ResourceOwnerId) assigned when it starts, and
>    obtains a lock on that rid, similar to how the xid is locked. This can
>    be used to wait for a transaction's toplevel resource owner to release all
>    its locks, and serves as a unique identifier for a running transaction.

This seems like inventing a concept we could do without, also overhead we could do without (assigning globally unique RIDs would require as much mechanism and contention as XID assignment does). What about locking the backend's PID instead? I do not see that long-term uniqueness is needed, we just want to be able to wait for the backend's current transaction to end.

If you do think it's important to distinguish the other guy's current and next transaction (which maybe it is), then possibly we could lock a combination of the PID and a *local*, per-backend transaction counter (there should be plenty of room in LOCKTAG for this). This counter value would have to be advertised in PGPROC, but there wouldn't be any contention involved in assigning a new value.
It's slightly annoying that this scheme involves taking two lmgr locks per read-write transaction. I wonder whether we couldn't dispense with the notion of locking one's XID per se. This would mean that where we try to wait for another transaction by XID, we have to trawl through the ProcArray to find that XID and see what PID/localID it maps to; but if we're in that path we're already going to be blocking, so more cycles there might be a good tradeoff for fewer cycles at transaction start.

> 1) The second waiting phase of concurrent index builds fails to wait for xacts
>    that hadn't been assigned an XID when the reference snapshot was taken.
>    The "rid" doesn't help here, because it's not currently stored in the
>    snapshot.

We'd probably want to whack that around so that it pays attention to the xmins advertised by other backends, rather than their XIDs. This is just a handwave though.

> 2) I'm not entirely sure yet how to handle the flags MyXactMadeTempRelUpdates
>    and MyXactMadeXLogEntry, and the MyRecPtr variable. Those seem to be made
>    partly redundant by this patch - checking if an xact has a permanent xid
>    assigned already tells if the transaction made any writes.

Yeah, it should be impossible for those to be set in a transaction that hasn't assigned itself an xid.

regards, tom lane
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> I've spent the last few days factoring out that work, and turning it into
>> a general solution. The result is this patch, which basically does the following
>
>> .) It defines a special TemporaryTransactionId that is used as an xact's xid
>>    until the xact calls GetCurrentTransactionId / GetTopTransactionId.
>
> [ squint... ] Why do you need that? The behavior we use with
> subtransactions is just to leave the XID as InvalidOid until there's
> a reason to assign it, and I think we should do the same with top-level
> XIDs. Having an extra TemporaryTransactionId seems ugly, mainly because
> it's not clear how XID comparison should handle it.

I invented the TemporaryTransactionId because it seemed worthwhile to distinguish between backends which do not currently run a transaction (xid == InvalidTransactionId), and those which run a transaction that doesn't yet have an xid assigned (xid == TemporaryTransactionId). Currently, the TemporaryTransactionId is treated as later than any other xid value.

I'm not wedded to those TemporaryTransactionIds though - they just seemed like a good idea when I started my readonly-queries-on-PITR-slaves work, and they allow for a few more assertions.

> To leave the XID at 0, you will need to teach GetSnapshotData and maybe
> some other places that a proc could be advertising a nonzero xmin even
> when its XID is still 0. This does not seem like a big problem though.

Yeah - TemporaryTransactionId removed the need for a few special cases in that area - but at the cost of having to distinguish between TransactionIdIsValid and TransactionIdIsPermanent (meaning valid && !temporary).

>> .) Each transaction gets an "rid" (ResourceOwnerId) assigned when it starts, and
>>    obtains a lock on that rid, similar to how the xid is locked.
>>    This can
>>    be used to wait for a transaction's toplevel resource owner to release all
>>    its locks, and serves as a unique identifier for a running transaction.
>
> This seems like inventing a concept we could do without, also overhead
> we could do without (assigning globally unique RIDs would require as
> much mechanism and contention as XID assignment does). What about
> locking the backend's PID instead? I do not see that long-term
> uniqueness is needed, we just want to be able to wait for the backend's
> current transaction to end.
>
> If you do think it's important to distinguish the other guy's current
> and next transaction (which maybe it is), then possibly we could lock a
> combination of the PID and a *local*, per-backend transaction counter
> (there should be plenty of room in LOCKTAG for this). This counter
> value would have to be advertised in PGPROC, but there wouldn't be any
> contention involved in assigning a new value.

I wanted some (single) value that would fit into a standard C datatype. Since I guess using "long long" in postgres code is a bad idea (it's not supported on all 32-bit platforms, I think), I wanted to come up with some 32-bit identifier. If the PID were guaranteed to be 16-bit, we could use the other 16 bits as a counter - but all modern Unixen have moved past a limit of 65535 processes, I fear...

While writing this mail I realized that my RID generation algorithm - while being quite lightweight, I think - has a small race condition. The algorithm is

    for (;;)
    {
        rid = ShmemVariableCache->nextRid++;
        if (ResourceOwnerIdIsValid(rid) && ResourceOwnerLockTableInsert(rid))
            break;
    }

I just realized that if two backends manage to obtain the same rid, and one is then paged out long enough for the other to lock the rid, run its transaction and commit, then the second backend will get the same rid :-(. So it's back to the drawing board anyway...
> It's slightly annoying that this scheme involves taking two lmgr locks
> per read-write transaction. I wonder whether we couldn't dispense with
> the notion of locking one's XID per se. This would mean that where we
> try to wait for another transaction by XID, we have to trawl through
> the ProcArray to find that XID and see what PID/localID it maps to;
> but if we're in that path we're already going to be blocking, so more
> cycles there might be a good tradeoff for fewer cycles at transaction
> start.

Yeah - I do not really like that dual-locking thing either. But it makes prepared transaction handling much easier - if we were to only lock the RID, we'd have to store the rid<->xid mapping for prepared transactions somewhere *and* guarantee that we won't assign that RID to another transaction - even after a server restart...

>> 1) The second waiting phase of concurrent index builds fails to wait for xacts
>>    that hadn't been assigned an XID when the reference snapshot was taken.
>>    The "rid" doesn't help here, because it's not currently stored in the
>>    snapshot.
>
> We'd probably want to whack that around so that it pays attention to the
> xmins advertised by other backends, rather than their XIDs. This is
> just a handwave though.

It's even simpler - in that section of the code, the XID is used just as a unique identifier for currently running transactions - and taking a snapshot conveniently creates just such a list of XIDs. The XID isn't used in any comparisons. So I think it'd be sufficient to scan the proc array after obtaining the reference snapshot, and just manually build a list of RIDs instead of XIDs. It wouldn't matter that we build the list slightly later than the snapshot - the only effect of this would be that we might end up waiting for a transaction unnecessarily.

However, before putting more work into that part of the code, I'll check the latest version of HOT.
I seem to remember that there was this idea of storing some kind of xmin with an index that could be used to decide if an index was usable for a certain snapshot.

greetings, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes:
> Tom Lane wrote:
>> [ squint... ] Why do you need that?

> I invented the TemporaryTransactionId because it seemed worthwhile to
> distinguish between backends which do not currently run a transaction
> (xid == InvalidTransactionId), and those which run a transaction that
> doesn't yet have an xid assigned (xid == TemporaryTransactionId).

Not really. I argue that the important distinction is between backends that have active snapshots, and those that do not. That is signaled by the value of MyProc->xmin. The startup state where the backend is starting a transaction but has not yet computed its serializable snapshot is not really particularly interesting --- indeed, where along there would you say that it has "started" its transaction, and why? The distinction you are making here is quite arbitrary.

I also really dislike the idea of a backend changing its XID midstream; that essentially kills any ability of other xacts to recognize whether the backend is in the "same" transaction as it was the last time they looked. Maybe that isn't critical, but it seems like a bad ability to give up.

>> If you do think it's important to distinguish the other guy's current
>> and next transaction (which maybe it is), then possibly we could lock a
>> combination of the PID and a *local*, per-backend transaction counter
>> (there should be plenty of room in LOCKTAG for this). This counter
>> value would have to be advertised in PGPROC, but there wouldn't be any
>> contention involved in assigning a new value.

> I wanted some (single) value that would fit into a standard C datatype.

Why? The only place where we need this (AFAICS) is in LOCKTAGs, and those are structs already.

> Yeah - I do not really like that dual-locking thing either. But it makes
> prepared transaction handling much easier - if we were to only lock the
> RID, we'd have to store the rid<->xid mapping for prepared transactions

Hmmm .... that's a good point.
Not sure how to include prepared xacts in the scheme.

regards, tom lane
I wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> Yeah - I do not really like that dual-locking thing either. But it makes
>> prepared transaction handling much easier - if we were to only lock the
>> RID, we'd have to store the rid<->xid mapping for prepared transactions

> Hmmm .... that's a good point. Not sure how to include prepared xacts
> in the scheme.

After further thought I think we should avoid depending on PIDs as part of a lock tag --- their behavior is too system-dependent; in particular, we have no right to assume that when a backend exits the same PID won't be reassigned shortly after, possibly leading to confusion.

Instead, I suggest that we keep a session counter in shared memory and have each backend assign itself a session ID at startup using that. A 32-bit session ID in combination with a 32-bit locally assigned transaction number should be sufficiently unique to identify a transaction (prepared or otherwise) for the purposes of locking. These "transient XIDs" only need to be unique as long as the transaction exists plus shortly thereafter (to avoid race conditions when someone waits for a transaction that actually terminated a moment before). So wraparound of the counters isn't a problem, although we probably want to reserve zero as an invalid value.

I think we only need a transient XID of this sort for top-level transactions, not subtransactions. I thought for a while about how to avoid taking a lock on the regular XID as well as the transient XID, but it seems pretty messy, particularly if we don't want to assign transient XIDs to subtransactions. Probably best not to go there, at least not in the first-cut patch.

To make CREATE INDEX CONCURRENTLY work, we'd need two things:

* GetLockConflicts would need to report the transient XIDs of the conflicting xacts, not regular XIDs, since they might not have regular XIDs. Then we'd wait on those locks instead of regular-XID locks.
* The second phase, where we wait out transactions that can still see old tuples, doesn't work because such transactions won't necessarily be listed in the snapshot. Instead, what we have to do is look through the ProcArray for transactions whose advertised xmin is less than the xmax of our reference snapshot. When we find one, wait for it using its transient XID.

AFAICT, C.I.C. is currently the only place in the system where we really need transient XIDs at all. Everyplace else that we need to wait for a transaction, it's because we found its regular XID in a tuple we want to lock or modify. So the whole thing is a bit annoying. Maybe we could get rid of the extra overhead with some shenanigans inside the lock manager, like not bothering to create a data structure representing the holding of a transient-XID lock until such time as C.I.C. actually tries to wait for it. But again, that seems like a second-pass optimization.

Can anyone suggest better terminology than "transient XID"?

regards, tom lane
Tom Lane wrote:
> I wrote:
>> "Florian G. Pflug" <fgp@phlo.org> writes:
>>> Yeah - I do not really like that dual-locking thing either. But it makes
>>> prepared transaction handling much easier - if we were to only lock the
>>> RID, we'd have to store the rid<->xid mapping for prepared transactions
>
>> Hmmm .... that's a good point. Not sure how to include prepared xacts
>> in the scheme.
>
> After further thought I think we should avoid depending on PIDs as part
> of a lock tag --- their behavior is too system-dependent; in particular,
> we have no right to assume that when a backend exits the same PID won't
> be reassigned shortly after, possibly leading to confusion.
>
> Instead, I suggest that we keep a session counter in shared memory and
> have each backend assign itself a session ID at startup using that.
> A 32-bit session ID in combination with a 32-bit locally assigned
> transaction number should be sufficiently unique to identify a
> transaction (prepared or otherwise) for the purposes of locking.
> These "transient XIDs" only need to be unique as long as the transaction
> exists plus shortly thereafter (to avoid race conditions when someone
> waits for a transaction that actually terminated a moment before).
> So wraparound of the counters isn't a problem, although we probably
> want to reserve zero as an invalid value.
>
> I think we only need a transient XID of this sort for top-level
> transactions, not subtransactions.

Sounds good, if we decide to go with the transient XID idea. See below for an alternative that I just came up with.

> To make CREATE INDEX CONCURRENTLY work, we'd need two things:
>
> * GetLockConflicts would need to report the transient XIDs of the
>   conflicting xacts, not regular XIDs, since they might not have regular
>   XIDs. Then we'd wait on those locks instead of regular-XID locks.

Yes. This is exactly what my patch does today.
> * The second phase, where we wait out transactions that can still see
>   old tuples, doesn't work because such transactions won't necessarily
>   be listed in the snapshot. Instead, what we have to do is look
>   through the ProcArray for transactions whose advertised xmin is
>   less than the xmax of our reference snapshot. When we find one,
>   wait for it using its transient XID.

> AFAICT, C.I.C. is currently the only place in the system where we
> really need transient XIDs at all. Everyplace else that we need to
> wait for a transaction, it's because we found its regular XID in
> a tuple we want to lock or modify. So the whole thing is a bit
> annoying. Maybe we could get rid of the extra overhead with some
> shenanigans inside the lock manager, like not bothering to create
> a data structure representing the holding of a transient-XID lock
> until such time as C.I.C. actually tries to wait for it. But
> again, that seems like a second-pass optimization.

I've given some thought to that. There are two distinct things we need to be able to wait for:

1) Until all current holders of a lock whose grantmask conflicts with a given locklevel have dropped their lock.
2) Until all currently in-use snapshots have an xmin larger than some given value (the xmax of the reference snapshot).

(1) could be solved directly in the lock manager. We'd need some mechanism to wake up a process whenever someone releases a certain lock.

(2) could be done by acquiring a ShareLock (with a new locktype LOCKTYPE_XMIN) on the xmin of a transaction's serializable snapshot when it's created. The second waiting phase of concurrent index builds would then be

a) Find the oldest xmin in the ProcArray.
b) If that xmin is equal to or greater than the xmax of our reference snapshot, we're done.
c) Wait until the ExclusiveLock (for LOCKTYPE_TRANSACTION) is released on that xmin.
   After that point, new transactions will compute an xmin greater than the oldest one we found in the ProcArray, because the limiting transaction has exited, and because ReadNewTransactionId returns a value greater than that xmin too (otherwise, we'd have exited in (b)).
d) Wait for all current holders of LOCKTYPE_XMIN to release their locks (using the machinery needed for (1)). No new holders can show up, because new snapshots will compute a larger xmin.
e) Goto a).

I could code (2), but I'd need help with (1) - the details of the locking subsystem are still somewhat a mystery to me.

greetings, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes:
> Sounds good, if we decide to go with the transient XID idea. See below
> for an alternative that I just came up with.

This proposal appears to require taking and releasing a brand-new lock type every time a snapshot is made or destroyed. That is certainly not going to be less overhead than the transient-XID scheme. At least in READ COMMITTED mode, there are normally multiple snapshots taken per transaction.

(Something worth noting here is that I expect soon, probably in 8.4, we will fix things so that what a backend advertises in MyProc->xmin is the xmin of its oldest still-live snapshot. That means that xmin will change intra-transaction in READ COMMITTED mode, and thus that we would indeed need to take and release the sort of lock you are suggesting each time.)

regards, tom lane
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> Sounds good, if we decide to go with the transient XID idea. See below
>> for an alternative that I just came up with.
>
> This proposal appears to require taking and releasing a brand-new lock
> type every time a snapshot is made or destroyed. That is certainly not
> going to be less overhead than the transient-XID scheme. At least in
> READ COMMITTED mode, there are normally multiple snapshots taken per
> transaction.

Only for the serializable snapshot, I'd have thought, making the overhead bearable (it's surely the oldest of all the xact's snapshots), but ...

> (Something worth noting here is that I expect soon, probably in 8.4,
> we will fix things so that what a backend advertises in MyProc->xmin
> is the xmin of its oldest still-live snapshot. That means that xmin
> will change intra-transaction in READ COMMITTED mode, and thus that
> we would indeed need to take and release the sort of lock you are
> suggesting each time.)

... with this in mind, my proposal looks pretty bad :-(

What do you think about solving the requirements of the *first* waiting phase (where we wait for current ShareLock holders) inside the lock manager? The only real downside I can see is that I feel uneasy about messing with that code... It seems to be subtle, and quick to anger ;-)

For the second phase, I see two options
.) Go forward with the transient XIDs / RIDs approach.
.) Do something similar to the indcreatexid idea the HOT patch implements. This essentially puts the burden of deciding an index's usability on the using xact, not on the one creating the index.

greetings, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes:
> What do you think about solving the requirements of the *first* waiting
> phase (where we wait for current ShareLock holders) inside the lock manager?
> The only real downside I can see is that I feel uneasy about messing with
> that code... It seems to be subtle, and quick to anger ;-)

It sounded pretty dubious to me. The problem is that we don't want to wait until we could actually *get* the lock, we only want to wait out the conflicting xacts that existed when we looked. This is important because there might be a steady stream of new xacts acquiring conflicting locks (ie, a steady stream of writers), and we don't want to either block them, or have to hope for a window where there are none. But the lock manager does not currently track who acquired a lock when, and I think it would add a lot of usually-wasted overhead to do that.

I spent a fair amount of time yesterday trying to think of alternative solutions, and didn't really think of much. The core reason why C.I.C. is implemented the way it is is that it's entirely possible that there will be a deadlock with some other process (ie, the other process is old enough that we must wait for it, but it's blocked trying to acquire some lock that conflicts with our ShareUpdateExclusiveLock). Thus, waiting by trying to acquire XID locks is a good idea because it automatically detects deadlock, and may even be able to escape the deadlock by wait-queue rearrangement. (I'm not certain that the latter is applicable in any situation C.I.C. would get into, but I'm not certain it's not, either.) Schemes involving "sleep awhile and check the ProcArray again" are right out because they fail to detect deadlock. Everything else I could think of involved special new lockmanager features that would still have to preserve the ability to handle deadlocks, which didn't sound like something I want to tackle for this.

So on the whole the extra transaction identifier seems to be the way to go.
I haven't looked at how that interacts with HOT though.

regards, tom lane
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> What do you think about solving the requirements of the *first* waiting
>> phase (where we wait for current ShareLock holders) inside the lock manager?
>> The only real downside I can see is that I feel uneasy about messing with
>> that code... It seems to be subtle, and quick to anger ;-)
>
> It sounded pretty dubious to me. The problem is that we don't want to
> wait until we could actually *get* the lock, we only want to wait out
> the conflicting xacts that existed when we looked. This is important
> because there might be a steady stream of new xacts acquiring
> conflicting locks (ie, a steady stream of writers), and we don't want to
> either block them, or have to hope for a window where there are none.
> But the lock manager does not currently track who acquired a lock when,
> and I think it would add a lot of usually-wasted overhead to do that.
>
> I spent a fair amount of time yesterday trying to think of alternative
> solutions, and didn't really think of much. The core reason why C.I.C.
> is implemented the way it is is that it's entirely possible that there
> will be a deadlock with some other process (ie, the other process is
> old enough that we must wait for it, but it's blocked trying to acquire
> some lock that conflicts with our ShareUpdateExclusiveLock). Thus,
> waiting by trying to acquire XID locks is a good idea because it
> automatically detects deadlock, and may even be able to escape the
> deadlock by wait-queue rearrangement. (I'm not certain that the latter
> is applicable in any situation C.I.C. would get into, but I'm not
> certain it's not, either.) Schemes involving "sleep awhile and check
> the ProcArray again" are right out because they fail to detect
> deadlock. Everything else I could think of involved special new
> lockmanager features that would still have to preserve the ability
> to handle deadlocks, which didn't sound like something I want to
> tackle for this.
>
> So on the whole the extra transaction identifier seems to be the
> way to go. I haven't looked at how that interacts with HOT though.

Ok, I'll update the patch to use your global-id-plus-local-id idea then, and remove TemporaryTransactionIds.

greetings, Florian Pflug