Thread: Strangeness in xid allocation / snapshot setup
Hi Vadim, I am trying to understand why GetSnapshotData() needs to acquire the SInval spinlock before it calls ReadNewTransactionId, rather than after. I see that you made it do so --- in the commit at http://www.ca.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/storage/ipc/shmem.c.diff?r1=1.41&r2=1.42 but I don't understand why the loss of concurrency is "necessary". Since we are going to treat all xids >= xmax as in-progress anyway, what's wrong with reading xmax before we acquire the SInval lock? Also, it seems to me that in GetNewTransactionId(), it's important for MyProc->xid to be set before releasing XidGenLockId, not after. Otherwise there is a narrow window for failure: 1. Process A calls GetNewTransactionId. It allocates an xid of, say, 1234, and increments nextXid to 1235. Just after releasing the XidGenLock spinlock, but before it can set its MyProc->xid, control swaps away from it. 2. Process B gets to run. It runs GetSnapshotData. It sees nextXid = 1235, and it does not see xid = 1234 in any backend's proc->xid. Therefore, B will assume xid 1234 has already terminated, when it hasn't. Isn't this broken? The problem would be avoided if GetNewTransactionId sets MyProc->xid before releasing the spinlock, since then after GetSnapshotData has called ReadNewTransactionId, we know that all older XIDs that are still active are recorded in proc structures. Comments? regards, tom lane
> I am trying to understand why GetSnapshotData() needs to acquire the > SInval spinlock before it calls ReadNewTransactionId, rather than after. > I see that you made it do so --- in the commit at > http://www.ca.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/storage/ipc/sh mem.c.diff?r1=1.41&r2=1.42 > but I don't understand why the loss of concurrency is "necessary". > Since we are going to treat all xids >= xmax as in-progress anyway, > what's wrong with reading xmax before we acquire the SInval lock? AFAIR, I made so to prevent following: 1. Tx Old is running. 2. Tx S reads new transaction ID in GetSnapshotData() and swapped away before SInval acquired. 3. Tx New gets new transaction ID, makes changes and commits. 4. Tx Old changes some row R changed by Tx New and commits. 5. Tx S gets snapshot data and now sees R changed by *both* Tx Old and Tx New *but* does not see *other* changes made byTx New => Tx S reads unconsistent data. --------- As for issue below - I don't remember why I decided that it's not important and will need in some time to remember. > Also, it seems to me that in GetNewTransactionId(), it's important > for MyProc->xid to be set before releasing XidGenLockId, not after. > Otherwise there is a narrow window for failure: > > 1. Process A calls GetNewTransactionId. It allocates an xid of, say, > 1234, and increments nextXid to 1235. Just after releasing the > XidGenLock spinlock, but before it can set its MyProc->xid, control > swaps away from it. > > 2. Process B gets to run. It runs GetSnapshotData. It sees nextXid = > 1235, and it does not see xid = 1234 in any backend's proc->xid. > Therefore, B will assume xid 1234 has already terminated, when it > hasn't. > > Isn't this broken? The problem would be avoided if > GetNewTransactionId > sets MyProc->xid before releasing the spinlock, since then after > GetSnapshotData has called ReadNewTransactionId, we know that > all older > XIDs that are still active are recorded in proc structures. > > Comments? > > regards, tom lane >
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: >> Since we are going to treat all xids >= xmax as in-progress anyway, >> what's wrong with reading xmax before we acquire the SInval lock? > AFAIR, I made so to prevent following: > 1. Tx Old is running. > 2. Tx S reads new transaction ID in GetSnapshotData() and swapped away > before SInval acquired. > 3. Tx New gets new transaction ID, makes changes and commits. > 4. Tx Old changes some row R changed by Tx New and commits. > 5. Tx S gets snapshot data and now sees R changed by *both* Tx Old and > Tx New *but* does not see *other* changes made by Tx New => > Tx S reads unconsistent data. Hmm, but that doesn't seem to have anything to do with the way that GetSnapshotData operates. If Tx New has an XID >= xmax read by Tx S' GetSnapshotData, then Tx New will be considered uncommitted by S no matter which order we get the locks in; it hardly matters whether Tx New manages to physically commit before we finish building the snapshot for S. On the other side of the coin, if Tx New's XID < xmax for S, then *with the GetNewTransactionId change that I want* we can be sure that Tx New will be seen running by S when it does get the SInval lock (unless New has managed to finish before S gets the lock, in which case it's perfectly reasonable for S to treat it as committed or aborted). Anyway, it seems to me that the possibility of inconsistent data is inherent in the way we handle updated rows in Read Committed mode --- you can always get to see a row that was emitted by a transaction you don't see the other effects of. regards, tom lane
> > 1. Tx Old is running. > > 2. Tx S reads new transaction ID in GetSnapshotData() and swapped away > > before SInval acquired. > > 3. Tx New gets new transaction ID, makes changes and commits. > > 4. Tx Old changes some row R changed by Tx New and commits. > > 5. Tx S gets snapshot data and now sees R changed by *both* Tx Old and > > Tx New *but* does not see *other* changes made by Tx New => > > Tx S reads unconsistent data. > > Hmm, but that doesn't seem to have anything to do with the way that > GetSnapshotData operates. If Tx New has an XID >= xmax read by Tx S' > GetSnapshotData, then Tx New will be considered uncommitted by S no > matter which order we get the locks in; it hardly matters whether Tx New You forget about Tx Old! The point is that changes made by Tx Old *over* Tx New' changes effectively make those Tx New' changes *visible* to Tx S! And this is not good: Tx New inserts PK and corresponding FK and commits; Tx Old changes some field in row with that FK and commits - now Tx S will see FK row *but not PK one* (and what if Tx S was serializable Tx run by pd_dump...) SInval lock prevents Tx Old from commit (xact.c:CommitTransaction()) in points 2. - 4. above and so Tx Old' changes will not be visible to Tx S. > manages to physically commit before we finish building the snapshot for > S. On the other side of the coin, if Tx New's XID < xmax for S, then > *with the GetNewTransactionId change that I want* we can be sure that > Tx New will be seen running by S when it does get the SInval lock > (unless New has managed to finish before S gets the lock, in which case > it's perfectly reasonable for S to treat it as committed or aborted). And this is how it worked (MyProc->xid was updated while holding XXXGenLockId) in varsup.c from version 1.21 (Jun 1999) till version 1.36 (Mar 2001) when you occasionally moved it outside of locked code part: http://www.ca.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/access/transam /varsup.c.diff?r1=1.35&r2=1.36 > Anyway, it seems to me that the possibility of inconsistent data is > inherent in the way we handle updated rows in Read Committed mode --- > you can always get to see a row that was emitted by a transaction you > don't see the other effects of. If I correctly understand meaning of "emitted" then sentence above is not correct: set of rows to be updated can only be shortened by concurrent transactions. Yes, changes can be made over changes from concurrent transactions but only for rows from original set defined by query snapshot and only if concurrently updated rows (from that set) satisfy query qual => a row must satisfy snapshot *and* query qual = double satisfaction guaranteed -:)) And let's remember that this behaviour is required for current RI constraints implementation. Vadim
"Vadim Mikheev" <vmikheev@sectorbase.com> writes: > You forget about Tx Old! The point is that changes made by Tx Old *over* > Tx New' changes effectively make those Tx New' changes *visible* to > Tx S! Yes, but what's that got to do with the order of operations in GetSnapshotData? The scenario you describe can occur anyway. Only if Tx Old is running in Read Committed mode, of course. But if it is, then it's capable of deciding to update a row updated by Tx New. Whether Tx S's xmax value is before or after Tx New's ID is not going to change the behavior of Tx Old. > And this is how it worked (MyProc->xid was updated while holding > XXXGenLockId) in varsup.c from version 1.21 (Jun 1999) till > version 1.36 (Mar 2001) when you occasionally moved it outside > of locked code part: Okay, so that part was my error. I've changed it back. I'd still like to change GetSnapshotData to read the nextXid before it acquires SInvalLock, though. If we did that, it'd be safe to make GetNewTransactionId be SpinAcquire(XidGenLockId);xid = nextXid++;SpinAcquire(SInvalLockId);MyProc->xid = xid;SpinRelease(SInvalLockId);SpinRelease(XidGenLockId); which is really necessary if you want to avoid assuming that TransactionIds can be fetched and stored atomically. Two other changes I think are needed in this area: * In AbortTransaction, the clearing of MyProc->xid and MyProc->xmin should be moved down to after RecordTransactionAbort and surrounded by acquire/release SInvalLock (to avoid atomic fetch/store assumption). * In HeapTupleSatisfiesVacuum (new tqual.c routine I just made yesterday, by extracting the tuple time qual checks from vacuum.c), the order of checking for process status should be TransactionIdIsInProgress TransactionIdDidCommit TransactionIdDidAbort not the present TransactionIdDidAbort TransactionIdDidCommit TransactionIdIsInProgress The current way is wrong because if the other process is just in process of committing, we can get VACUUM other TransactionIdDidAbort - no TransactionIdDidCommit - no RecordTransactionCommit(); MyProc->xid = 0; TransactionIdIsInProgress - no whereupon vacuum decides that the other process crashed --- oops. If we test TransactionIdIsInProgress *first* in tqual, and xact.c records commit or abort *before* clearing MyProc->xid, then we cannot have this race condition where the xact is no longer considered in progress but not seen to be committed/aborted either. (Note: this bug is not a problem for existing VACUUM, since it can never see any tuples from open transactions anyway. But it will be fatal for concurrent VACUUM.) Comments? regards, tom lane
> > You forget about Tx Old! The point is that changes made by > > Tx Old *over* Tx New' changes effectively make those Tx New' > > changes *visible* to Tx S! > > Yes, but what's that got to do with the order of operations in > GetSnapshotData? The scenario you describe can occur anyway. Try to describe it step by step. > Only if Tx Old is running in Read Committed mode, of course. > But if it is, then it's capable of deciding to update a row updated > by Tx New. Whether Tx S's xmax value is before or after Tx New's ID > is not going to change the behavior of Tx Old. 1. I consider particular case when Tx S' xmax is before Tx New' ID. 1.1 For this case acquiring SInval lock before ReadNewTransactionId() changes behavior of Tx Old: it postpones change ofTx Old' (and Tx New') MyProc->xid in xact.c:CommitTransaction(), so Tx S will see Tx Old as running, ie Tx Old' changeswill be invisible to Tx S on the base of analyzing MyProc.xid-s, just like Tx New' changes will be invisible onthe base of analyzing next Tx ID. 2. If you can find examples when current code is not able to provide consistent snapshot of running (out of interest)transactions let's think how to fix code. Untill then my example shows why we cannot move SInval lock requestafter ReadNewTransactionId(). > I'd still like to change GetSnapshotData to read the nextXid before > it acquires SInvalLock, though. If we did that, it'd be safe to make > GetNewTransactionId be > > SpinAcquire(XidGenLockId); > xid = nextXid++; > SpinAcquire(SInvalLockId); > MyProc->xid = xid; > SpinRelease(SInvalLockId); > SpinRelease(XidGenLockId); > > which is really necessary if you want to avoid assuming that > TransactionIds can be fetched and stored atomically. To avoid that assumption one should add per MyProc spinlock. Vadim
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: > 1.1 For this case acquiring SInval lock before ReadNewTransactionId() > changes behavior of Tx Old: it postpones change of Tx Old' > (and Tx New') MyProc->xid in xact.c:CommitTransaction(), so Tx S > will see Tx Old as running, ie Tx Old' changes will be invisible > to Tx S on the base of analyzing MyProc.xid-s, just like Tx New' > changes will be invisible on the base of analyzing next Tx ID. Oh, now I get it: the point is to prevent Tx Old from exiting the set of "still running" xacts as seen by Tx S. Okay, it makes sense. I'll try to add some documentation to explain it. Given this, I'm wondering why we bother with having a separate XidGenLock spinlock at all. Why not eliminate it and use SInval spinlock to lock GetNewTransactionId and ReadNewTransactionId? What did you think about reordering the vacuum qual tests and AbortTransaction sequence? BTW, I'm starting to think that it would be really nice if we could replace our spinlocks with not just a semaphore, but something that has a notion of "shared" and "exclusive" lock requests. For example, if GetSnapshotData could use a shared lock on SInvalLock, it'd improve concurrency. regards, tom lane
> Oh, now I get it: the point is to prevent Tx Old from exiting the set > of "still running" xacts as seen by Tx S. Okay, it makes sense. > I'll try to add some documentation to explain it. TIA! I had no time from '99 -:) > Given this, I'm wondering why we bother with having a separate > XidGenLock spinlock at all. Why not eliminate it and use SInval > spinlock to lock GetNewTransactionId and ReadNewTransactionId? Reading all MyProc in GetSnashot may take long time - why disallow new Tx to begin. > What did you think about reordering the vacuum qual tests and > AbortTransaction sequence? Sorry, no time at the moment. > BTW, I'm starting to think that it would be really nice if we could > replace our spinlocks with not just a semaphore, but something that has > a notion of "shared" and "exclusive" lock requests. For example, > if GetSnapshotData could use a shared lock on SInvalLock, it'd > improve concurrency. Yes, we already told about light lock manager (no deadlock detection etc). Vadim
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: >> Given this, I'm wondering why we bother with having a separate >> XidGenLock spinlock at all. Why not eliminate it and use SInval >> spinlock to lock GetNewTransactionId and ReadNewTransactionId? > Reading all MyProc in GetSnashot may take long time - why disallow > new Tx to begin. Because we need to synchronize? It bothers me that we're assuming that fetching/storing XIDs is atomic. There's no possibility at all of going to 8-byte XIDs as long as the code is like this. I doubt that a spinlock per PROC structure would be a better answer, either; the overhead of getting and releasing each lock would be nontrivial, considering the small number of instructions spent at each PROC in these routines. regards, tom lane
> > Isn't spinlock just a few ASM instructions?... on most platforms... > > If we change over to something that supports read vs write locking, > it's probably going to be rather more than that ... right now, I'm > pretty dissatisfied with the performance of our spinlocks under load. We shouldn't use light locks everywhere. Updating/reading MyProc.xid is very good place to use simple spinlocks... or even better mutexes. Vadim
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: > Isn't spinlock just a few ASM instructions?... on most platforms... If we change over to something that supports read vs write locking, it's probably going to be rather more than that ... right now, I'm pretty dissatisfied with the performance of our spinlocks under load. regards, tom lane
> >> Given this, I'm wondering why we bother with having a separate > >> XidGenLock spinlock at all. Why not eliminate it and use SInval > >> spinlock to lock GetNewTransactionId and ReadNewTransactionId? > > > Reading all MyProc in GetSnashot may take long time - why disallow > > new Tx to begin. > > Because we need to synchronize? It bothers me that we're assuming > that fetching/storing XIDs is atomic. There's no possibility at all > of going to 8-byte XIDs as long as the code is like this. > > I doubt that a spinlock per PROC structure would be a better answer, > either; the overhead of getting and releasing each lock would be > nontrivial, considering the small number of instructions spent at > each PROC in these routines. Isn't spinlock just a few ASM instructions?... on most platforms... Vadim