Thread: Race condition in HEAD, possibly due to PGPROC splitup
If you add this Assert to lock.c: diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 3ba4671..d9c15e0 100644 *** a/src/backend/storage/lmgr/lock.c --- b/src/backend/storage/lmgr/lock.c *************** GetRunningTransactionLocks(int *nlocks) *** 3195,3200 **** --- 3195,3202 ---- accessExclusiveLocks[index].dbOid = lock->tag.locktag_field1; accessExclusiveLocks[index].relOid= lock->tag.locktag_field2; + Assert(TransactionIdIsNormal(accessExclusiveLocks[index].xid)); + index++; } } then set wal_level = hot_standby, and run the regression tests repeatedly, the Assert will trigger eventually --- for me, it happens within a dozen or so parallel iterations, or rather longer if I run the tests serial style. Stack trace is unsurprising, since AFAIK this is only called in the checkpointer: #2 0x000000000073461d in ExceptionalCondition ( conditionName=<value optimized out>, errorType=<value optimized out>, fileName=<value optimized out>, lineNumber=<value optimized out>) at assert.c:57 #3 0x000000000065eca1 in GetRunningTransactionLocks (nlocks=0x7fffa997de8c) at lock.c:3198 #4 0x00000000006582b8 in LogStandbySnapshot (nextXid=0x7fffa997dee0) at standby.c:835 #5 0x00000000004b0b97 in CreateCheckPoint (flags=32) at xlog.c:7761 #6 0x000000000062bf92 in CheckpointerMain () at checkpointer.c:488 #7 0x00000000004cf465 in AuxiliaryProcessMain (argc=2, argv=0x7fffa997e110) at bootstrap.c:424 #8 0x00000000006261f5 in StartChildProcess (type=CheckpointerProcess) at postmaster.c:4487 The actual value of the bogus xid (which was pulled from allPgXact[proc->pgprocno]->xid just above here) is zero. What I believe is happening is that somebody is clearing his pgxact->xid entry asynchronously to GetRunningTransactionLocks, and since that clearly oughta be impossible, something is broken. Without the added assert, you'd only notice this if you were running a standby slave --- the zero xid results in an assert failure in WAL replay on the slave end, which is how I found out about this to start with. But since we've not heard reports of such before, I suspect that this is a recently introduced bug; and personally I'd bet money that it was the PGXACT patch that broke it. I have other things to do than look into this right now myself. regards, tom lane
On Wed, Dec 14, 2011 at 4:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Without the added assert, you'd only notice this if you were running a > standby slave --- the zero xid results in an assert failure in WAL > replay on the slave end, which is how I found out about this to start > with. But since we've not heard reports of such before, I suspect that > this is a recently introduced bug; and personally I'd bet money that it > was the PGXACT patch that broke it. > I can reproduce this and will look at it in detail. BTW, on an unrelated note, I was looking at the way ShmemInitStruct() is used. It internally uses ShmemAlloc to allocate from shared memory. ShmemAlloc always MAXALIGN the requested size. But while calculating the total shared memory requirement, we don't always MAXALIGN individual structure sizes. One example is KnownAssignedXidsValid, but I am sure there are other places where the originally computed and the requested sizes could be different because of alignment. I wonder if we are just lucky not to hit shared memory size mismatch, may be because we round up to the block size at the end. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
On Wed, Dec 14, 2011 at 12:20 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > On Wed, Dec 14, 2011 at 4:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> >> Without the added assert, you'd only notice this if you were running a >> standby slave --- the zero xid results in an assert failure in WAL >> replay on the slave end, which is how I found out about this to start >> with. But since we've not heard reports of such before, I suspect that >> this is a recently introduced bug; and personally I'd bet money that it >> was the PGXACT patch that broke it. >> > > I can reproduce this and will look at it in detail. > Looking at CommitTransaction(), it seems quite clear to me that we call ProcArrayEndTransaction() before releasing the locks held by the transaction. So its quite possible that when GetRunningTransactionLocks goes through the list of currently held locks, the pgxact->xid is already cleared. This seems to a old bug to me and not related to PGXACT work. In fact, I can force the assertion failure by braking into gdb and pausing the process running the following statements, right after it clears the xid by calling ProcArrayEndTransaction(). At that point, if I hit CHECKPOINT from another session, the assertion fails. Session 1: BEGIN; LOCK TABLE test IN ACCESS EXCLUSIVE MODE; COMMIT; ==> break after ProcArrayEndTransaction finishes Session 2: CHECKPOINT; Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee <pavan.deolasee@gmail.com> writes: > BTW, on an unrelated note, I was looking at the way ShmemInitStruct() > is used. It internally uses ShmemAlloc to allocate from shared memory. > ShmemAlloc always MAXALIGN the requested size. But while calculating > the total shared memory requirement, we don't always MAXALIGN > individual structure sizes. One example is KnownAssignedXidsValid, but > I am sure there are other places where the originally computed and the > requested sizes could be different because of alignment. > I wonder if we are just lucky not to hit shared memory size mismatch, > may be because we round up to the block size at the end. That sort of thing is down in the noise. One reason we throw in the 100KB slop is so we don't have to sweat details like that. regards, tom lane
Pavan Deolasee <pavan.deolasee@gmail.com> writes: > Looking at CommitTransaction(), it seems quite clear to me that we > call ProcArrayEndTransaction() before releasing the locks held by the > transaction. So its quite possible that when > GetRunningTransactionLocks goes through the list of currently held > locks, the pgxact->xid is already cleared. This seems to a old bug to > me and not related to PGXACT work. Hm. So maybe the correct fix is to deem the lock already released if we get zero when we read the xid? It's not clear to me what the requirements for GetRunningTransactionLocks actually are, but if it's okay for it to think a lock is released slightly ahead of when the rest of the system thinks so, that would work. regards, tom lane
On Wed, Dec 14, 2011 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Pavan Deolasee <pavan.deolasee@gmail.com> writes: >> Looking at CommitTransaction(), it seems quite clear to me that we >> call ProcArrayEndTransaction() before releasing the locks held by the >> transaction. So its quite possible that when >> GetRunningTransactionLocks goes through the list of currently held >> locks, the pgxact->xid is already cleared. This seems to a old bug to >> me and not related to PGXACT work. > > Hm. So maybe the correct fix is to deem the lock already released > if we get zero when we read the xid? It's not clear to me what the > requirements for GetRunningTransactionLocks actually are, but if it's > okay for it to think a lock is released slightly ahead of when the > rest of the system thinks so, that would work. OK, I'll look at this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Dec 14, 2011 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Pavan Deolasee <pavan.deolasee@gmail.com> writes: >> Looking at CommitTransaction(), it seems quite clear to me that we >> call ProcArrayEndTransaction() before releasing the locks held by the >> transaction. So its quite possible that when >> GetRunningTransactionLocks goes through the list of currently held >> locks, the pgxact->xid is already cleared. This seems to a old bug to >> me and not related to PGXACT work. > > Hm. So maybe the correct fix is to deem the lock already released > if we get zero when we read the xid? It's not clear to me what the > requirements for GetRunningTransactionLocks actually are, but if it's > okay for it to think a lock is released slightly ahead of when the > rest of the system thinks so, that would work. Attached patch closes both race conditions: * where xid is zero * where xid is non-zero yet WAL record for the commit of xid wins race ahead of the WAL record for locks Patch fixes it in backwards compatible way. No version increments. Patch head, 9.1 and 9.0. Will wait a couple of days for additional testing. Comments? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Thu, Dec 15, 2011 at 8:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Comments? I think there is a small bug here: + TransactionId xid = pgxact->xid; + + /* + * Don't record locks for transactions if we know they have already + * issued their WAL record for commit but not yet released lock. + * It is still possible that we see locks held by already complete + * transactions, if they haven't yet zeroed their xids. + */ + if (!TransactionIdIsValid(xid)) + continue; accessExclusiveLocks[index].xid = pgxact->xid; accessExclusiveLocks[index].dbOid = lock->tag.locktag_field1; ...because you're fetching pgxact->xid, testing whether it's valid, and then fetching it again. It could change in the meantime. You probably want to change the assignment to read: accessExclusiveLocks[index].xid = xid; Also, we should probably change this pointer to be declared volatile: PGXACT *pgxact = &ProcGlobal->allPgXact[proc->pgprocno]; Otherwise, I think the compiler might get cute and decide to fetch the xid twice anyway, effectively undoing our attempt to pull it into a local variable. I think this comment could be clarified in some way, to make it more clear that we had a bug at one point, and it was fixed - the "from a time when they were possible" language wouldn't be entirely clear to me after the fact: + * Zero xids should no longer be possible, but we may be replaying WAL + * from a time when they were possible. It would probably make sense to break out of this loop if a match is found: ! for (i = 0; i < nxids; i++) ! { ! if (lock->xid == xids[i]) ! found = true; ! } I'm not sure I fully understand the rest of this, but those are the only things I noticed on a read-through. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Dec 15, 2011 at 2:13 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Dec 15, 2011 at 8:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Comments? > > I think there is a small bug here: > > + TransactionId xid = pgxact->xid; > + > + /* > + * Don't record locks for transactions if we know they have already > + * issued their WAL record for commit but not yet released lock. > + * It is still possible that we see locks held by already complete > + * transactions, if they haven't yet zeroed their xids. > + */ > + if (!TransactionIdIsValid(xid)) > + continue; > > accessExclusiveLocks[index].xid = pgxact->xid; > accessExclusiveLocks[index].dbOid = lock->tag.locktag_field1; > > ...because you're fetching pgxact->xid, testing whether it's valid, > and then fetching it again. It could change in the meantime. You > probably want to change the assignment to read: > > accessExclusiveLocks[index].xid = xid; > > Also, we should probably change this pointer to be declared volatile: > > PGXACT *pgxact = &ProcGlobal->allPgXact[proc->pgprocno]; > > Otherwise, I think the compiler might get cute and decide to fetch the > xid twice anyway, effectively undoing our attempt to pull it into a > local variable. > > I think this comment could be clarified in some way, to make it more > clear that we had a bug at one point, and it was fixed - the "from a > time when they were possible" language wouldn't be entirely clear to > me after the fact: > > + * Zero xids should no longer be possible, but we may be replaying WAL > + * from a time when they were possible. > > It would probably make sense to break out of this loop if a match is found: > > ! for (i = 0; i < nxids; i++) > ! { > ! if (lock->xid == xids[i]) > ! found = true; > ! } > > I'm not sure I fully understand the rest of this, but those are the > only things I noticed on a read-through. Thanks, useful changes. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services