Thread: Race condition in HEAD, possibly due to PGPROC splitup

Race condition in HEAD, possibly due to PGPROC splitup

From

Tom Lane

Date:

14 December 2011, 00:15:52

If you add this Assert to lock.c:

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 3ba4671..d9c15e0 100644
*** a/src/backend/storage/lmgr/lock.c
--- b/src/backend/storage/lmgr/lock.c
*************** GetRunningTransactionLocks(int *nlocks)
*** 3195,3200 ****
--- 3195,3202 ----             accessExclusiveLocks[index].dbOid = lock->tag.locktag_field1;
accessExclusiveLocks[index].relOid= lock->tag.locktag_field2; 
 
+             Assert(TransactionIdIsNormal(accessExclusiveLocks[index].xid));
+              index++;         }     }

then set wal_level = hot_standby, and run the regression tests
repeatedly, the Assert will trigger eventually --- for me, it happens
within a dozen or so parallel iterations, or rather longer if I run
the tests serial style.  Stack trace is unsurprising, since AFAIK this
is only called in the checkpointer:

#2  0x000000000073461d in ExceptionalCondition (   conditionName=<value optimized out>, errorType=<value optimized
out>,   fileName=<value optimized out>, lineNumber=<value optimized out>)   at assert.c:57
 
#3  0x000000000065eca1 in GetRunningTransactionLocks (nlocks=0x7fffa997de8c)   at lock.c:3198
#4  0x00000000006582b8 in LogStandbySnapshot (nextXid=0x7fffa997dee0)   at standby.c:835
#5  0x00000000004b0b97 in CreateCheckPoint (flags=32) at xlog.c:7761
#6  0x000000000062bf92 in CheckpointerMain () at checkpointer.c:488
#7  0x00000000004cf465 in AuxiliaryProcessMain (argc=2, argv=0x7fffa997e110)   at bootstrap.c:424
#8  0x00000000006261f5 in StartChildProcess (type=CheckpointerProcess)   at postmaster.c:4487

The actual value of the bogus xid (which was pulled from
allPgXact[proc->pgprocno]->xid just above here) is zero.  What I believe
is happening is that somebody is clearing his pgxact->xid entry
asynchronously to GetRunningTransactionLocks, and since that clearly
oughta be impossible, something is broken.

Without the added assert, you'd only notice this if you were running a
standby slave --- the zero xid results in an assert failure in WAL
replay on the slave end, which is how I found out about this to start
with.  But since we've not heard reports of such before, I suspect that
this is a recently introduced bug; and personally I'd bet money that it
was the PGXACT patch that broke it.

I have other things to do than look into this right now myself.
        regards, tom lane

Re: Race condition in HEAD, possibly due to PGPROC splitup

From

Pavan Deolasee

Date:

14 December 2011, 08:21:17

On Wed, Dec 14, 2011 at 4:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>
> Without the added assert, you'd only notice this if you were running a
> standby slave --- the zero xid results in an assert failure in WAL
> replay on the slave end, which is how I found out about this to start
> with.  But since we've not heard reports of such before, I suspect that
> this is a recently introduced bug; and personally I'd bet money that it
> was the PGXACT patch that broke it.
>

I can reproduce this and will look at it in detail.

BTW, on an unrelated note, I was looking at the way ShmemInitStruct()
is used. It internally uses ShmemAlloc to allocate from shared memory.
ShmemAlloc always MAXALIGN the requested size. But while calculating
the total shared memory requirement, we don't always MAXALIGN
individual structure sizes. One example is KnownAssignedXidsValid, but
I am sure there are other places where the originally computed and the
requested sizes could be different because of alignment.

I wonder if we are just lucky not to hit shared memory size mismatch,
may be because we round up to the block size at the end.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Re: Race condition in HEAD, possibly due to PGPROC splitup

From

Pavan Deolasee

Date:

14 December 2011, 09:12:13

On Wed, Dec 14, 2011 at 12:20 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> On Wed, Dec 14, 2011 at 4:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>>
>> Without the added assert, you'd only notice this if you were running a
>> standby slave --- the zero xid results in an assert failure in WAL
>> replay on the slave end, which is how I found out about this to start
>> with.  But since we've not heard reports of such before, I suspect that
>> this is a recently introduced bug; and personally I'd bet money that it
>> was the PGXACT patch that broke it.
>>
>
> I can reproduce this and will look at it in detail.
>

Looking at CommitTransaction(), it seems quite clear to me that we
call ProcArrayEndTransaction() before releasing the locks held by the
transaction. So its quite possible that when
GetRunningTransactionLocks goes through the list of currently held
locks, the pgxact->xid is already cleared. This seems to a old bug to
me and not related to PGXACT work.

In fact, I can force the assertion failure by braking into gdb and
pausing the process running the following statements, right after it
clears the xid by calling ProcArrayEndTransaction(). At that point, if
I hit CHECKPOINT from another session, the assertion fails.

Session 1:
BEGIN;
LOCK TABLE test IN ACCESS EXCLUSIVE MODE;
COMMIT; ==> break after ProcArrayEndTransaction finishes

Session 2:
CHECKPOINT;

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Re: Race condition in HEAD, possibly due to PGPROC splitup

From

Tom Lane

Date:

14 December 2011, 11:17:16

Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> BTW, on an unrelated note, I was looking at the way ShmemInitStruct()
> is used. It internally uses ShmemAlloc to allocate from shared memory.
> ShmemAlloc always MAXALIGN the requested size. But while calculating
> the total shared memory requirement, we don't always MAXALIGN
> individual structure sizes. One example is KnownAssignedXidsValid, but
> I am sure there are other places where the originally computed and the
> requested sizes could be different because of alignment.

> I wonder if we are just lucky not to hit shared memory size mismatch,
> may be because we round up to the block size at the end.

That sort of thing is down in the noise.  One reason we throw in the
100KB slop is so we don't have to sweat details like that.
        regards, tom lane

Re: Race condition in HEAD, possibly due to PGPROC splitup

From

Tom Lane

Date:

14 December 2011, 11:20:46

Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> Looking at CommitTransaction(), it seems quite clear to me that we
> call ProcArrayEndTransaction() before releasing the locks held by the
> transaction. So its quite possible that when
> GetRunningTransactionLocks goes through the list of currently held
> locks, the pgxact->xid is already cleared. This seems to a old bug to
> me and not related to PGXACT work.

Hm.  So maybe the correct fix is to deem the lock already released
if we get zero when we read the xid?  It's not clear to me what the
requirements for GetRunningTransactionLocks actually are, but if it's
okay for it to think a lock is released slightly ahead of when the
rest of the system thinks so, that would work.
        regards, tom lane

Re: Race condition in HEAD, possibly due to PGPROC splitup

From

Simon Riggs

Date:

14 December 2011, 15:49:26

On Wed, Dec 14, 2011 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pavan Deolasee <pavan.deolasee@gmail.com> writes:
>> Looking at CommitTransaction(), it seems quite clear to me that we
>> call ProcArrayEndTransaction() before releasing the locks held by the
>> transaction. So its quite possible that when
>> GetRunningTransactionLocks goes through the list of currently held
>> locks, the pgxact->xid is already cleared. This seems to a old bug to
>> me and not related to PGXACT work.
>
> Hm.  So maybe the correct fix is to deem the lock already released
> if we get zero when we read the xid?  It's not clear to me what the
> requirements for GetRunningTransactionLocks actually are, but if it's
> okay for it to think a lock is released slightly ahead of when the
> rest of the system thinks so, that would work.

OK, I'll look at this.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Race condition in HEAD, possibly due to PGPROC splitup

From

Simon Riggs

Date:

15 December 2011, 09:19:54

On Wed, Dec 14, 2011 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pavan Deolasee <pavan.deolasee@gmail.com> writes:
>> Looking at CommitTransaction(), it seems quite clear to me that we
>> call ProcArrayEndTransaction() before releasing the locks held by the
>> transaction. So its quite possible that when
>> GetRunningTransactionLocks goes through the list of currently held
>> locks, the pgxact->xid is already cleared. This seems to a old bug to
>> me and not related to PGXACT work.
>
> Hm.  So maybe the correct fix is to deem the lock already released
> if we get zero when we read the xid?  It's not clear to me what the
> requirements for GetRunningTransactionLocks actually are, but if it's
> okay for it to think a lock is released slightly ahead of when the
> rest of the system thinks so, that would work.

Attached patch closes both race conditions:
* where xid is zero
* where xid is non-zero yet WAL record for the commit of xid wins race
ahead of the WAL record for locks

Patch fixes it in backwards compatible way.

No version increments.

Patch head, 9.1 and 9.0.

Will wait a couple of days for additional testing.

Comments?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

hs_release_old_locks.v2.patch

Re: Race condition in HEAD, possibly due to PGPROC splitup

From

Robert Haas

Date:

15 December 2011, 10:13:42

On Thu, Dec 15, 2011 at 8:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Comments?

I think there is a small bug here:

+             TransactionId xid = pgxact->xid;
+
+             /*
+              * Don't record locks for transactions if we know they have already
+              * issued their WAL record for commit but not yet released lock.
+              * It is still possible that we see locks held by already complete
+              * transactions, if they haven't yet zeroed their xids.
+              */
+             if (!TransactionIdIsValid(xid))
+                 continue;
             accessExclusiveLocks[index].xid = pgxact->xid;             accessExclusiveLocks[index].dbOid =
lock->tag.locktag_field1;

...because you're fetching pgxact->xid, testing whether it's valid,
and then fetching it again.  It could change in the meantime.  You
probably want to change the assignment to read:
             accessExclusiveLocks[index].xid = xid;

Also, we should probably change this pointer to be declared volatile:
             PGXACT       *pgxact = &ProcGlobal->allPgXact[proc->pgprocno];

Otherwise, I think the compiler might get cute and decide to fetch the
xid twice anyway, effectively undoing our attempt to pull it into a
local variable.

I think this comment could be clarified in some way, to make it more
clear that we had a bug at one point, and it was fixed - the "from a
time when they were possible" language wouldn't be entirely clear to
me after the fact:

+  * Zero xids should no longer be possible, but we may be replaying WAL
+  * from a time when they were possible.

It would probably make sense to break out of this loop if a match is found:

!             for (i = 0; i < nxids; i++)
!             {
!                 if (lock->xid == xids[i])
!                     found = true;
!             }

I'm not sure I fully understand the rest of this, but those are the
only things I noticed on a read-through.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Race condition in HEAD, possibly due to PGPROC splitup

From

Simon Riggs

Date:

15 December 2011, 10:48:38

On Thu, Dec 15, 2011 at 2:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 15, 2011 at 8:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Comments?
>
> I think there is a small bug here:
>
> +                       TransactionId xid = pgxact->xid;
> +
> +                       /*
> +                        * Don't record locks for transactions if we know they have already
> +                        * issued their WAL record for commit but not yet released lock.
> +                        * It is still possible that we see locks held by already complete
> +                        * transactions, if they haven't yet zeroed their xids.
> +                        */
> +                       if (!TransactionIdIsValid(xid))
> +                               continue;
>
>                        accessExclusiveLocks[index].xid = pgxact->xid;
>                        accessExclusiveLocks[index].dbOid = lock->tag.locktag_field1;
>
> ...because you're fetching pgxact->xid, testing whether it's valid,
> and then fetching it again.  It could change in the meantime.  You
> probably want to change the assignment to read:
>
>                        accessExclusiveLocks[index].xid = xid;
>
> Also, we should probably change this pointer to be declared volatile:
>
>                        PGXACT     *pgxact = &ProcGlobal->allPgXact[proc->pgprocno];
>
> Otherwise, I think the compiler might get cute and decide to fetch the
> xid twice anyway, effectively undoing our attempt to pull it into a
> local variable.
>
> I think this comment could be clarified in some way, to make it more
> clear that we had a bug at one point, and it was fixed - the "from a
> time when they were possible" language wouldn't be entirely clear to
> me after the fact:
>
> +  * Zero xids should no longer be possible, but we may be replaying WAL
> +  * from a time when they were possible.
>
> It would probably make sense to break out of this loop if a match is found:
>
> !                       for (i = 0; i < nxids; i++)
> !                       {
> !                               if (lock->xid == xids[i])
> !                                       found = true;
> !                       }
>
> I'm not sure I fully understand the rest of this, but those are the
> only things I noticed on a read-through.

Thanks, useful changes.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services