Re: Race condition in HEAD, possibly due to PGPROC splitup - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Race condition in HEAD, possibly due to PGPROC splitup
Date
Msg-id CA+U5nMKWxG5KH7Nfuco==MSyBrCbJ=5ZRRVqYz82=nkpf=7NAA@mail.gmail.com
Whole thread Raw
In response to Re: Race condition in HEAD, possibly due to PGPROC splitup  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Dec 15, 2011 at 2:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 15, 2011 at 8:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Comments?
>
> I think there is a small bug here:
>
> +                       TransactionId xid = pgxact->xid;
> +
> +                       /*
> +                        * Don't record locks for transactions if we know they have already
> +                        * issued their WAL record for commit but not yet released lock.
> +                        * It is still possible that we see locks held by already complete
> +                        * transactions, if they haven't yet zeroed their xids.
> +                        */
> +                       if (!TransactionIdIsValid(xid))
> +                               continue;
>
>                        accessExclusiveLocks[index].xid = pgxact->xid;
>                        accessExclusiveLocks[index].dbOid = lock->tag.locktag_field1;
>
> ...because you're fetching pgxact->xid, testing whether it's valid,
> and then fetching it again.  It could change in the meantime.  You
> probably want to change the assignment to read:
>
>                        accessExclusiveLocks[index].xid = xid;
>
> Also, we should probably change this pointer to be declared volatile:
>
>                        PGXACT     *pgxact = &ProcGlobal->allPgXact[proc->pgprocno];
>
> Otherwise, I think the compiler might get cute and decide to fetch the
> xid twice anyway, effectively undoing our attempt to pull it into a
> local variable.
>
> I think this comment could be clarified in some way, to make it more
> clear that we had a bug at one point, and it was fixed - the "from a
> time when they were possible" language wouldn't be entirely clear to
> me after the fact:
>
> +  * Zero xids should no longer be possible, but we may be replaying WAL
> +  * from a time when they were possible.
>
> It would probably make sense to break out of this loop if a match is found:
>
> !                       for (i = 0; i < nxids; i++)
> !                       {
> !                               if (lock->xid == xids[i])
> !                                       found = true;
> !                       }
>
> I'm not sure I fully understand the rest of this, but those are the
> only things I noticed on a read-through.

Thanks, useful changes.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: unite recovery.conf and postgresql.conf
Next
From: Andres Freund
Date:
Subject: Re: Command Triggers