Thread: [HACKERS] A misconception about the meaning of 'volatile' in GetNewTransactionId?
[HACKERS] A misconception about the meaning of 'volatile' in GetNewTransactionId?
From
Thomas Munro
Date:
Hi hackers, I was reading xact.c and noticed this block: /* * Use volatile pointer to prevent code rearrangement; other backends * could be examining my subxids info concurrently, and we don't want * them to see an invalid intermediate state, such as incrementing * nxids before filling the array entry. Note we are assuming that * TransactionId and int fetch/store are atomic. */ volatile PGPROC*myproc = MyProc; volatile PGXACT *mypgxact = MyPgXact; if (!isSubXact) mypgxact->xid = xid; else { int nxids = mypgxact->nxids; if (nxids < PGPROC_MAX_CACHED_SUBXIDS) { myproc->subxids.xids[nxids]= xid; mypgxact->nxids = nxids + 1; Isn't this insufficient on non-TSO systems like POWER and Arm? It uses volatile qualifiers as a compiler barrier, which is probably enough for x86 and Sparc in TSO mode, but doesn't include a memory barrier to prevent hardware reordering. I think the thing to do here would be to forget about volatile, stick pg_write_barrier() between those two writes, and stick pg_read_barrier() between the reads in any code that might read nxids and then scan xids concurrently, such as TransactionIdIsInProgress(). This is almost exactly the example from the section "Avoiding Memory Order Bugs" in src/backend/storage/lmgr/README.barrier. Thoughts? -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] A misconception about the meaning of 'volatile' in GetNewTransactionId?
From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes: > I was reading xact.c and noticed this block: > ... > Isn't this insufficient on non-TSO systems like POWER and Arm? Yeah, I think you're right. That code probably predates our support for memory barriers, so "volatile" was the best we could do at the time --- but as you say, it doesn't fix hardware-level rearrangements. regards, tom lane
Re: [HACKERS] A misconception about the meaning of 'volatile' in GetNewTransactionId?
From
Thomas Munro
Date:
On Sun, Apr 30, 2017 at 1:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: >> I was reading xact.c and noticed this block: >> ... >> Isn't this insufficient on non-TSO systems like POWER and Arm? > > Yeah, I think you're right. That code probably predates our support > for memory barriers, so "volatile" was the best we could do at the > time --- but as you say, it doesn't fix hardware-level rearrangements. Here is an experimental patch, for discussion only, to drop some apparently useless volatile qualifiers and introduce a write barrier when extending the array and a corresponding read barrier when scanning or copying the array from other processes. I wonder about this code that shrinks the array: #define XidCacheRemove(i) \ do { \ MyProc->subxids.xids[i] = MyProc->subxids.xids[MyPgXact->nxids - 1]; \ MyPgXact->nxids--; \ } while (0) If a concurrent process saw the decremented nxids value before seeing the effect of xids[i] = xids[final], then it would miss an arbitrary running subtransaction (not the aborting subtransaction being removed from the array, but whichever xid had the bad luck to be in final position). In the patch I added pg_write_barrier(), but I suspect that that might be not really a problem because of higher level interlocking that I'm missing, because this code makes no mention of the problem and doesn't (ab)use volatile qualifiers like the code that extends the array (so it has neither compiler barrier/volatile nor memory barrier so could be broken even on TSO assumptions at the whim of the compiler if my guess were right about that). -- Thomas Munro http://www.enterprisedb.com