Re: pg_memory_barrier() doesn't compile, let alone work, for me - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: pg_memory_barrier() doesn't compile, let alone work, for me |
Date | |
Msg-id | CA+Tgmoakz1R73zYDg0HNeH8XAZ3dAPXpgwaNc6AhoPZ_CJ3WTg@mail.gmail.com Whole thread Raw |
In response to | pg_memory_barrier() doesn't compile, let alone work, for me (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: pg_memory_barrier() doesn't compile, let alone work,
for me
Re: pg_memory_barrier() doesn't compile, let alone work, for me |
List | pgsql-hackers |
On Sun, Jul 14, 2013 at 8:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Commit 9a20a9b2 breaks the build for me, using gcc on HPPA: > > xlog.c:2182: macro `pg_memory_barrier' used without args > xlog.c:2546: macro `pg_memory_barrier' used without args > make[4]: *** [xlog.o] Error 1 > > The reason for this is that the "fallback" definition of > pg_memory_barrier has been wrong since day one; it needs this fix: > > -#define pg_memory_barrier(x) \ > +#define pg_memory_barrier() \ Uggh, sorry. > However, fixing that doesn't yield much joy; initdb stalls and then > crashes with > > PANIC: stuck spinlock (40054a88) detected at xlog.c:2182 > > The reason for that is that the code does not bother to initialize > "dummy_spinlock" anywhere. It might accidentally fail to fail > on machines where the unlocked state of a spinlock is all-zeroes > (given a compiler that's not picky about the incorrect macro usage) > ... but HPPA is not such a machine. This would not be hard to fix, I think. > Rather than trying to think of a fix for that, I'm of the opinion that > we should rip this out. The fallback implementation of pg_memory_barrier > ought to be pg_compiler_barrier(), on the theory that non-mainstream > architectures don't have weak memory ordering anyway, or if they do you > need to do some work to get PG to work on them. Or maybe we ought to > stop pretending that the code is likely to work on arbitrary machines, > and just #error if there's not a supplied machine-specific macro. Well, pg_memory_barrier() isn't even equivalent to pg_compiler_barrier() on x86, which has among the strongest memory orderings out there, so I think your first idea is a non-starter. A compiler barrier on x86 will fence reads from reads and writes from writes, but it will not prevent a read from being speculated before a subsequent write, which a full barrier must do. I'm pretty sure we've got latent memory-ordering risks in our existing code which we just haven't detected and fixed yet. Consider, for example, this exciting code from GetNewTransactionId: myproc->subxids.xids[nxids] = xid; mypgxact->nxids = nxids +1; I don't believe that's technically safe even on an architecture like x86, because the compiler could decide to reorder those assignments. Of course there is probably no reason to do so, and even if it does you'd have to get really unlucky to see a user-visible failure, and if you did you'd probably misguess the cause. However, I'm guessing that as we start using memory barriers more, and as we speed up the system generally, we're going to see bugs like this that are currently hidden start to become visible from time to time. Especially in view of that, I think it's wise to be pessimistic rather than optimistic about what barriers are needed, when we don't know for certain. That way, when we get a memory-ordering-related bug report, we can at least hope that the problem must be something specific to the problem area rather than a general failure to use the right memory barrier primitives. My preference would be to fix this in a narrow way, by initializing dummy_spinlock somewhere. But if not, then I think #error is the only safe way to go. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: