Thread: Erroneous PPC spinlock code
The SuSE PPC guru said that the PPC spinlock code we currently use may behave erroneously on multiprocessor systems. Attached is the proposed patch, suggested for inclusion in 7.4. Comments? -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > The SuSE PPC guru said that the PPC spinlock code we currently use may > behave erroneously on multiprocessor systems. What's his evidence for that claim? The code we have is based directly on the recommendations in the PPC manuals, and has been tested on multi-CPU systems. regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > The SuSE PPC guru said that the PPC spinlock code we currently use may > behave erroneously on multiprocessor systems. Attached is the proposed > patch, suggested for inclusion in 7.4. Comments? I looked into this some more. The current CVS tip is identical to the TAS code used in 7.3.* except for being gcc inlined assembler instead of an out-of-line subroutine. The 7.3 code is known to work, cf http://archives.postgresql.org/pgsql-bugs/2002-09/msg00252.php Given that testing and the lack of any bug reports against 7.3.*, I think the burden of proof is on the person who thinks we should change it. AFAICS the proposed change for TAS() simply amounts to reversing the sense of the test following stwcx., so that the "success" path corresponds to "branch not taken" rather than "branch taken". I cannot see anything in the IBM PPC architecture manual http://www-3.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600682CC7 to justify thinking that this changes anything. If there is any difference in behavior then I'd think that having the isync in the forward branch path is safer than not. The TAS example in the manual (p. 398) looks like loop: lwarx ... ... stwcx. ... bne loop isync which might be read as saying that the isync should be in the "fall through" path, but I think it is more correctly read as putting the isync in the "not predicted to be taken" path. Branch backward will be statically predicted to be taken, branch forward not. In any case there's no mention here of needing to code the branch in one particular way. The proposed change from sync to lwsync during S_UNLOCK is interesting, but we have to keep in mind that that is *not* a bug fix but an attempt at performance improvement --- lwsync is a weaker constraint than sync. I am not convinced that this change is safe for our usage, and I think it would be folly to stick it into 7.4 at the RC stage of the cycle. In short, my vote is "show me why" for the TAS change, and "no way for 7.4, but we can look at it later" for the S_UNLOCK change. regards, tom lane
On Thu, Nov 06, 2003 at 12:08:56AM +0100, Reinhard Max wrote: > On Wed, 5 Nov 2003 at 13:28, Tom Lane wrote: > > > Peter Eisentraut <peter_e@gmx.net> writes: > > > > > The SuSE PPC guru said that the PPC spinlock code we currently use > > > may behave erroneously on multiprocessor systems. > > > > What's his evidence for that claim? > > Let's ask himself. > > > The code we have is based directly on the recommendations in the PPC > > manuals, and has been tested on multi-CPU systems. > > Marcus, can you explain the details, please? I reviewed the documentation again (at:http://www-1.ibm.com/servers/esdd/articles/powerpc.html ) and it seems to agree with your opinion. I retract my comment, leave your code as-is. Ciao, Marcus
On Wed, 5 Nov 2003 at 13:28, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > > The SuSE PPC guru said that the PPC spinlock code we currently use > > may behave erroneously on multiprocessor systems. > > What's his evidence for that claim? Let's ask himself. > The code we have is based directly on the recommendations in the PPC > manuals, and has been tested on multi-CPU systems. Marcus, can you explain the details, please? cuReinhard
Marcus Meissner wrote: -- Start of PGP signed section. > On Thu, Nov 06, 2003 at 12:08:56AM +0100, Reinhard Max wrote: > > On Wed, 5 Nov 2003 at 13:28, Tom Lane wrote: > > > > > Peter Eisentraut <peter_e@gmx.net> writes: > > > > > > > The SuSE PPC guru said that the PPC spinlock code we currently use > > > > may behave erroneously on multiprocessor systems. > > > > > > What's his evidence for that claim? > > > > Let's ask himself. > > > > > The code we have is based directly on the recommendations in the PPC > > > manuals, and has been tested on multi-CPU systems. > > > > Marcus, can you explain the details, please? > > I reviewed the documentation again (at: > http://www-1.ibm.com/servers/esdd/articles/powerpc.html > ) and it seems to agree with your opinion. > > I retract my comment, leave your code as-is. Cool. Thanks for checking. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073