Thread: Erroneous PPC spinlock code

Erroneous PPC spinlock code

From
Peter Eisentraut
Date:
The SuSE PPC guru said that the PPC spinlock code we currently use may
behave erroneously on multiprocessor systems.  Attached is the proposed
patch, suggested for inclusion in 7.4.  Comments?

-- 
Peter Eisentraut   peter_e@gmx.net

Re: Erroneous PPC spinlock code

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> The SuSE PPC guru said that the PPC spinlock code we currently use may
> behave erroneously on multiprocessor systems.

What's his evidence for that claim?  The code we have is based directly
on the recommendations in the PPC manuals, and has been tested on
multi-CPU systems.
        regards, tom lane


Re: Erroneous PPC spinlock code

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> The SuSE PPC guru said that the PPC spinlock code we currently use may
> behave erroneously on multiprocessor systems.  Attached is the proposed
> patch, suggested for inclusion in 7.4.  Comments?

I looked into this some more.  The current CVS tip is identical to the
TAS code used in 7.3.* except for being gcc inlined assembler instead of
an out-of-line subroutine.  The 7.3 code is known to work, cf
http://archives.postgresql.org/pgsql-bugs/2002-09/msg00252.php
Given that testing and the lack of any bug reports against 7.3.*,
I think the burden of proof is on the person who thinks we should
change it.

AFAICS the proposed change for TAS() simply amounts to reversing the
sense of the test following stwcx., so that the "success" path
corresponds to "branch not taken" rather than "branch taken".
I cannot see anything in the IBM PPC architecture manual
http://www-3.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600682CC7
to justify thinking that this changes anything.  If there is any
difference in behavior then I'd think that having the isync in the
forward branch path is safer than not.  The TAS example in the manual
(p. 398) looks like
loop: lwarx    ...      ...      stwcx.    ...      bne    loop      isync

which might be read as saying that the isync should be in the "fall
through" path, but I think it is more correctly read as putting the
isync in the "not predicted to be taken" path.  Branch backward will
be statically predicted to be taken, branch forward not.  In any case
there's no mention here of needing to code the branch in one particular
way.

The proposed change from sync to lwsync during S_UNLOCK is interesting,
but we have to keep in mind that that is *not* a bug fix but an attempt
at performance improvement --- lwsync is a weaker constraint than sync.
I am not convinced that this change is safe for our usage, and I think
it would be folly to stick it into 7.4 at the RC stage of the cycle.

In short, my vote is "show me why" for the TAS change, and "no way for
7.4, but we can look at it later" for the S_UNLOCK change.
        regards, tom lane


Re: Erroneous PPC spinlock code

From
Marcus Meissner
Date:
On Thu, Nov 06, 2003 at 12:08:56AM +0100, Reinhard Max wrote:
> On Wed, 5 Nov 2003 at 13:28, Tom Lane wrote:
>
> > Peter Eisentraut <peter_e@gmx.net> writes:
> >
> > > The SuSE PPC guru said that the PPC spinlock code we currently use
> > > may behave erroneously on multiprocessor systems.
> >
> > What's his evidence for that claim?
>
> Let's ask himself.
>
> > The code we have is based directly on the recommendations in the PPC
> > manuals, and has been tested on multi-CPU systems.
>
> Marcus, can you explain the details, please?

I reviewed the documentation again (at:http://www-1.ibm.com/servers/esdd/articles/powerpc.html
) and it seems to agree with your opinion.

I retract my comment, leave your code as-is.

Ciao, Marcus

Re: Erroneous PPC spinlock code

From
Reinhard Max
Date:
On Wed, 5 Nov 2003 at 13:28, Tom Lane wrote:

> Peter Eisentraut <peter_e@gmx.net> writes:
>
> > The SuSE PPC guru said that the PPC spinlock code we currently use
> > may behave erroneously on multiprocessor systems.
>
> What's his evidence for that claim?

Let's ask himself.

> The code we have is based directly on the recommendations in the PPC
> manuals, and has been tested on multi-CPU systems.

Marcus, can you explain the details, please?


cuReinhard



Re: Erroneous PPC spinlock code

From
Bruce Momjian
Date:
Marcus Meissner wrote:
-- Start of PGP signed section.
> On Thu, Nov 06, 2003 at 12:08:56AM +0100, Reinhard Max wrote:
> > On Wed, 5 Nov 2003 at 13:28, Tom Lane wrote:
> > 
> > > Peter Eisentraut <peter_e@gmx.net> writes:
> > >
> > > > The SuSE PPC guru said that the PPC spinlock code we currently use
> > > > may behave erroneously on multiprocessor systems.
> > >
> > > What's his evidence for that claim?
> > 
> > Let's ask himself.
> > 
> > > The code we have is based directly on the recommendations in the PPC
> > > manuals, and has been tested on multi-CPU systems.
> > 
> > Marcus, can you explain the details, please?
> 
> I reviewed the documentation again (at:
>     http://www-1.ibm.com/servers/esdd/articles/powerpc.html
> ) and it seems to agree with your opinion.
> 
> I retract my comment, leave your code as-is.

Cool.  Thanks for checking.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073