Thread: Spinlock assembly cleanup

Spinlock assembly cleanup

From: Tom Lane
Pursuant to the gripes raised by Martin Pitt ---

I've consulted some gcc experts within Red Hat and come to the following
conclusions:

* We should consistently refer to the spinlock contents via a
read/write operand declared like "+m"(*lock).  This is consistent
with longstanding practice in the Linux kernel and therefore is unlikely
to get broken in future gcc releases.  The existing ports that use
matched input and output parameters will break, or at least draw nasty
warnings, in upcoming gcc releases.

* Not all of the ports currently declare *lock as an input operand,
but this seems rather dangerous to me; I think all should use "+m".

* Some but not all the ports list "memory" as a clobbered operand.
The gcc manual saith

: If your assembler instruction modifies memory in an unpredictable
: fashion, add `memory' to the list of clobbered registers.  This will
: cause GNU CC to not keep memory values cached in registers across the
: assembler instruction.

Now as far as I can see, none of the spinlock sequences directly clobber
any memory other than the spinlock itself, and so (as long as the lock
is stated to be an output operand) one might think the memory clobber
marking to be excessive.  However, I am thinking it is actually a good
idea and we ought to add the marking to all ports, not remove it.  The
thought is that what we are actually using the spinlock for is to guard
access to values in shared memory, and therefore the act of waiting for
a spinlock can be seen as waiting for other memory variables to assume
values they didn't necessarily have last time we looked.  If gcc caches
shared variables in registers across a spinlock acquisition, the code is
broken.
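
Concretely, under both rules the x86 fragment would come out looking
about like this (a sketch only; the exact register constraint varies by
port, and I'm assuming slock_t is a byte-wide type here):

    static __inline__ int
    tas(volatile slock_t *lock)
    {
        register slock_t _res = 1;

        __asm__ __volatile__(
            "	lock			\n"
            "	xchgb	%0,%1	\n"
            : "+q"(_res), "+m"(*lock)
            :
            : "memory", "cc");
        return (int) _res;
    }

Both operands are read/write ("+"), and the "memory" clobber tells gcc
to discard anything it has cached from memory across the fragment.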

The alternative to doing this would be to always use volatile pointers
to access shared memory, but I don't want to do that --- in the first
place it's notationally cumbersome, and in the second place it would
hurt performance unnecessarily.  Within straight-line code that holds a
spinlock there is no reason to treat shared memory as volatile.  It's
only when crossing a spinlock boundary that you must reload from memory,
and that seems to be exactly what the "memory" modifier declares for us.
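
As an illustration, straight-line code like this (hypothetical shared
counter; sketch only) would be safe without any volatile qualifiers,
because the "memory" clobber in TAS forces a fresh load of shared->count
at the acquisition, while accesses made while holding the lock can still
be register-cached:

    typedef struct
    {
        slock_t     mutex;
        int         count;
    } SharedCounter;

    static int
    decrement_if_positive(SharedCounter *shared)
    {
        int         result = 0;

        SpinLockAcquire(&shared->mutex);
        /* gcc must reload shared->count here, not reuse a register copy */
        if (shared->count > 0)
        {
            shared->count--;    /* freely cacheable while the lock is held */
            result = 1;
        }
        SpinLockRelease(&shared->mutex);
        return result;
    }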

(I am assuming here that marking the asm fragment "volatile" does not
necessarily do what the "memory" modifier does; I can't see anything in
the gcc docs that claims "volatile" includes the effects of "memory".)

So I'd like to make all the gcc-asm fragments for spinlocks follow these
rules.  Comments?
        regards, tom lane


Re: Spinlock assembly cleanup

From: Bruce Momjian
Sounds good to me.  Consistency is important because it lets us fix
problems across all CPU types.  I am not 100% excited about the memory
part because it invalidates all memory values cached in registers, not
just the shared memory location.  We are specifically accessing a memory
address as part of the ASM, so I don't see how it could pull that memory
value from a register behind our back.

---------------------------------------------------------------------------

Tom Lane wrote:
> * Some but not all the ports list "memory" as a clobbered operand.
> [...]
> However, I am thinking it is actually a good
> idea and we ought to add the marking to all ports, not remove it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Spinlock assembly cleanup

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I am not 100% excited about the memory
> part because it invalidates all memory values cached in registers, not
> just the shared memory location.

That's exactly the point.

> We are specifically accessing a memory address
> as part of the ASM, so I don't see how it could pull that memory value
> from a register behind our back.

We need to prevent gcc from caching values of *other* memory locations
behind our backs.  Consider code on the order of
    spinlockacquire(lock);
    while (sharedvar == 0)
    {
        spinlockrelease(lock);
        // we expect someone else to acquire lock and
        // set sharedvar here...
        spinlockacquire(lock);
    }

If this is all inline code, and we didn't declare sharedvar as volatile,
then the compiler would be within its rights to assume that sharedvar
doesn't change, hence load it into a register once and not reload it
from shared memory after reacquiring the spinlock.  This will of course
fail to do what we want it to.

We haven't seen failures of this kind because our direct use of
spinlocks is pretty constricted, and (for example) LWLockAcquire is
careful to use a volatile pointer for all accesses to the LWLock
fields.  However this is inefficient: while it owns the spinlock,
LWLockAcquire doesn't really need to treat all the other fields
as volatile, so there are probably a few wasted loads in there.
And the requirement for using volatile pointers is something that is
likely to bite us if we start using spinlocks directly in more parts
of the code.  Not to mention that if anyone cranks up the optimization
level to the point where LWLockAcquire can get inlined into other
functions, those functions will break immediately, because they are
not saying "volatile" for every shared memory access.
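
(For reference, the workaround looks roughly like this; I'm abbreviating
the actual LWLockAcquire code, so take the field names as approximate:)

    void
    lwlock_acquire_sketch(LWLock *lwlock)
    {
        /* volatile pointer forces a memory access on every reference */
        volatile LWLock *lock = lwlock;

        SpinLockAcquire(&lock->mutex);
        if (lock->exclusive == 0 && lock->shared == 0)
        {
            /* ... grant the lock ... */
        }
        SpinLockRelease(&lock->mutex);
    }

With a "memory" clobber in TAS, the volatile qualifier could go away and
gcc would still reload the fields after SpinLockAcquire.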

So I think it's best to fix it as part of the TAS asm definition.

As things stand at the moment, there's not going to be any efficiency
loss, because LWLockAcquire brute-forces the same result with a volatile
pointer, and its callers aren't going to expect to be able to cache
global variables across a function call anyway.  In the long run when
you consider global inlining of functions, the possibility is there for
the efficiency to be better not worse if we do things this way.
        regards, tom lane