Re: Spinlocks and compiler/memory barriers - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Spinlocks and compiler/memory barriers
Date
Msg-id 20140630132605.GM26930@awork2.anarazel.de
In response to Re: Spinlocks and compiler/memory barriers  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Spinlocks and compiler/memory barriers  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2014-06-30 08:03:40 -0400, Robert Haas wrote:
> On Sat, Jun 28, 2014 at 9:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> No, I think it's going to be *much* simpler than that.  How about I
> >> take a crack at this next week and then either (a) I'll see why it's a
> >> bad idea and we can go from there or (b) you can review what I come up
> >> with and tell me why it sucks?
> >
> > Ok. I think that's going in the wrong direction (duplication of
> > nontrivial knowledge), but maybe I'm taking a too 'purist' approach
> > here. Prove me wrong :)
> 
> I'm not sure what you'll think of this, but see attached.

I think it continues in the tradition of making s_lock.h ever harder to
follow. But it's still better than what we have now from a correctness
perspective.

>  I think
> newer releases of Sun Studio support that GCC way of doing a barrier,
> but I don't know how to test for that, so at the moment that uses the
> fallback "put it in a function" approach,

Sun Studio's support for the gcc way is new (added in some update to
Sun Studio 12), so that's probably not sufficient. #include
<mbarrier.h> and __compiler_barrier()/__machine_rel_barrier() should
do the trick for spinlocks; those seem to have existed for longer.

Solaris seems to run SPARC with TSO (total store order) enabled
(whereas Linux uses RMO, relaxed memory ordering), so it's probably
fine to just use the compiler barrier.

> along with HPPA non-GCC and
> Univel CC.  Those things are obscure enough that I don't care about
> making them less performant.

Fine with me.

>  I think AIX is actually OK as-is; if _check_lock() is a compiler
> barrier but _clear_lock() is not, that would be pretty odd.

Agreed.

> >> > How are you suggesting we deal with the generic S_UNLOCK
> >> > case without having a huge ifdef?
> >> > Why do we build an abstraction layer (barrier.h) and then not use it?
> >>
> >> Because (1) the abstraction doesn't fit very well unless we do a lot
> >> of additional work to build acquire and release barriers for every
> >> platform we support and
> >
> > Meh. Something like the (untested):
> > #if !defined(pg_release_barrier) && defined(pg_read_barrier) && defined(pg_write_barrier)
> > #define pg_release_barrier() do { pg_read_barrier(); pg_write_barrier();} while (0)
> > #elif !defined(pg_release_barrier)
> > #define pg_release_barrier() pg_memory_barrier()
> > #endif
> >
> > before the fallback definition of pg_read/write_barrier should suffice?
> 
> That doesn't sound like a good idea.  For example, on PPC, a read
> barrier is lwsync, and so is a write barrier.  That would also suck on
> any architecture where either a read or write barrier gets implemented
> as a full memory barrier, though I guess it might be smart enough to
> optimize away most of the cost of the second of two barriers in
> immediate succession.

Well, that's why I suggested only doing it if we haven't got a
pg_release_barrier() defined. And fallback to memory_barrier() directly
if read/write barriers are implemented using it so we don't have two
memory barriers in a row.

> I think if we want to use barrier.h as the foundation for this, we're
> going to need to define a new set of things in there that have acquire
> and release semantics, or just use full barriers for everything -
> which would be wasteful on, e.g., x86.  And I don't particularly see
> the point in going to a lot of work to invent those new abstractions
> everywhere when we can just tweak s_lock.h in place a bit and be done
> with it.  Making those files interdependent doesn't strike me as a
> win.

We'll need release semantics in more places than just s_lock.h.
Anything that acts like a lock without using spinlocks needs
acquire/release semantics. The lwlock patch only gets away without
them because the atomic operations implementation happens to act as
acquire or full memory barriers.

> >> (2) I don't have much confidence that we can
> >> depend on the spinlock fallback for barriers without completely
> >> breaking obscure platforms, and I'd rather make a more minimal set of
> >> changes.
> >
> > Well, it's the beginning of the cycle. And we're already depending on
> > barriers for correctness (and it's not getting less), so I don't really
> > see what avoiding barrier usage buys us except harder to find breakage?
> 
> If we use barriers to implement any facility other than spinlocks, I
> have reasonable confidence that we won't break things more than they
> already are broken today, because I think the fallback to
> acquire-and-release a spinlock probably works, even though it's likely
> terrible for performance.  I have significantly less confidence that
> trying to implement spinlocks in terms of barriers is going to be
> reliable.

So you believe we have a reliable barrier implementation - but you
don't believe it's reliable enough for spinlocks? If we avoided the
barrier fallback for spinlock release exactly when barriers are
themselves implemented via spinlocks, I don't see why we'd be worse
off?

> +#if !defined(S_UNLOCK)
> +#if defined(__INTEL_COMPILER)
> +#define S_UNLOCK(lock)    \
> +    do { __memory_barrier(); *(lock) = 0; } while (0)
> +#else

So, you're continuing to rely on the implicit acquire/release
semantics of volatiles on Itanium and on the ordering guarantees of
x86. Fair enough. I checked, and even gcc seems to follow those
semantics.

> +#define S_UNLOCK(lock)    \
> +    do { __asm__ __volatile__("" : : : "memory");  *(lock) = 0; } while (0)
> +#endif
> +#endif

This needs a comment noting that the implementation is only safe on
strongly ordered systems.

That causes a couple of problems:

1)
Afaics ARM will fall back to this for older gccs - and ARM is weakly
ordered. I think I'd just use swpb to release the lock as well.
Something like:

#define S_INIT_LOCK(lock)	(*(lock) = 0)

#define S_UNLOCK(lock)	s_unlock(lock)
static __inline__ void
s_unlock(volatile slock_t *lock)
{
	register slock_t _res = 0;

	__asm__ __volatile__(
		"	swpb	%0, %0, [%2]	\n"
		: "+r"(_res), "+m"(*lock)
		: "r"(lock)
		: "memory");
	Assert(_res == 1);		/* lock should be held by us */
}

2)
Afaics it's also wrong for SPARC on Linux (and probably the BSDs),
where relaxed memory ordering is used.

3)
It also looks wrong on MIPS, which needs a full memory barrier.

> @@ -826,6 +845,9 @@ spin_delay(void)
>  }
>  #endif
>  
> +#define S_UNLOCK(lock)    \
> +    do { MemoryBarrier(); (*(lock)) = 0); } while (0)
> +
>  #endif

Hm. Won't this end up being a full memory barrier on x86-64 even
though a compiler barrier would suffice on x86? In my measurements on
Linux x86-64 that has a pretty hefty performance penalty on NUMA
systems.

> -#define S_UNLOCK(lock)        (*((volatile slock_t *) (lock)) = 0)
> +/*
> + * On most platforms, S_UNLOCK is essentially *(lock) = 0, but we can't just
> + * put that into an inline macro, because the compiler might reorder
> + * instructions from the critical section to occur after the lock release.
> + * But since the compiler probably can't know what the external function
> + * s_unlock is doing, putting the same logic there should be adequate.
> + * Wherever possible, it's best not to rely on this default implementation,
> + * both because a sufficiently-smart globally optimizing compiler might be
> + * able to see through this charade, and perhaps more importantly because
> + * adding the cost of a function call to every spinlock release may hurt
> + * performance significantly.
> + */
> +#define USE_DEFAULT_S_UNLOCK
> +extern void s_unlock(volatile s_lock *lock);
> +#define S_UNLOCK(lock)        s_unlock(lock)

I think this should also mention that the fallback only works for
strongly ordered systems.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


