Re: better atomics - v0.5 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: better atomics - v0.5
Date
Msg-id 20140713191738.GI1136@alap3.anarazel.de
Whole thread Raw
In response to Re: better atomics - v0.5  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: better atomics - v0.5  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi Amit,

On 2014-07-08 11:51:14 +0530, Amit Kapila wrote:
> On Thu, Jul 3, 2014 at 10:56 AM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > On Tue, Jul 1, 2014 at 4:10 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
> 
> Further review of patch:
> 1.
> /*
>  * pg_atomic_test_and_set_flag - TAS()
>  *
>  * Acquire/read barrier semantics.
>  */
> STATIC_IF_INLINE_DECLARE bool
> pg_atomic_test_set_flag(volatile pg_atomic_flag *ptr);
> 
> a. I think Acquire and read barrier semantics aren't equal.

Right. It was mean as Acquire (thus including read barrier) semantics. I
guess I'll better update README.barrier to have definitions of all barriers.

> b. I think it adheres to Acquire semantics for platforms where
> that semantics
> are supported else it defaults to full memory ordering.
> Example :
> #define TAS(lock) (InterlockedCompareExchange(lock, 1, 0))
> 
> Is it right to declare that generic version of function adheres to
> Acquire semantics.

Implementing stronger semantics than required should always be
fine. There's simply no sane way to work with the variety of platforms
we support in any other way.

> 
> 2.
> bool
> pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
> {
> return TAS((slock_t *) &ptr->sema);
> #define TAS(lock) (InterlockedCompareExchange(lock, 1, 0))

Where's that from? I can't recall adding a line of code like that?

> a. In above usage (ptr->sema), sema itself is not declared as volatile,
> so would it be right to use it in this way for API
> (InterlockedCompareExchange)
> expecting volatile.

Upgrading a pointer to volatile is defined, so I don't see the problem?

> 3.
> /*
>  * pg_atomic_unlocked_test_flag - TAS()
>  *
>  * No barrier semantics.
>  */
> STATIC_IF_INLINE_DECLARE bool
> pg_atomic_unlocked_test_flag(volatile pg_atomic_flag *ptr);
> 
> a. There is no setting in this, so using TAS in function comments
> seems to be wrong.

Good point.

> b. BTW, I am not able see this function in C11 specs.

So? It's needed for a sensible implementation of spinlocks ontop of
atomics.

> 4.
> #if !defined(PG_HAS_ATOMIC_TEST_SET_FLAG) &&
> defined(PG_HAVE_ATOMIC_EXCHANGE_U32)
> ..
> #define PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
> static inline bool
> pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
> {
> return pg_atomic_read_u32_impl(ptr) == 1;
> }
> 
> 
> #elif !defined(PG_HAS_ATOMIC_TEST_SET_FLAG) &&
> defined(PG_HAVE_ATOMIC_COMPARE_EXCHANGE_U32)
> ..
> #define PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
> static inline bool
> pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
> {
> return (bool) pg_atomic_read_u32_impl(ptr);
> }
> 
> Is there any reason to keep these two implementations different?

No, will change.

> 5.
> atomics-generic-gcc.h
> #ifndef PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
> #define PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
> static inline bool
> pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
> {
> return ptr->value == 0;
> }
> #endif
> 
> Don't we need to compare it with 1 instead of 0?

Why? It returns true if the lock is free, makes sense imo? Will add
comments to atomics.h

> 6.
> atomics-fallback.h
> pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
> {
> /*
>  * Can't do this in the semaphore based implementation - always return
>  * true. Do this in the header so compilers can optimize the test away.
>  */
> return true;
> }
> 
> Why we can't implement this in semaphore based implementation?

Because semaphores don't provide the API for it?

> 7.
> /*
>  * pg_atomic_clear_flag - release lock set by TAS()
>  *
>  * Release/write barrier semantics.
>  */
> STATIC_IF_INLINE_DECLARE void
> pg_atomic_clear_flag(volatile pg_atomic_flag *ptr);
> 
> a. Are Release and write barriers equivalent?

Same answer as above. Meant as "Release (thus including write barrier) semantics"

> b. IIUC then current code doesn't have release semantics for unlock/clear.

If you're referring to s_lock.h with 'current code', yes it should have:*    On platforms with weak memory ordering,
theTAS(), TAS_SPIN(), and*    S_UNLOCK() macros must further include hardware-level memory fence*    instructions to
preventsimilar re-ordering at the hardware level.*    TAS() and TAS_SPIN() must guarantee that loads and stores issued
after*   the macro are not executed until the lock has been obtained.  Conversely,*    S_UNLOCK() must guarantee that
loadsand stores issued before the macro*    have been executed before the lock is released.
 

> We are planing to introduce it in this patch, also this doesn't follow the
> specs which says that clear should have memory_order_seq_cst ordering
> semantics.

Making it guarantee full memory barrier semantics is a major performance
loss on x86-64, so I don't want to do that.

Also there is atomic_flag_test_and_set_explicit()...

> As per its current usage in patch (S_UNLOCK), it seems reasonable
> to have *release* semantics for this API, however I think if we use it for
> generic purpose then it might not be right to define it with Release
> semantics.

I can't really see a user where it's not what you want. And if there is
- well, it can make the semantics stronger if it needs that.

> 8.
> #define PG_HAS_ATOMIC_CLEAR_FLAG
> static inline void
> pg_atomic_clear_flag_impl(volatile pg_atomic_flag *ptr)
> {
> /* XXX: release semantics suffice? */
> pg_memory_barrier_impl();
> pg_atomic_write_u32_impl(ptr, 0);
> }
> 
> Are we sure that having memory barrier before clearing flag will
> achieve *Release* semantics; as per my understanding the definition
> of Release sematics is as below and above doesn't seem to follow the
> same.

Yes, a memory barrier guarantees a release semantics. It guarantees
more, but that's not a problem.

> 9.
> static inline uint32
> pg_atomic_exchange_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 xchg_)
> {
> uint32 old;
> while (true)
> {
> old = pg_atomic_read_u32_impl(ptr);
> if (pg_atomic_compare_exchange_u32_impl(ptr, &old, xchg_))
> break;
> }
> return old;
> }
> 
> What is the reason to implement exchange as compare-and-exchange, at least
> for
> Windows I know corresponding function (InterlockedExchange) exists.
> I could not see any other implementation of same.
> I think this at least deserves comment explaining why we have done
> the implementation this way.

Because Robert and Tom insist that we shouldn't add more operations
employing hardware features. I think that's ok for now, we can always
add more capabilities later.

> 10.
> STATIC_IF_INLINE_DECLARE uint32
> pg_atomic_fetch_add_u32(volatile pg_atomic_uint32 *ptr, int32 add_);
> STATIC_IF_INLINE_DECLARE uint32
> pg_atomic_fetch_sub_u32(volatile pg_atomic_uint32 *ptr, int32 sub_);
> 
> The function name and input value seems to be inconsistent.
> The function name contains *_u32*, however the input value is *int32*.

A u32 is implemented by adding/subtracting a signed int. There's some
platforms why that's the only API provided by intrinsics and it's good
enough for pretty much everything.

> 11.
> Both pg_atomic_fetch_and_u32_impl() and pg_atomic_fetch_or_u32_impl() are
> implemented using compare_exchange routines, don't we have any native API's
> which we can use for implementation.

Same reason as for 9).

> 12.
> I am not able to see API's pg_atomic_add_fetch_u32(),
> pg_atomic_sub_fetch_u32()
> in C11 atomic ops specs, do you need it for something specific?

Yes, it's already used in the lwlocks code and it's provided by several
other atomics libraries. I don't really see a reason not to provide it,
especially as some platforms could implement it more efficiently.
> 14.
> * pg_memory_barrier - prevent the CPU from reordering memory access
>  *
>  * A memory barrier must act as a compiler barrier,
> 
> Is it ensured in all cases that pg_memory_barrier implies compiler barrier
> as well?

Yes.

> Example for msvc case, the specs doesn't seem to guarntee it.

I think it actually indirectly guarantees it, but I'm on a plane and
can't check :)

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Next
From: Andres Freund
Date:
Subject: Re: better atomics - v0.5