Re: better atomics - v0.5 - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: better atomics - v0.5 |
Date | |
Msg-id | CAA4eK1Kr9kCEA2VmMqN3k+agsw1OP71bFesZS7VVSHp43JjPrw@mail.gmail.com Whole thread Raw |
In response to | Re: better atomics - v0.5 (Andres Freund <andres@2ndquadrant.com>) |
List | pgsql-hackers |
On Mon, Jul 14, 2014 at 12:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-07-08 11:51:14 +0530, Amit Kapila wrote:
> > On Thu, Jul 3, 2014 at 10:56 AM, Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > Further review of patch:
> > 1.
> > /*
> > * pg_atomic_test_and_set_flag - TAS()
> > *
> > * Acquire/read barrier semantics.
> > */
> > STATIC_IF_INLINE_DECLARE bool
> > pg_atomic_test_set_flag(volatile pg_atomic_flag *ptr);
> >
> > a. I think Acquire and read barrier semantics aren't equal.
>
> Right. It was mean as Acquire (thus including read barrier) semantics.
> > b. I think it adheres to Acquire semantics for platforms where
> > that semantics
> > are supported else it defaults to full memory ordering.
> > Example :
> > #define TAS(lock) (InterlockedCompareExchange(lock, 1, 0))
> >
> > Is it right to declare that generic version of function adheres to
> > Acquire semantics.
>
> Implementing stronger semantics than required should always be
> fine. There's simply no sane way to work with the variety of platforms
> we support in any other way.
> >
> > 2.
> > bool
> > pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
> > {
> > return TAS((slock_t *) &ptr->sema);
> > #define TAS(lock) (InterlockedCompareExchange(lock, 1, 0))
>
> Where's that from?
>
> > 8.
> > #define PG_HAS_ATOMIC_CLEAR_FLAG
> > static inline void
> > pg_atomic_clear_flag_impl(volatile pg_atomic_flag *ptr)
> > {
> > /* XXX: release semantics suffice? */
> > pg_memory_barrier_impl();
> > pg_atomic_write_u32_impl(ptr, 0);
> > }
> >
> > Are we sure that having memory barrier before clearing flag will
> > achieve *Release* semantics; as per my understanding the definition
> > of Release sematics is as below and above doesn't seem to follow the
> > same.
>
> Yes, a memory barrier guarantees a release semantics. It guarantees
> more, but that's not a problem.
You are right.
> > 9.
> > static inline uint32
> > pg_atomic_exchange_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 xchg_)
> > {
> > uint32 old;
> > while (true)
> > {
> > old = pg_atomic_read_u32_impl(ptr);
> > if (pg_atomic_compare_exchange_u32_impl(ptr, &old, xchg_))
> > break;
> > }
> > return old;
> > }
> >
> > What is the reason to implement exchange as compare-and-exchange, at least
> > for
> > Windows I know corresponding function (InterlockedExchange) exists.
> > I could not see any other implementation of same.
> > I think this at least deserves comment explaining why we have done
> > the implementation this way.
>
> Because Robert and Tom insist that we shouldn't add more operations
> employing hardware features. I think that's ok for now, we can always
> add more capabilities later.
No issues, may be a comment indicating some form of reason would be good.
> > 10.
> > STATIC_IF_INLINE_DECLARE uint32
> > pg_atomic_fetch_add_u32(volatile pg_atomic_uint32 *ptr, int32 add_);
> > STATIC_IF_INLINE_DECLARE uint32
> > pg_atomic_fetch_sub_u32(volatile pg_atomic_uint32 *ptr, int32 sub_);
> >
> > The function name and input value seems to be inconsistent.
> > The function name contains *_u32*, however the input value is *int32*.
>
> A u32 is implemented by adding/subtracting a signed int. There's some
> platforms why that's the only API provided by intrinsics and it's good
> enough for pretty much everything.
>
> > 12.
> > I am not able to see API's pg_atomic_add_fetch_u32(),
> > pg_atomic_sub_fetch_u32()
> > in C11 atomic ops specs, do you need it for something specific?
>
> Yes, it's already used in the lwlocks code and it's provided by several
> other atomics libraries. I don't really see a reason not to provide it,
> especially as some platforms could implement it more efficiently.
> > 14.
> > * pg_memory_barrier - prevent the CPU from reordering memory access
> > *
> > * A memory barrier must act as a compiler barrier,
> >
> > Is it ensured in all cases that pg_memory_barrier implies compiler barrier
> > as well?
>
> Yes.
>
> > Example for msvc case, the specs doesn't seem to guarntee it.
>
> I think it actually indirectly guarantees it, but I'm on a plane and
> can't check :)
No problem let me know when you are comfortable, this is just for
> On 2014-07-08 11:51:14 +0530, Amit Kapila wrote:
> > On Thu, Jul 3, 2014 at 10:56 AM, Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > Further review of patch:
> > 1.
> > /*
> > * pg_atomic_test_and_set_flag - TAS()
> > *
> > * Acquire/read barrier semantics.
> > */
> > STATIC_IF_INLINE_DECLARE bool
> > pg_atomic_test_set_flag(volatile pg_atomic_flag *ptr);
> >
> > a. I think Acquire and read barrier semantics aren't equal.
>
> Right. It was mean as Acquire (thus including read barrier) semantics.
Okay, I got confused by reading comments, may be saying the same
explicitly in comments is better.
> I guess I'll better update README.barrier to have definitions of all
> barriers.
I think updating about the reasons behind using specific
barriers for atomic operations defined by PG can be useful
for people who want to understand or use the atomic op API's.
> > that semantics
> > are supported else it defaults to full memory ordering.
> > Example :
> > #define TAS(lock) (InterlockedCompareExchange(lock, 1, 0))
> >
> > Is it right to declare that generic version of function adheres to
> > Acquire semantics.
>
> Implementing stronger semantics than required should always be
> fine. There's simply no sane way to work with the variety of platforms
> we support in any other way.
Agreed, may be we can have a note somewhere either in code or
readme to mention about it, if you think that can be helpful.
> >
> > 2.
> > bool
> > pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
> > {
> > return TAS((slock_t *) &ptr->sema);
> > #define TAS(lock) (InterlockedCompareExchange(lock, 1, 0))
>
> Where's that from?
Its there in s_lock.h. Refer below code:
#ifdef WIN32_ONLY_COMPILER
typedef LONG slock_t;
#define HAS_TEST_AND_SET
#define TAS(lock) (InterlockedCompareExchange(lock, 1, 0))
> I can't recall adding a line of code like that?
> > a. In above usage (ptr->sema), sema itself is not declared as volatile,
> > so would it be right to use it in this way for API
> > (InterlockedCompareExchange)
> > expecting volatile.
>
> Upgrading a pointer to volatile is defined, so I don't see the problem?
> > 3.
> > /*
> > * pg_atomic_unlocked_test_flag - TAS()
> > *
> > * No barrier semantics.
> > */
> > STATIC_IF_INLINE_DECLARE bool
> > pg_atomic_unlocked_test_flag(volatile pg_atomic_flag *ptr);
> >
> > a. There is no setting in this, so using TAS in function comments
> > seems to be wrong.
>
> Good point.
>
> > b. BTW, I am not able see this function in C11 specs.
>
> So? It's needed for a sensible implementation of spinlocks ontop of
> atomics.
> > 5.
> > atomics-generic-gcc.h
> > #ifndef PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
> > #define PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
> > static inline bool
> > pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
> > {
> > return ptr->value == 0;
> > }
> > #endif
> >
> > Don't we need to compare it with 1 instead of 0?
>
> Why? It returns true if the lock is free, makes sense imo?
> > 6.
> > atomics-fallback.h
> > pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
> > {
> > /*
> > * Can't do this in the semaphore based implementation - always return
> > * true. Do this in the header so compilers can optimize the test away.
> > */
> > return true;
> > }
> >
> > Why we can't implement this in semaphore based implementation?
>
> Because semaphores don't provide the API for it?
> > 7.
> > /*
> > * pg_atomic_clear_flag - release lock set by TAS()
> > *
> > * Release/write barrier semantics.
> > */
> > STATIC_IF_INLINE_DECLARE void
> > pg_atomic_clear_flag(volatile pg_atomic_flag *ptr);
> >
> > a. Are Release and write barriers equivalent?
>
> Same answer as above. Meant as "Release (thus including write barrier) semantics"
>
> > b. IIUC then current code doesn't have release semantics for unlock/clear.
>
> If you're referring to s_lock.h with 'current code', yes it should have:
It is not added by your patch but used in your patch.
I am not sure if that is what excatly you want atomic API to define
for WIN32, but I think it is picking this definition. I have one question
here, if the value of sema is other than 1 or 0, then it won't set the flag
and not sure what it will return (ex. if the value is negative), so is this
implementation completely sane?
> > a. In above usage (ptr->sema), sema itself is not declared as volatile,
> > so would it be right to use it in this way for API
> > (InterlockedCompareExchange)
> > expecting volatile.
>
> Upgrading a pointer to volatile is defined, so I don't see the problem?
Thats okay. However all other structure definitions use it as
volatile, example for atomics-arch-amd64.h it is defined as follows:
#define PG_HAVE_ATOMIC_FLAG_SUPPORT
typedef struct pg_atomic_flag
{
volatile char value;
} pg_atomic_flag;
So is there any harm if we keep this definition in sync with others?
> > 3.
> > /*
> > * pg_atomic_unlocked_test_flag - TAS()
> > *
> > * No barrier semantics.
> > */
> > STATIC_IF_INLINE_DECLARE bool
> > pg_atomic_unlocked_test_flag(volatile pg_atomic_flag *ptr);
> >
> > a. There is no setting in this, so using TAS in function comments
> > seems to be wrong.
>
> Good point.
>
> > b. BTW, I am not able see this function in C11 specs.
>
> So? It's needed for a sensible implementation of spinlocks ontop of
> atomics.
Okay got the point, do you think it makes sense to have a common
agreement/consensus w.r.t which API's are necessary as a first
cut. For me as a reviewer, if the API is implemented as per specs or
what it is desired to then we can have it, however I am not sure if there
is general consensus or at least some people agreed to set of API's
which are required for first cut, have I missed some conclusion/discussion
about it.
>> > 5.
> > atomics-generic-gcc.h
> > #ifndef PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
> > #define PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
> > static inline bool
> > pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
> > {
> > return ptr->value == 0;
> > }
> > #endif
> >
> > Don't we need to compare it with 1 instead of 0?
>
> Why? It returns true if the lock is free, makes sense imo?
Other implementation in atomics-generic.h seems to implement it other
way
#define PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
static inline bool
pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
{
return pg_atomic_read_u32_impl(ptr) == 1;
}
> Will add comments to atomics.h
I think that will be helpful.
> > 6.
> > atomics-fallback.h
> > pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
> > {
> > /*
> > * Can't do this in the semaphore based implementation - always return
> > * true. Do this in the header so compilers can optimize the test away.
> > */
> > return true;
> > }
> >
> > Why we can't implement this in semaphore based implementation?
>
> Because semaphores don't provide the API for it?
Okay, but it is not clear from comments why this test should always
return true, basically I am not able to think why incase of missing API
returning true is correct behaviour for this API?
> > 7.
> > /*
> > * pg_atomic_clear_flag - release lock set by TAS()
> > *
> > * Release/write barrier semantics.
> > */
> > STATIC_IF_INLINE_DECLARE void
> > pg_atomic_clear_flag(volatile pg_atomic_flag *ptr);
> >
> > a. Are Release and write barriers equivalent?
>
> Same answer as above. Meant as "Release (thus including write barrier) semantics"
>
> > b. IIUC then current code doesn't have release semantics for unlock/clear.
>
> If you're referring to s_lock.h with 'current code', yes it should have:
I was referring to next point (point number 8) which you have clarified below.
> > 8.
> > #define PG_HAS_ATOMIC_CLEAR_FLAG
> > static inline void
> > pg_atomic_clear_flag_impl(volatile pg_atomic_flag *ptr)
> > {
> > /* XXX: release semantics suffice? */
> > pg_memory_barrier_impl();
> > pg_atomic_write_u32_impl(ptr, 0);
> > }
> >
> > Are we sure that having memory barrier before clearing flag will
> > achieve *Release* semantics; as per my understanding the definition
> > of Release sematics is as below and above doesn't seem to follow the
> > same.
>
> Yes, a memory barrier guarantees a release semantics. It guarantees
> more, but that's not a problem.
You are right.
> > 9.
> > static inline uint32
> > pg_atomic_exchange_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 xchg_)
> > {
> > uint32 old;
> > while (true)
> > {
> > old = pg_atomic_read_u32_impl(ptr);
> > if (pg_atomic_compare_exchange_u32_impl(ptr, &old, xchg_))
> > break;
> > }
> > return old;
> > }
> >
> > What is the reason to implement exchange as compare-and-exchange, at least
> > for
> > Windows I know corresponding function (InterlockedExchange) exists.
> > I could not see any other implementation of same.
> > I think this at least deserves comment explaining why we have done
> > the implementation this way.
>
> Because Robert and Tom insist that we shouldn't add more operations
> employing hardware features. I think that's ok for now, we can always
> add more capabilities later.
No issues, may be a comment indicating some form of reason would be good.
> > 10.
> > STATIC_IF_INLINE_DECLARE uint32
> > pg_atomic_fetch_add_u32(volatile pg_atomic_uint32 *ptr, int32 add_);
> > STATIC_IF_INLINE_DECLARE uint32
> > pg_atomic_fetch_sub_u32(volatile pg_atomic_uint32 *ptr, int32 sub_);
> >
> > The function name and input value seems to be inconsistent.
> > The function name contains *_u32*, however the input value is *int32*.
>
> A u32 is implemented by adding/subtracting a signed int. There's some
> platforms why that's the only API provided by intrinsics and it's good
> enough for pretty much everything.
As such no issues, but just wondering is there any reason to prefer
*_u32 instead of simple _32 or *32.
>
> > 12.
> > I am not able to see API's pg_atomic_add_fetch_u32(),
> > pg_atomic_sub_fetch_u32()
> > in C11 atomic ops specs, do you need it for something specific?
>
> Yes, it's already used in the lwlocks code and it's provided by several
> other atomics libraries. I don't really see a reason not to provide it,
> especially as some platforms could implement it more efficiently.
Okay, I find below implementation in your patch. Won't it add the value
twice?
+#if !defined(PG_HAS_ATOMIC_ADD_FETCH_U32) && defined(PG_HAS_ATOMIC_FETCH_ADD_U32)
+#define PG_HAS_ATOMIC_ADD_FETCH_U32
+static inline uint32
+pg_atomic_add_fetch_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
+{
+ return pg_atomic_fetch_add_u32_impl(ptr, add_) + add_;
+}
+#endif
> > 14.
> > * pg_memory_barrier - prevent the CPU from reordering memory access
> > *
> > * A memory barrier must act as a compiler barrier,
> >
> > Is it ensured in all cases that pg_memory_barrier implies compiler barrier
> > as well?
>
> Yes.
>
> > Example for msvc case, the specs doesn't seem to guarntee it.
>
> I think it actually indirectly guarantees it, but I'm on a plane and
> can't check :)
No problem let me know when you are comfortable, this is just for
pgsql-hackers by date: