Re: better atomics - v0.5 - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: better atomics - v0.5
Date
Msg-id CAA4eK1Kr9kCEA2VmMqN3k+agsw1OP71bFesZS7VVSHp43JjPrw@mail.gmail.com
Whole thread Raw
In response to Re: better atomics - v0.5  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Mon, Jul 14, 2014 at 12:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-07-08 11:51:14 +0530, Amit Kapila wrote:
> > On Thu, Jul 3, 2014 at 10:56 AM, Amit Kapila <amit.kapila16@gmail.com>
> > wrote:

> > Further review of patch:
> > 1.
> > /*
> >  * pg_atomic_test_and_set_flag - TAS()
> >  *
> >  * Acquire/read barrier semantics.
> >  */
> > STATIC_IF_INLINE_DECLARE bool
> > pg_atomic_test_set_flag(volatile pg_atomic_flag *ptr);
> >
> > a. I think Acquire and read barrier semantics aren't equal.
>
> Right. It was mean as Acquire (thus including read barrier) semantics.

Okay, I got confused by reading comments, may be saying the same
explicitly in comments is better.

> I guess I'll better update README.barrier to have definitions of all
> barriers.

I think updating about the reasons behind using specific
barriers for atomic operations defined by PG can be useful
for people who want to understand or use the atomic op API's.

> > b. I think it adheres to Acquire semantics for platforms where
> > that semantics
> > are supported else it defaults to full memory ordering.
> > Example :
> > #define TAS(lock) (InterlockedCompareExchange(lock, 1, 0))
> >
> > Is it right to declare that generic version of function adheres to
> > Acquire semantics.
>
> Implementing stronger semantics than required should always be
> fine. There's simply no sane way to work with the variety of platforms
> we support in any other way.

Agreed, may be we can have a note somewhere either in code or
readme to mention about it, if you think that can be helpful.

> >
> > 2.
> > bool
> > pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
> > {
> > return TAS((slock_t *) &ptr->sema);
> > #define TAS(lock) (InterlockedCompareExchange(lock, 1, 0))
>
> Where's that from?

Its there in s_lock.h. Refer below code:
#ifdef WIN32_ONLY_COMPILER
typedef LONG slock_t;

#define HAS_TEST_AND_SET
#define TAS(lock) (InterlockedCompareExchange(lock, 1, 0))

> I can't recall adding a line of code like that?

It is not added by your patch but used in your patch.

I am not sure if that is what excatly you want atomic API to define
for WIN32, but I think it is picking this definition.  I have one question
here, if the value of sema is other than 1 or 0, then it won't set the flag
and not sure what it will return (ex. if the value is negative), so is this
implementation completely sane?


> > a. In above usage (ptr->sema), sema itself is not declared as volatile,
> > so would it be right to use it in this way for API
> > (InterlockedCompareExchange)
> > expecting volatile.
>
> Upgrading a pointer to volatile is defined, so I don't see the problem?

Thats okay.  However all other structure definitions use it as
volatile, example for atomics-arch-amd64.h it is defined as follows:
#define PG_HAVE_ATOMIC_FLAG_SUPPORT
typedef struct pg_atomic_flag
{
volatile char value;
} pg_atomic_flag;

So is there any harm if we keep this definition in sync with others?

> > 3.
> > /*
> >  * pg_atomic_unlocked_test_flag - TAS()
> >  *
> >  * No barrier semantics.
> >  */
> > STATIC_IF_INLINE_DECLARE bool
> > pg_atomic_unlocked_test_flag(volatile pg_atomic_flag *ptr);
> >
> > a. There is no setting in this, so using TAS in function comments
> > seems to be wrong.
>
> Good point.
>
> > b. BTW, I am not able see this function in C11 specs.
>
> So? It's needed for a sensible implementation of spinlocks ontop of
> atomics.

Okay got the point, do you think it makes sense to have a common
agreement/consensus w.r.t which API's are necessary as a first
cut.  For me as a reviewer, if the API is implemented as per specs or
what it is desired to then we can have it, however I am not sure if there
is general consensus or at least some people agreed to set of API's
which are required for first cut, have I missed some conclusion/discussion
about it.
>
> > 5.
> > atomics-generic-gcc.h
> > #ifndef PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
> > #define PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
> > static inline bool
> > pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
> > {
> > return ptr->value == 0;
> > }
> > #endif
> >
> > Don't we need to compare it with 1 instead of 0?
>
> Why? It returns true if the lock is free, makes sense imo?

Other implementation in atomics-generic.h seems to implement it other
way 
#define PG_HAS_ATOMIC_UNLOCKED_TEST_FLAG
static inline bool
pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
{
return pg_atomic_read_u32_impl(ptr) == 1;
}

> Will add comments to atomics.h

I think that will be helpful.

> > 6.
> > atomics-fallback.h
> > pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
> > {
> > /*
> >  * Can't do this in the semaphore based implementation - always return
> >  * true. Do this in the header so compilers can optimize the test away.
> >  */
> > return true;
> > }
> >
> > Why we can't implement this in semaphore based implementation?
>
> Because semaphores don't provide the API for it?

Okay, but it is not clear from comments why this test should always
return true, basically I am not able to think why incase of missing API
returning true is correct behaviour for this API?

> > 7.
> > /*
> >  * pg_atomic_clear_flag - release lock set by TAS()
> >  *
> >  * Release/write barrier semantics.
> >  */
> > STATIC_IF_INLINE_DECLARE void
> > pg_atomic_clear_flag(volatile pg_atomic_flag *ptr);
> >
> > a. Are Release and write barriers equivalent?
>
> Same answer as above. Meant as "Release (thus including write barrier) semantics"
>
> > b. IIUC then current code doesn't have release semantics for unlock/clear.
>
> If you're referring to s_lock.h with 'current code', yes it should have:

I was referring to next point (point number 8) which you have clarified below.

>
> > 8.
> > #define PG_HAS_ATOMIC_CLEAR_FLAG
> > static inline void
> > pg_atomic_clear_flag_impl(volatile pg_atomic_flag *ptr)
> > {
> > /* XXX: release semantics suffice? */
> > pg_memory_barrier_impl();
> > pg_atomic_write_u32_impl(ptr, 0);
> > }
> >
> > Are we sure that having memory barrier before clearing flag will
> > achieve *Release* semantics; as per my understanding the definition
> > of Release sematics is as below and above doesn't seem to follow the
> > same.
>
> Yes, a memory barrier guarantees a release semantics. It guarantees
> more, but that's not a problem.

You are right.

> > 9.
> > static inline uint32
> > pg_atomic_exchange_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 xchg_)
> > {
> > uint32 old;
> > while (true)
> > {
> > old = pg_atomic_read_u32_impl(ptr);
> > if (pg_atomic_compare_exchange_u32_impl(ptr, &old, xchg_))
> > break;
> > }
> > return old;
> > }
> >
> > What is the reason to implement exchange as compare-and-exchange, at least
> > for
> > Windows I know corresponding function (InterlockedExchange) exists.
> > I could not see any other implementation of same.
> > I think this at least deserves comment explaining why we have done
> > the implementation this way.
>
> Because Robert and Tom insist that we shouldn't add more operations
> employing hardware features. I think that's ok for now, we can always
> add more capabilities later.

No issues, may be a comment indicating some form of reason would be good.

> > 10.
> > STATIC_IF_INLINE_DECLARE uint32
> > pg_atomic_fetch_add_u32(volatile pg_atomic_uint32 *ptr, int32 add_);
> > STATIC_IF_INLINE_DECLARE uint32
> > pg_atomic_fetch_sub_u32(volatile pg_atomic_uint32 *ptr, int32 sub_);
> >
> > The function name and input value seems to be inconsistent.
> > The function name contains *_u32*, however the input value is *int32*.
>
> A u32 is implemented by adding/subtracting a signed int. There's some
> platforms why that's the only API provided by intrinsics and it's good
> enough for pretty much everything.

As such no issues, but just wondering is there any reason to prefer
*_u32 instead of simple _32 or *32.

>
> > 12.
> > I am not able to see API's pg_atomic_add_fetch_u32(),
> > pg_atomic_sub_fetch_u32()
> > in C11 atomic ops specs, do you need it for something specific?
>
> Yes, it's already used in the lwlocks code and it's provided by several
> other atomics libraries. I don't really see a reason not to provide it,
> especially as some platforms could implement it more efficiently.

Okay, I find below implementation in your patch. Won't it add the value
twice?

+#if !defined(PG_HAS_ATOMIC_ADD_FETCH_U32) && defined(PG_HAS_ATOMIC_FETCH_ADD_U32)
+#define PG_HAS_ATOMIC_ADD_FETCH_U32
+static inline uint32
+pg_atomic_add_fetch_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
+{
+ return pg_atomic_fetch_add_u32_impl(ptr, add_) + add_;
+}
+#endif


> > 14.
> > * pg_memory_barrier - prevent the CPU from reordering memory access
> >  *
> >  * A memory barrier must act as a compiler barrier,
> >
> > Is it ensured in all cases that pg_memory_barrier implies compiler barrier
> > as well?
>
> Yes.
>
> > Example for msvc case, the specs doesn't seem to guarntee it.
>
> I think it actually indirectly guarantees it, but I'm on a plane and
> can't check :)

No problem let me know when you are comfortable, this is just for
clarification.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Allowing NOT IN to use ANTI joins
Next
From: Magnus Hagander
Date:
Subject: Re: SSL information view