Re: Inefficient barriers on solaris with sun cc - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Inefficient barriers on solaris with sun cc
Msg-id 20141002180603.GL7158@awork2.anarazel.de
In response to Re: Inefficient barriers on solaris with sun cc  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Inefficient barriers on solaris with sun cc
List pgsql-hackers
On 2014-10-02 11:35:32 -0400, Robert Haas wrote:
> On Thu, Oct 2, 2014 at 11:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> > Which is why these acquire/release fences, in contrast to
> >> > acquire/release operations, have more guarantees... You put your finger
> >> > right onto the spot.
> >>
> >> But, uh, we still don't seem to know what those guarantees actually ARE.
> >
> > Paired together they form a synchronized-with relationship. Problem #1
> > is that the standard's language isn't, to me at least, clear if there's
> > not some case where that's not the case. Problem #2 is that our current
> > README.barrier definition doesn't actually require barriers to be
> > paired. Which imo is bad, but still a fact.
> 
> I don't know what a "synchronized-with relationship" means.

I'm using the standard's language here, given that I'm trying to reason
about its behaviour...

What it means is that if you have a matching pair of acquire/release
operations or barriers/fences, everything that happened *before* the last
release fence will be visible *after* executing the next acquire
operation in a different thread-of-execution. And 'after' is defined
such that it holds if the 'acquiring' thread can see the result of the
'releasing' operation.
I.e. no load after the acquire can see a stale value from before the
writes that preceded the release.
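
To make the pairing concrete, here is a minimal sketch in C11 terms,
assuming <stdatomic.h> and POSIX threads are available. All names
(payload, ready, run_message_passing) are made up for illustration and
are not from the PostgreSQL tree:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;         /* plain, non-atomic data */
static atomic_int ready;    /* flag carrying the release/acquire pair */

static void *producer(void *arg)
{
    (void) arg;
    payload = 42;           /* sequenced before the release store */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *seen = arg;

    /* spin until the acquire load observes the release store */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    /* the pair synchronizes, so the earlier write must be visible */
    assert(payload == 42);
    *seen = payload;
    return NULL;
}

/* hypothetical driver: returns the value the consumer saw, -1 on error */
int run_message_passing(void)
{
    pthread_t p, c;
    int seen = -1;

    payload = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&p, NULL, producer, NULL) != 0)
        return -1;
    if (pthread_create(&c, NULL, consumer, &seen) != 0)
    {
        pthread_join(p, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

The consumer only reads payload once it has seen ready == 1, which is
the "can see the result of the releasing operation" condition.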

My problem with the definition in the standard is that it's not
particularly clear how acquire fences *without* an underlying explicit
atomic operation are defined.

I checked gcc's current code and it's fine in that regard. Also other
popular concurrent open source stuff like
http://git.qemu.org/?p=qemu.git;a=blob;f=include/qemu/atomic.h;hb=HEAD
does precisely what I'm talking about:

#ifndef smp_wmb
#ifdef __ATOMIC_RELEASE
#define smp_wmb()   __atomic_thread_fence(__ATOMIC_RELEASE)
#else
#define smp_wmb()   __sync_synchronize()
#endif
#endif

#ifndef smp_rmb
#ifdef __ATOMIC_ACQUIRE
#define smp_rmb()   __atomic_thread_fence(__ATOMIC_ACQUIRE)
#else
#define smp_rmb()   __sync_synchronize()
#endif
#endif

The commit that added it
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=5444e768ee1abe6e021bece19a9a932351f88c88
was written by one gcc guy and reviewed by another one...

So I think we can be pretty sure that gcc's __atomic_thread_fence()
behaves like we want. We probably have to be a bit more careful about
extending that definition (by including stdatomic.h and doing
atomic_thread_fence(memory_order_acquire)) to use general C11. Which is
probably a couple years away anyway.
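
For reference, a rough sketch of what the generic C11 spelling of the
qemu macros quoted above might look like. The c11_* names and the small
helper are hypothetical, and whether a C11 fence is an adequate
stand-in for a traditional read/write barrier in every case is exactly
the assumption that needs care:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical names, modelled on the qemu macros; not an existing API. */
#define c11_smp_wmb()   atomic_thread_fence(memory_order_release)
#define c11_smp_rmb()   atomic_thread_fence(memory_order_acquire)
#define c11_smp_mb()    atomic_thread_fence(memory_order_seq_cst)

static int value;               /* plain data */
static atomic_int published;    /* flag, accessed with relaxed atomics */

/* Single-threaded usage example, only showing where the fences go. */
int publish_then_consume(void)
{
    value = 7;
    c11_smp_wmb();              /* order the data store before the flag store */
    atomic_store_explicit(&published, 1, memory_order_relaxed);

    assert(atomic_load_explicit(&published, memory_order_relaxed) == 1);
    c11_smp_rmb();              /* order the flag load before the data load */
    return value;
}
```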

> Also, I pretty much designed those definitions to match what Linux
> does.  And it doesn't require that either, though it says that in most
> cases it will work out that way.

My point is that read barriers aren't particularly meaningful
without a defined store order from another thread/process. Without any
form of pairing you don't have that. The writing side could just have
reordered the writes in a way you didn't want them.  And the kernel docs
do say "A lack of appropriate pairing is almost certainly an error". But
since read barriers also pair with lock release operations, that's
normally not a big problem.
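
A sketch of what I mean by pairing, written with C11 fences and relaxed
accesses to the flag (assuming <stdatomic.h> and POSIX threads; all
names are illustrative). Drop the fence on either side and the reader
loses the guarantee:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int data_a, data_b;  /* plain data ordered by the barriers */
static atomic_int flag;     /* only ever accessed with relaxed atomics */

static void *writer(void *arg)
{
    (void) arg;
    data_a = 1;
    data_b = 2;
    /* "write barrier": the stores above are ordered before the flag store */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
    return NULL;
}

static void *reader(void *arg)
{
    int *sum = arg;

    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ;                   /* wait for the writer's flag */
    /* "read barrier": the flag load is ordered before the loads below */
    atomic_thread_fence(memory_order_acquire);
    assert(data_a == 1 && data_b == 2);
    *sum = data_a + data_b;
    return NULL;
}

/* hypothetical driver: returns data_a + data_b as seen by the reader */
int run_paired_barriers(void)
{
    pthread_t w, r;
    int sum = -1;

    data_a = data_b = 0;
    atomic_store(&flag, 0);
    if (pthread_create(&w, NULL, writer, NULL) != 0)
        return -1;
    if (pthread_create(&r, NULL, reader, &sum) != 0)
    {
        pthread_join(w, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return sum;
}
```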

> > The definition of ACQ_REL is pretty clearly sufficient imo: "Full
> > barrier in both directions and synchronizes with acquire loads and
> > release stores in another thread.".
> 
> I dunno.  What's an acquire load?  What's a release store?  I know
> what loads and stores are; I don't know what the adjectives mean.

An acquire load is either an explicit atomic load (tas, cmpxchg, etc
also count) or a normal load combined with an acquire barrier. The symmetric
definition is true for release store.

(so, on x86 every load/store that prevents compiler reordering is
essentially an acquire load/release store)

> > And realistically, in the above example, you'd have to read flag to see
> > that it's not already 1, right?
> 
> Not necessarily.  You could be the only writer.  Think about the way
> the backend entries in the stats system work.  The point of setting
> the flag may be for other people to know whether the data is in the
> middle of being modified.

So you're thinking about something seqlock-like... Isn't the problem
then that you actually don't want acquire semantics, but release or
write barrier semantics on that store? The acquire/read barrier part
would be on the reader side, no?
I'm still unsure what you want to show with that example.
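
For the seqlock-like case, a minimal single-writer sketch in C11
(assuming <stdatomic.h>; seq_entry and the seq_* functions are invented
for illustration and are not how the stats system is actually coded).
The release/write barrier side is in the writer, the acquire/read
barrier side is in the reader:

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct
{
    atomic_uint seq;    /* even: stable, odd: modification in progress */
    atomic_int  a, b;   /* payload, accessed with relaxed atomics */
} seq_entry;

void seq_init(seq_entry *e)
{
    atomic_init(&e->seq, 0);
    atomic_init(&e->a, 0);
    atomic_init(&e->b, 0);
}

/* Must only be called by the single writer. */
void seq_write(seq_entry *e, int a, int b)
{
    unsigned s = atomic_load_explicit(&e->seq, memory_order_relaxed);

    assert((s & 1) == 0);
    atomic_store_explicit(&e->seq, s + 1, memory_order_relaxed);
    /* write barrier: the odd counter is ordered before the payload stores */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&e->a, a, memory_order_relaxed);
    atomic_store_explicit(&e->b, b, memory_order_relaxed);
    /* release store: the payload is ordered before the even counter */
    atomic_store_explicit(&e->seq, s + 2, memory_order_release);
}

/* Retries until it has read a consistent snapshot. */
void seq_read(seq_entry *e, int *a, int *b)
{
    unsigned s0, s1;

    do
    {
        s0 = atomic_load_explicit(&e->seq, memory_order_acquire);
        *a = atomic_load_explicit(&e->a, memory_order_relaxed);
        *b = atomic_load_explicit(&e->b, memory_order_relaxed);
        /* read barrier: the payload loads are ordered before the recheck */
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&e->seq, memory_order_relaxed);
    } while (s0 != s1 || (s0 & 1));
}
```

The payload fields are atomics here only to keep the concurrent reads
free of data races under the C11 rules; that choice is part of the
sketch, not something implied by the discussion above.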

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


