Re: Inefficient barriers on solaris with sun cc - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Inefficient barriers on solaris with sun cc
Msg-id 20141002151839.GB25554@awork2.anarazel.de
In response to Re: Inefficient barriers on solaris with sun cc  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Inefficient barriers on solaris with sun cc
List pgsql-hackers
On 2014-10-02 10:55:06 -0400, Robert Haas wrote:
> On Thu, Oct 2, 2014 at 10:34 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > It's actually more complex than that :(
> >
> > Simple things first:
> >
> > Oracle's definition seems pretty ironclad:
> > http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
> > __machine_acq_barrier is a clear superset of __machine_r_barrier and
> > __machine_rel_barrier is a clear superset of __machine_w_barrier.
> >
> > And that's what we're essentially discussing, no? That said, there seems
> > to be no reason to avoid using __machine_r/w_barrier().
> 
> So let's use those, then.

Right, I've never contended that.
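A minimal sketch of what using those directly could look like. It assumes Sun Studio's <mbarrier.h> provides __machine_r_barrier() and __machine_w_barrier() as documented at the Oracle link above, and a GCC-compatible compiler everywhere else. The sketch_* macros, publish() and try_consume() are hypothetical names, not PostgreSQL's definitions, and the volatile flag follows the pre-C11 idiom of pairing volatile accesses with explicit barriers:

```c
#include <assert.h>

#if defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#include <mbarrier.h>
/* Hypothetical mapping: loads before vs. loads after the barrier. */
#define sketch_read_barrier()   __machine_r_barrier()
/* Hypothetical mapping: stores before vs. stores after the barrier. */
#define sketch_write_barrier()  __machine_w_barrier()
#else
/* Assumed GCC-compatible fallback: a full barrier, sufficient but heavier. */
#define sketch_read_barrier()   __sync_synchronize()
#define sketch_write_barrier()  __sync_synchronize()
#endif

static int shared_data;
static volatile int shared_ready;

/* Writer side: the data store must become visible before the flag store. */
static void publish(int value)
{
    shared_data = value;
    sketch_write_barrier();
    shared_ready = 1;
}

/* Reader side: the flag load must complete before the data load. */
static int try_consume(int *out)
{
    if (!shared_ready)
        return 0;
    sketch_read_barrier();
    *out = shared_data;
    return 1;
}
```

Note that the barriers are paired: the write barrier on the publishing side only helps if the consuming side issues a matching read barrier.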

> > But for the reason why I defined pg_read_barrier/write_barrier to
> > __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE):
> >
> > The C11/C++11 definition it's made for is hellishly hard to
> > understand. There are very subtle differences between acquire/release
> > operations and acquire/release fences. 29.8.2 (C++11) and 7.17.4 (C11)
> > seem to be the relevant parts of the standards. I think they essentially
> > guarantee the mapping we're talking about, but it's not entirely clear.
> >
> > The way acquire/release fences are defined is that they form a
> > 'synchronizes-with' relationship with each other. Which would, I think,
> > be sufficient given that without a release-like operation on the other
> > thread a read/write barrier isn't worth much. But there's a rub in that
> > it requires an atomic operation to be involved somewhere to give that
> > guarantee.
> >
> > I *did* check that the emitted code on relevant architectures is sane,
> > but that doesn't guarantee anything for the future.
> >
> > Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL, which
> > definitely guarantees what we need, even if superfluously heavy on some
> > platforms. It is still significantly more efficient than
> > __sync_synchronize(), which is what was used before. I.e. it generates no
> > code on x86 (MFENCE otherwise), only an lwsync on PPC (hwsync
> > otherwise, although I don't know why), and similar on ia64.
> 
> A full barrier on x86 should be an mfence, right?

Right. I've not talked about changing full barrier semantics. What I was
referring to is that until the atomics patch we always redefine
read/write barriers to be full barriers when using gcc intrinsics.
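As a rough sketch of the proposed mapping, assuming GCC's __atomic and __sync builtins. The old_*/new_* macro names and store_then_load() are hypothetical, and the instructions named in the comments are what GCC typically emits on those targets, not a guarantee:

```c
#include <assert.h>

/* Old mapping: read/write barriers are full fences
 * (typically MFENCE on x86, hwsync on PPC). */
#define old_read_barrier()   __sync_synchronize()
#define old_write_barrier()  __sync_synchronize()

/* Proposed mapping: an acquire+release fence. Typically no instruction on
 * x86 (compiler barrier only) and an lwsync on PPC. */
#define new_read_barrier()   __atomic_thread_fence(__ATOMIC_ACQ_REL)
#define new_write_barrier()  __atomic_thread_fence(__ATOMIC_ACQ_REL)

/* Full barrier semantics are unchanged: sequentially consistent. */
#define new_full_barrier()   __atomic_thread_fence(__ATOMIC_SEQ_CST)

static int slot_a, slot_b;

/* Single-threaded smoke test: fences never change program-order results. */
static int store_then_load(int value)
{
    slot_a = value;
    new_write_barrier();
    slot_b = value + 1;
    new_full_barrier();
    int seen_b = slot_b;
    new_read_barrier();
    return slot_a + seen_b;
}
```

Only the read/write barrier definitions move to the weaker fence; anything that needs store/load ordering keeps using the full barrier.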

> With only a compiler barrier, you have loads ordered with respect to
> loads and stores ordered with respect to stores, but the load/store
> ordering isn't fully defined.

Yes.
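The missing store/load ordering is the classic store-buffering litmus test. A minimal sketch with C11 <stdatomic.h> and pthreads, where every name is hypothetical: with a sequentially consistent fence between each thread's store and its load, the outcome r1 == 0 && r2 == 0 is forbidden by the standard. With only a compiler barrier, or with acquire/release fences, that outcome is allowed, and x86 can produce it because a store may still sit in the store buffer when the later load executes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x, y;
static int r1, r2;   /* read by the main thread only after pthread_join */

static void *thread_a(void *arg)
{
    (void) arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* full barrier, typically MFENCE */
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread_b(void *arg)
{
    (void) arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the test `iterations` times; returns how often both loads saw 0. */
static int count_both_zero(int iterations)
{
    int both_zero = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;
        int rc;

        atomic_store(&x, 0);
        atomic_store(&y, 0);
        rc = pthread_create(&a, NULL, thread_a, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread_b, NULL);
        assert(rc == 0);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            both_zero++;
    }
    return both_zero;
}
```

Creating fresh threads per iteration makes the weak outcome rare even without the fences, so this only checks the guarantee; it is not a good way to provoke the reordering.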

> > Which is why these acquire/release fences, in contrast to
> > acquire/release operations, have more guarantees... You put your finger
> > right onto the spot.
> 
> But, uh, we still don't seem to know what those guarantees actually ARE.

Paired together they form a synchronizes-with relationship. Problem #1
is that the standard's language isn't clear, to me at least, about whether
there is some case where that doesn't hold. Problem #2 is that our current
README.barrier definition doesn't actually require barriers to be
paired. Which imo is bad, but still a fact.

The definition of ACQ_REL is pretty clearly sufficient imo: "Full
barrier in both directions and synchronizes with acquire loads and
release stores in another thread.".
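A minimal sketch of the fence pairing, assuming C11 <stdatomic.h> and pthreads (all names hypothetical). The release fence in the producer synchronizes with the acquire fence in the consumer only because a relaxed atomic store to ready is observed by a relaxed atomic load of the same object; that atomic object is what the standard's fence rule hinges on. With a plain int flag instead, the fences alone promise nothing under the standard's wording.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;          /* plain, non-atomic data */
static atomic_int ready;     /* the atomic object the two fences pair through */

static void *producer(void *arg)
{
    (void) arg;
    payload = 42;
    atomic_thread_fence(memory_order_release);               /* release fence ... */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);  /* ... then atomic store */
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;

    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;                                                    /* spin on the atomic */
    atomic_thread_fence(memory_order_acquire);              /* acquire fence after the load */
    *out = payload;                                          /* must observe 42 */
    return NULL;
}

/* Returns the payload value the consumer thread observed. */
static int run_message_passing(void)
{
    pthread_t p, c;
    int seen = -1;
    int rc;

    payload = 0;
    atomic_store(&ready, 0);
    rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Replacing both fences with atomic_thread_fence(memory_order_acq_rel) would also be correct here, just potentially heavier.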

> >> Say I want to appear to only change things while flag is 1, so I
> >> write this code:
> >>
> >> flag = 1
> >> acquire barrier
> >> things++
> >> release barrier
> >> flag = 0
> >>
> >> With the definition you (and Oracle) propose
> >> this won't work, because
> >> there's nothing to keep the modification of things from being
> >> reordered before flag = 1.  What good is that?  Apparently, I don't
> >> have any idea!
> >
> > As written above, I don't think that applies to Oracle's definition?
> 
> Oracle's definition doesn't look sufficient there.

Perhaps I'm just not understanding what you want to show with this
example. This started as a discussion of comparing acquire/release with
read/write barriers, right? Or are you generally wondering about the
point of acquire/release barriers?

> The acquire
> barrier guarantees that the load operations before the barrier will be
> completed before the load and store operations after the barrier, but
> the only operation before the barrier is a store, not a load, so it
> guarantees nothing.

Well, 'acquire' operations always have to relate to a load. That's why
standalone 'acquire fences' or 'acquire barriers' are more heavyweight
than just an acquiring read.

And realistically, in the above example, you'd have to read flag to see
that it's not already 1, right?
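Following that line of thought, a minimal sketch of the flag example in which setting flag also reads it, using C11 atomic_flag and pthreads (worker(), run_two_workers() and LOOPS are hypothetical). The acquire applies to the load half of the test-and-set, which keeps things++ from moving above it; the release on the clearing store keeps things++ from moving below it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define LOOPS 100000

static atomic_flag flag = ATOMIC_FLAG_INIT;  /* set while someone changes things */
static long things;                           /* plain data guarded by flag */

static void *worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < LOOPS; i++) {
        /* Read-modify-write: we read flag (must be clear) and set it. */
        while (atomic_flag_test_and_set_explicit(&flag, memory_order_acquire))
            ;   /* already set by another thread; spin */
        things++;
        /* Release store: publishes things++ to the next acquirer. */
        atomic_flag_clear_explicit(&flag, memory_order_release);
    }
    return NULL;
}

/* Runs two concurrent workers and returns the final counter. */
static long run_two_workers(void)
{
    pthread_t a, b;
    int rc;

    things = 0;
    rc = pthread_create(&a, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return things;
}
```

If either the acquire or the release ordering were missing, the two workers could lose increments; with both, the result is exactly 2 * LOOPS.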

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


