Re: Inefficient barriers on solaris with sun cc - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Inefficient barriers on solaris with sun cc |
Date | |
Msg-id | 20141002151839.GB25554@awork2.anarazel.de |
In response to | Re: Inefficient barriers on solaris with sun cc (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Inefficient barriers on solaris with sun cc
|
List | pgsql-hackers |
On 2014-10-02 10:55:06 -0400, Robert Haas wrote:
> On Thu, Oct 2, 2014 at 10:34 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > It's actually more complex than that :(
> >
> > Simple things first:
> >
> > Oracle's definition seems pretty iron clad:
> > http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
> > __machine_acq_barrier is a clear superset of __machine_r_barrier and
> > __machine_rel_barrier is a clear superset of __machine_w_barrier
> >
> > And that's what we're essentially discussing, no? That said, there seems
> > to be no reason to avoid using __machine_r/w_barrier().
>
> So let's use those, then.

Right, I've never contended that.

> > But for the reason why I defined pg_read_barrier/write_barrier to
> > __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE):
> >
> > The C11/C++11 definition it's made for is hellishly hard to
> > understand. There are very subtle differences between acquire/release
> > operations and acquire/release fences. 29.8.2/7.17.4 seem to be the
> > relevant parts of the standards. I think it essentially guarantees the
> > mapping we're talking about, but it's not entirely clear.
> >
> > The way acquire/release fences are defined is that they form a
> > 'synchronizes-with' relationship with each other. Which would, I think,
> > be sufficient given that without a release-like operation on the other
> > thread a read/write barrier isn't worth much. But there's a rub in that
> > it requires an atomic operation involved somewhere to give that guarantee.
> >
> > I *did* check that the emitted code on relevant architectures is sane,
> > but that doesn't guarantee anything for the future.
> >
> > Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is
> > definitely guaranteeing what we need, even if superfluously heavy on some
> > platforms. It still is significantly more efficient than
> > __sync_synchronize() which is what was used before. I.e. it generates no
> > code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync
> > otherwise, although I don't know why) and similar on ia64.
>
> A full barrier on x86 should be an mfence, right?

Right. I've not talked about changing full barrier semantics. What I was
referring to is that until the atomics patch we always redefined
read/write barriers to be full barriers when using gcc intrinsics.

> With only a compiler barrier, you have loads ordered with respect to
> loads and stores ordered with respect to stores, but the load/store
> ordering isn't fully defined.

Yes.

> > Which is why these acquire/release fences, in contrast to
> > acquire/release operations, have more guarantees... You put your finger
> > right onto the spot.
>
> But, uh, we still don't seem to know what those guarantees actually ARE.

Paired together they form a synchronizes-with relationship. Problem #1
is that the standard's language isn't clear, to me at least, on whether
there's some case where that doesn't hold. Problem #2 is that our
current README.barrier definition doesn't actually require barriers to
be paired. Which imo is bad, but still a fact.

The definition of ACQ_REL is pretty clearly sufficient imo: "Full
barrier in both directions and synchronizes with acquire loads and
release stores in another thread.".

> >> Say I want to appear to only change things while flag is 1, so I
> >> write this code:
> >>
> >> flag = 1
> >> acquire barrier
> >> things++
> >> release barrier
> >> flag = 0
> >>
> >> With the definition you (and Oracle) propose this won't work, because
> >> there's nothing to keep the modification of things from being
> >> reordered before flag = 1. What good is that? Apparently, I don't
> >> have any idea!
> >
> > As written above, I don't think that applies to oracle's definition?
>
> Oracle's definition doesn't look sufficient there.

Perhaps I'm just not understanding what you want to show with this example.
This started as a discussion of comparing acquire/release with
read/write barriers, right? Or are you generally wondering about the
point of acquire/release barriers?

> The acquire
> barrier guarantees that the load operations before the barrier will be
> completed before the load and store operations after the barrier, but
> the only operation before the barrier is a store, not a load, so it
> guarantees nothing.

Well, 'acquire' operations always have to be related to a load. That's
why standalone 'acquire fences' or 'acquire barriers' are more
heavyweight than just an acquiring read.

And realistically, in the above example, you'd have to read flag to see
that it's not already 1, right?

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
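
The flag example from the thread can be sketched in C11 roughly as follows. This is only an illustration under assumptions: the names flag, things, try_enter, leave, bump_things and get_things are hypothetical and not PostgreSQL code, and it assumes the flag is claimed with an atomic read-modify-write so that the acquire side has a load to attach to. The two standalone fences pair through the atomic accesses to flag, which is the "atomic operation involved somewhere" requirement mentioned above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical illustration only; none of these names are PostgreSQL APIs. */
atomic_int flag;   /* static storage: zero-initialized, i.e. "not held" */
int things;        /* plain data that should only change while flag is 1 */

/* Claim the flag. The exchange both reads and writes flag, so there is a
 * load for the following acquire fence to order later accesses against. */
bool try_enter(void)
{
    if (atomic_exchange_explicit(&flag, 1, memory_order_relaxed) != 0)
        return false;                       /* somebody else holds it */
    atomic_thread_fence(memory_order_acquire);
    return true;
}

/* Give the flag back. The release fence orders the earlier update of
 * things before the atomic store that clears flag. */
void leave(void)
{
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 0, memory_order_relaxed);
}

/* Returns true if things was incremented, false if the flag was busy. */
bool bump_things(void)
{
    if (!try_enter())
        return false;
    things++;
    leave();
    return true;
}

/* Accessor so callers need not touch the global directly. */
int get_things(void)
{
    return things;
}
```

An exchange with memory_order_acquire and a store with memory_order_release would presumably do the same job without standalone fences, which matches the remark that standalone fences are more heavyweight than an acquiring read.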