Re: Inefficient barriers on solaris with sun cc - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Inefficient barriers on solaris with sun cc |
Date | |
Msg-id | 20141002180603.GL7158@awork2.anarazel.de |
In response to | Re: Inefficient barriers on solaris with sun cc (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Inefficient barriers on solaris with sun cc |
List | pgsql-hackers |
On 2014-10-02 11:35:32 -0400, Robert Haas wrote:
> On Thu, Oct 2, 2014 at 11:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> > Which is why these acquire/release fences, in contrast to
> >> > acquire/release operations, have more guarantees... You put your finger
> >> > right onto the spot.
> >>
> >> But, uh, we still don't seem to know what those guarantees actually ARE.
> >
> > Paired together they form a synchronized-with relationship. Problem #1
> > is that the standard's language isn't, to me at least, clear if there's
> > not some case where that's not the case. Problem #2 is that our current
> > README.barrier definition doesn't actually require barriers to be
> > paired. Which imo is bad, but still a fact.
>
> I don't know what a "synchronized-with relationship" means.

I'm using the standard's language here, given that I'm trying to reason about its behaviour... What it means is that if you have a matching pair of acquire/release operations or barriers/fences, everything that happened *before* the last release fence will be visible *after* executing the next acquire operation in a different thread of execution. And 'after' is defined such that it holds whenever the 'acquiring' thread can see the result of the 'releasing' operation. I.e. no loads after the acquire can see values from before the release.

My problem with the definition in the standard is that it's not particularly clear how acquire fences *without* an underlying explicit atomic operation are defined. I checked gcc's current code and it's fine in that regard.
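A minimal C11 sketch of the fence pairing described above, assuming a compiler that provides <stdatomic.h> and POSIX threads. The names payload, ready, producer, consumer and run_fence_pair are made up for illustration. The producer's release fence and the consumer's acquire fence synchronize because the consumer's relaxed load reads the value written by the producer's relaxed store, so the plain store to payload is visible after the acquire fence even though neither the flag store nor the flag load is itself an acquire/release operation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* ordinary, non-atomic data (hypothetical) */
static atomic_int ready;   /* flag, accessed only with relaxed atomics */

static void *producer(void *arg)
{
    (void) arg;
    payload = 42;
    /* Release fence: everything before it is published by the
     * relaxed flag store that follows. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;

    /* Spin until this thread can see the producer's flag store. */
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;
    /* Acquire fence: pairs with the release fence above, so the two
     * fences synchronize and the payload store is visible here. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    assert(*out == 42);
    return NULL;
}

/* Runs one producer/consumer round; returns the value the consumer read. */
int run_fence_pair(void)
{
    pthread_t p, c;
    int seen = 0;

    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

run_fence_pair() should always return 42. Dropping either fence removes that guarantee, even if a particular run happens to return the right value.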
Other popular concurrent open-source code, like QEMU's http://git.qemu.org/?p=qemu.git;a=blob;f=include/qemu/atomic.h;hb=HEAD, does precisely what I'm talking about:

```c
#ifndef smp_wmb
#ifdef __ATOMIC_RELEASE
#define smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#else
#define smp_wmb() __sync_synchronize()
#endif
#endif

#ifndef smp_rmb
#ifdef __ATOMIC_ACQUIRE
#define smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#else
#define smp_rmb() __sync_synchronize()
#endif
#endif
```

The commit that added it, http://git.qemu.org/?p=qemu.git;a=commitdiff;h=5444e768ee1abe6e021bece19a9a932351f88c88, was written by one gcc guy and reviewed by another... So I think we can be pretty sure that gcc's __atomic_thread_fence() behaves like we want.

We probably have to be a bit more careful about extending that definition to general C11 (by including <stdatomic.h> and calling atomic_thread_fence(memory_order_acquire)), which is probably a couple of years away anyway.

> Also, I pretty much designed those definitions to match what Linux
> does. And it doesn't require that either, though it says that in most
> cases it will work out that way.

My point is that read barriers aren't particularly meaningful without a defined store order from another thread/process. Without any form of pairing you don't have that: the writing side could just have reordered the writes in a way you didn't want. And the kernel docs do say "A lack of appropriate pairing is almost certainly an error". But since read barriers also pair with lock release operations, that's normally not a big problem.

> > The definition of ACQ_REL is pretty clearly sufficient imo: "Full
> > barrier in both directions and synchronizes with acquire loads and
> > release stores in another thread.".
>
> I dunno. What's an acquire load? What's a release store? I know
> what loads and stores are; I don't know what the adjectives mean.
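To illustrate the pairing argument with QEMU-style macros, here is a hedged sketch using the GCC/Clang __atomic builtins. Only the __atomic branch of the quoted definitions is kept, since the flag accesses below need those builtins anyway; struct msg, shared, published, writer, reader and run_paired_barriers are hypothetical names. The reader's smp_rmb() is only useful because the writer issues a matching smp_wmb() between the payload stores and the flag store.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* The __atomic branch of the quoted QEMU macros (GCC/Clang builtins). */
#define smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#define smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)

struct msg { int a; int b; };   /* hypothetical payload */
static struct msg shared;
static int published;           /* touched only via __atomic builtins */

static void *writer(void *arg)
{
    (void) arg;
    shared.a = 1;
    shared.b = 2;
    smp_wmb();   /* order the payload stores before the flag store */
    __atomic_store_n(&published, 1, __ATOMIC_RELAXED);
    return NULL;
}

static void *reader(void *arg)
{
    int *sum = arg;

    while (__atomic_load_n(&published, __ATOMIC_RELAXED) == 0)
        ;        /* wait for the flag */
    smp_rmb();   /* pairs with the writer's smp_wmb(); without that write
                  * barrier the flag could become visible before the payload */
    assert(shared.a == 1 && shared.b == 2);
    *sum = shared.a + shared.b;
    return NULL;
}

/* Runs one writer/reader round; returns the sum the reader computed. */
int run_paired_barriers(void)
{
    pthread_t w, r;
    int sum = 0;

    shared.a = shared.b = 0;
    __atomic_store_n(&published, 0, __ATOMIC_RELAXED);
    pthread_create(&r, NULL, reader, &sum);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return sum;
}
```

Here the reader always sees both payload fields once it observes the flag. Removing smp_wmb() from the writer leaves the reader's smp_rmb() with nothing to pair with.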
An acquire load is either an explicit atomic load (tas, cmpxchg, etc. also count) or a normal load combined with an acquire barrier. The symmetric definition holds for a release store. (So, on x86, every load or store that the compiler is prevented from reordering is essentially an acquire load or a release store.)

> > And realistically, in the above example, you'd have to read flag to see
> > that it's not already 1, right?
>
> Not necessarily. You could be the only writer. Think about the way
> the backend entries in the stats system work. The point of setting
> the flag may be for other people to know whether the data is in the
> middle of being modified.

So you're thinking about something seqlock-like... Isn't the problem then that you actually don't want acquire semantics, but release or write barrier semantics on that store? The acquire/read barrier part would be on the reader side, no?

I'm still unsure what you want to show with that example.

Greetings,

Andres Freund

-- 
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
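Both forms of acquire load, and the point that the single writer needs release semantics while the acquire side belongs to the reader, can be seen in a seqlock. The following is a sketch in C11 under stated assumptions: seq, field_a, field_b, seqlock_write, seqlock_read and run_seqlock_demo are hypothetical names, and the data fields are relaxed atomics rather than plain variables because C11 treats a racy plain read as undefined behavior. The reader's first counter load is an explicit acquire load; the data loads followed by an acquire fence are the "load combined with an acquire barrier" form.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define ROUNDS 100000

/* Hypothetical single-writer seqlock protecting two fields. */
static atomic_uint seq;               /* even: stable, odd: write in progress */
static atomic_int field_a, field_b;   /* data, accessed with relaxed atomics */

void seqlock_write(int a, int b)
{
    /* Only one writer, so a relaxed read of our own counter suffices. */
    unsigned s = atomic_load_explicit(&seq, memory_order_relaxed);

    /* Mark "being modified": relaxed store, then a release fence so the
     * data stores below cannot become visible ahead of it. */
    atomic_store_explicit(&seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&field_a, a, memory_order_relaxed);
    atomic_store_explicit(&field_b, b, memory_order_relaxed);

    /* Explicit release store: the data stores are ordered before it. */
    atomic_store_explicit(&seq, s + 2, memory_order_release);
}

void seqlock_read(int *a, int *b)
{
    unsigned s0, s1;

    do {
        /* Explicit acquire load of the sequence counter. */
        s0 = atomic_load_explicit(&seq, memory_order_acquire);
        *a = atomic_load_explicit(&field_a, memory_order_relaxed);
        *b = atomic_load_explicit(&field_b, memory_order_relaxed);
        /* Relaxed loads + acquire fence: the re-check of seq below
         * cannot be satisfied before the data loads above. */
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&seq, memory_order_relaxed);
    } while ((s0 & 1u) != 0 || s0 != s1);   /* retry on concurrent write */
}

static void *writer(void *arg)
{
    (void) arg;
    for (int i = 1; i <= ROUNDS; i++)
        seqlock_write(i, -i);   /* invariant: a + b == 0 */
    return NULL;
}

/* Reads concurrently with the writer; returns the number of torn
 * snapshots (a + b != 0) observed, which should be 0. */
int run_seqlock_demo(void)
{
    pthread_t w;
    int torn = 0, a, b;

    pthread_create(&w, NULL, writer, NULL);
    for (int i = 0; i < ROUNDS; i++) {
        seqlock_read(&a, &b);
        if (a + b != 0)
            torn++;
    }
    pthread_join(w, NULL);

    seqlock_read(&a, &b);   /* after the join, the last write is visible */
    assert(a == ROUNDS && b == -ROUNDS);
    return torn;
}
```

Note that the writer never needs acquire semantics on its flag store; all of its ordering comes from the release fence and the final release store, and the acquire half lives entirely in seqlock_read().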