Thread: Inefficient barriers on solaris with sun cc
Hi, Binaries compiled on solaris using sun studio cc currently don't have compiler and memory barriers implemented. That means we fall back to relatively slow generic implementations for those. Especially compiler, read, write barriers will be much slower than necessary (since they all just need to prevent compiler reordering as both sparc and x86 are run in TSO mode under solaris). Since my estimate is that we'll use more and more barriers, that's going to hurt more and more. I do *not* plan to do anything about it atm, I just thought it might be helpful to have this stated somewhere searchable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 25, 2014 at 9:34 AM, Andres Freund <andres@2ndquadrant.com> wrote: > Binaries compiled on solaris using sun studio cc currently don't have > compiler and memory barriers implemented. That means we fall back to > relatively slow generic implementations for those. Especially compiler, > read, write barriers will be much slower than necessary (since they all > just need to prevent compiler reordering as both sparc and x86 are run > in TSO mode under solaris). > > Since my estimate is that we'll use more and more barriers, that's going > to hurt more and more. > > I do *not* plan to do anything about it atm, I just thought it might be > helpful to have this stated somewhere searchable. To put that another way: If there are any Sun Studio users out there who care about performance on big iron, please send a patch to fix this... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
25.09.2014, 16:34, Andres Freund wrote: > Binaries compiled on solaris using sun studio cc currently don't have > compiler and memory barriers implemented. That means we fall back to > relatively slow generic implementations for those. Especially compiler, > read, write barriers will be much slower than necessary (since they all > just need to prevent compiler reordering as both sparc and x86 are run > in TSO mode under solaris). Attached patch implements compiler and memory barriers for Solaris Studio based on documentation at http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html I defined read and write barriers as acquire and release barriers instead of pure read and write ones as that's what other platforms appear to do. / Oskari
On Fri, Sep 26, 2014 at 8:36 AM, Oskari Saarenmaa <os@ohmu.fi> wrote: > 25.09.2014, 16:34, Andres Freund kirjoitti: >> Binaries compiled on solaris using sun studio cc currently don't have >> compiler and memory barriers implemented. That means we fall back to >> relatively slow generic implementations for those. Especially compiler, >> read, write barriers will be much slower than necessary (since they all >> just need to prevent compiler reordering as both sparc and x86 are run >> in TSO mode under solaris). > > Attached patch implements compiler and memory barriers for Solaris Studio > based on documentation at > http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html > > I defined read and write barriers as acquire and release barriers instead of > pure read and write ones as that's what other platforms appear to do. So you think a read barrier is the same thing as an acquire barrier and a write barrier is the same as a release barrier? That would be surprising. It's certainly not true in general. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
26.09.2014, 15:39, Robert Haas wrote: > On Fri, Sep 26, 2014 at 8:36 AM, Oskari Saarenmaa <os@ohmu.fi> wrote: >> 25.09.2014, 16:34, Andres Freund wrote: >>> Binaries compiled on solaris using sun studio cc currently don't have >>> compiler and memory barriers implemented. That means we fall back to >>> relatively slow generic implementations for those. Especially compiler, >>> read, write barriers will be much slower than necessary (since they all >>> just need to prevent compiler reordering as both sparc and x86 are run >>> in TSO mode under solaris). >> >> Attached patch implements compiler and memory barriers for Solaris Studio >> based on documentation at >> http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html >> >> I defined read and write barriers as acquire and release barriers instead of >> pure read and write ones as that's what other platforms appear to do. > > So you think a read barrier is the same thing as an acquire barrier > and a write barrier is the same as a release barrier? That would be > surprising. It's certainly not true in general. The above doc describes the difference: a read barrier requires loads before the barrier to be completed before loads after the barrier; an acquire barrier is the same, but it also requires those loads to be complete before stores after the barrier. Similarly, a write barrier requires stores before the barrier to be completed before stores after the barrier; a release barrier is the same, but it also requires loads before the barrier to be completed before stores after the barrier. So acquire is read + loads-before-stores and release is write + loads-before-stores. The generic gcc atomics also define the read barrier as __ATOMIC_ACQUIRE and the write barrier as __ATOMIC_RELEASE. / Oskari
On 2014-09-26 08:39:38 -0400, Robert Haas wrote: > On Fri, Sep 26, 2014 at 8:36 AM, Oskari Saarenmaa <os@ohmu.fi> wrote: > > 25.09.2014, 16:34, Andres Freund wrote: > >> Binaries compiled on solaris using sun studio cc currently don't have > >> compiler and memory barriers implemented. That means we fall back to > >> relatively slow generic implementations for those. Especially compiler, > >> read, write barriers will be much slower than necessary (since they all > >> just need to prevent compiler reordering as both sparc and x86 are run > >> in TSO mode under solaris). > > > > Attached patch implements compiler and memory barriers for Solaris Studio > > based on documentation at > > http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html > > > > I defined read and write barriers as acquire and release barriers instead of > > pure read and write ones as that's what other platforms appear to do. > > So you think a read barrier is the same thing as an acquire barrier > and a write barrier is the same as a release barrier? That would be > surprising. It's certainly not true in general. It's generally true that a read barrier is implied by an acquire barrier, no? Same for write barriers being implied by release barriers. Neither is true the other way round, but that's fine. Given how postgres uses memory barriers we actually could declare read/write barriers to be compiler barriers when on solaris. Both supported architectures (x86, sparc) are run in TSO mode. As the existing barrier code for x86 says:
 * Both 32 and 64 bit x86 do not allow loads to be reordered with other loads,
 * or stores to be reordered with other stores, but a load can be performed
 * before a subsequent store.
 *
 * Technically, some x86-ish chips support uncached memory access and/or
 * special instructions that are weakly ordered. In those cases we'd need
 * the read and write barriers to be lfence and sfence. But since we don't
 * do those things, a compiler barrier should be enough.
 *
 * "lock; addl" has worked for longer than "mfence". It's also rumored to be
 * faster in many scenarios
Unless I miss something the same is true for sparc *in solaris userland*. But I'd be perfectly happy to go with something like Oskari's version because it's still much better than the current code. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
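The idea that read and write barriers can collapse to compiler barriers on TSO hardware can be sketched as follows. This is only an illustration under stated assumptions: the `sketch_*` and `tso_*` names are hypothetical (they are not PostgreSQL's macros), GCC/Clang inline-asm syntax is assumed, and the mapping is only sound on a machine that really runs in TSO mode. Plain variables are used in the pre-C11 style the thread is discussing, so the code deliberately sits outside the C11 memory model.

```c
#include <assert.h>

/* Hypothetical names for illustration; not PostgreSQL's actual macros. */

/* Emits no instruction: it only stops the compiler from moving memory
 * accesses across this point. */
#define sketch_compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Assumption: the CPU runs in TSO mode, so loads are not reordered with
 * other loads and stores are not reordered with other stores. Only the
 * compiler then has to be restrained. */
#define sketch_read_barrier()  sketch_compiler_barrier()
#define sketch_write_barrier() sketch_compiler_barrier()

static int tso_payload;          /* data being published */
static volatile int tso_ready;   /* set to 1 once tso_payload is valid */

/* Writer side: store the data, then the flag, in that order. */
void tso_publish(int value)
{
    tso_payload = value;
    sketch_write_barrier();      /* keep the data store before the flag store */
    tso_ready = 1;
}

/* Reader side: returns 1 and fills *out if data has been published. */
int tso_consume(int *out)
{
    assert(out != 0);
    if (!tso_ready)
        return 0;
    sketch_read_barrier();       /* keep the flag load before the data load */
    *out = tso_payload;
    return 1;
}
```

On a weakly ordered CPU the same macros would have to expand to real fence instructions, which is why this shortcut is tied to the TSO assumption.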
On Fri, Sep 26, 2014 at 8:55 AM, Oskari Saarenmaa <os@ohmu.fi> wrote: >> So you think a read barrier is the same thing as an acquire barrier >> and a write barrier is the same as a release barrier? That would be >> surprising. It's certainly not true in general. > > The above doc describes the difference: read barrier requires loads before > the barrier to be completed before loads after the barrier - an acquire > barrier is the same, but it also requires loads to be complete before stores > after the barrier. > > Similarly write barrier requires stores before the barrier to be completed > before stores after the barrier - a release barrier is the same, but it also > requires loads before the barrier to be completed before stores after the > barrier. > > So acquire is read + loads-before-stores and release is write + > loads-before-stores. Hmm. My impression was that an acquire barrier means that loads and stores can migrate forward across the barrier but not backward; and that a release barrier means that loads and stores can migrate backward across the barrier but not forward. I'm actually not really sure what this means unless the barrier also does something in and of itself. For example, consider this:
    some stuff
    CAS(&lock, 0, 1) // i am an acquire barrier
    more stuff
    lock = 0 // i am a release barrier
    even more stuff
If the CAS() and lock = 0 instructions were FULL barriers, then we'd be saying that the stuff that happens in the critical section needs to be exactly "more stuff". But if they are acquire and release barriers, respectively, then the CPU is allowed to move "some stuff" or "even more stuff" into the critical section; but what it can't do is move "more stuff" out. Now if you just have a naked acquire barrier that is not doing anything itself, I don't really know what the semantics of that should be. Say I want to appear to only change things while flag is 1, so I write this code:
    flag = 1
    acquire barrier
    things++
    release barrier
    flag = 0
With the definition you (and Oracle) propose, this won't work, because there's nothing to keep the modification of things from being reordered before flag = 1. What good is that? Apparently, I don't have any idea! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
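The lock example above, where the CAS acts as an acquire operation and the store of 0 acts as a release operation, could look roughly like this in C11 atomics. It is a minimal sketch, not code from PostgreSQL or from the patch: `sketch_lock`, `sketch_acquire`, `sketch_release` and `bump_things` are made-up names, and a C11 compiler with `<stdatomic.h>` is assumed.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int sketch_lock;   /* 0 = free, 1 = held */
static int things;               /* protected by sketch_lock */

/* CAS with acquire ordering: accesses inside the critical section cannot
 * move above it, but earlier accesses may still sink into the section. */
void sketch_acquire(void)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(&sketch_lock, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;            /* a failed CAS overwrote expected */
}

/* Store with release ordering: accesses inside the critical section cannot
 * move below it, but later accesses may still rise into the section. */
void sketch_release(void)
{
    assert(atomic_load_explicit(&sketch_lock, memory_order_relaxed) == 1);
    atomic_store_explicit(&sketch_lock, 0, memory_order_release);
}

/* "more stuff": stays between the acquire and the release. */
int bump_things(void)
{
    int result;

    sketch_acquire();
    result = ++things;
    sketch_release();
    return result;
}
```

Here the ordering is attached to an operation on `sketch_lock` itself, which is the case that is easy to reason about; the standalone "naked" barrier in the flag example is the harder one.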
26.09.2014, 17:28, Robert Haas wrote: > On Fri, Sep 26, 2014 at 8:55 AM, Oskari Saarenmaa <os@ohmu.fi> wrote: >>> So you think a read barrier is the same thing as an acquire barrier >>> and a write barrier is the same as a release barrier? That would be >>> surprising. It's certainly not true in general. >> >> The above doc describes the difference: read barrier requires loads before >> the barrier to be completed before loads after the barrier - an acquire >> barrier is the same, but it also requires loads to be complete before stores >> after the barrier. >> >> Similarly write barrier requires stores before the barrier to be completed >> before stores after the barrier - a release barrier is the same, but it also >> requires loads before the barrier to be completed before stores after the >> barrier. >> >> So acquire is read + loads-before-stores and release is write + >> loads-before-stores. > > Hmm. My impression was that an acquire barrier means that loads and > stores can migrate forward across the barrier but not backward; and > that a release barrier means that loads and stores can migrate > backward across the barrier but not forward. I'm actually not really > sure what this means unless the barrier also does something in and of > itself. For example, consider this: [...] > With the definition you (and Oracle) propose, this won't work, because > there's nothing to keep the modification of things from being > reordered before flag = 1. What good is that? Apparently, I don't > have any idea! I'm not proposing any definition for acquire or release barriers; I was just proposing to use the things Solaris Studio defines as acquire and release barriers to implement read and write barriers in PostgreSQL, because similar barrier names are used with gcc, and on Solaris Studio acquire is a stronger read barrier and release is a stronger write barrier.
atomics.h's definition of pg_(read|write)_barrier doesn't have any requirements for loads before stores, though, so we could use __machine_r_barrier and __machine_w_barrier instead. But as Andres pointed out all this is probably unnecessary and we could define read and write barrier as __compiler_barrier with Solaris Studio cc. It's only available for Solaris (x86 and Sparc) and Linux (x86). / Oskari
On 2014-09-26 10:28:21 -0400, Robert Haas wrote: > On Fri, Sep 26, 2014 at 8:55 AM, Oskari Saarenmaa <os@ohmu.fi> wrote: > >> So you think a read barrier is the same thing as an acquire barrier > >> and a write barrier is the same as a release barrier? That would be > >> surprising. It's certainly not true in general. > > > > The above doc describes the difference: read barrier requires loads before > > the barrier to be completed before loads after the barrier - an acquire > > barrier is the same, but it also requires loads to be complete before stores > > after the barrier. > > > > Similarly write barrier requires stores before the barrier to be completed > > before stores after the barrier - a release barrier is the same, but it also > > requires loads before the barrier to be completed before stores after the > > barrier. > > > > So acquire is read + loads-before-stores and release is write + > > loads-before-stores. > > Hmm. My impression was that an acquire barrier means that loads and > stores can migrate forward across the barrier but not backward; and > that a release barrier means that loads and stores can migrate > backward across the barrier but not forward. It's actually more complex than that :( Simple things first: Oracle's definition seems pretty ironclad: http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html __machine_acq_barrier is a clear superset of __machine_r_barrier and __machine_rel_barrier is a clear superset of __machine_w_barrier. And that's what we're essentially discussing, no? That said, there seems to be no reason to avoid using __machine_r/w_barrier(). But for the reason why I defined pg_read_barrier/write_barrier to __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE): The C11/C++11 definition it's made for is hellishly hard to understand. There are very subtle differences between acquire/release operations and acquire/release fences. 29.8.2/7.17.4 seem to be the relevant parts of the standards.
I think it essentially guarantees the mapping we're talking about, but it's not entirely clear. The way acquire/release fences are defined is that they form a 'synchronizes-with' relationship with each other. Which would, I think, be sufficient given that without a release-like operation on the other thread a read/write barrier isn't worth much. But there's a rub in that it requires an atomic operation to be involved somewhere to give that guarantee. I *did* check that the emitted code on relevant architectures is sane, but that doesn't guarantee anything for the future. Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is definitely guaranteeing what we need, even if superfluously heavy on some platforms. It still is significantly more efficient than __sync_synchronize() which is what was used before. I.e. it generates no code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync otherwise, although I don't know why) and similar on ia64. As a reference, relevant standard sections are: C11: 5.1.2.4 5); 7.17.4 C++11: 29.3; 1.10 Not that we can rely on those, but I think it's a good thing to orient on. > I'm actually not really sure what this means unless the barrier also > does something in and of itself. > For example, consider this: > > some stuff > CAS(&lock, 0, 1) // i am an acquire barrier > more stuff > lock = 0 // i am a release barrier > even more stuff > > If the CAS() and lock = 0 instructions were FULL barriers, then we'd > be saying that the stuff that happens in the critical section needs to > be exactly "more stuff". But if they are acquire and release > barriers, respectively, then the CPU is allowed to move "some stuff" > or "even more stuff" into the critical section; but what it can't do > is move "more stuff" out. > Now if you just have a naked acquire barrier that is not doing > anything itself, I don't really know what the semantics of that should > be.
Which is why these acquire/release fences, in contrast to acquire/release operations, have more guarantees... You put your finger right onto the spot. > Say I want to appear to only change things while flag is 1, so I > write this code: > > flag = 1 > acquire barrier > things++ > release barrier > flag = 0 > > With the definition you (and Oracle) propose As written above, I don't think that applies to oracle's definition? > this won't work, because > there's nothing to keep the modification of things from being > reordered before flag = 1. What good is that? Apparently, I don't > have any idea! I hope it's a bit clearer now? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
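The difference between acquire/release operations and acquire/release fences can be made concrete with a small C11 sketch. It is an illustration only, not code from the patch: the `fence_*` names are hypothetical and a C11 compiler with `<stdatomic.h>` is assumed. The fences themselves touch no memory location, so the flag accesses are relaxed atomic operations; that is the "atomic operation involved somewhere" the message refers to.

```c
#include <assert.h>
#include <stdatomic.h>

static int fence_payload;        /* ordinary, non-atomic data */
static atomic_int fence_ready;   /* relaxed flag that the two fences pair on */

/* Writer: data store, release fence, then a relaxed store of the flag. */
void fence_publish(int value)
{
    fence_payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&fence_ready, 1, memory_order_relaxed);
}

/* Reader: relaxed load of the flag, acquire fence, then the data load.
 * If the load observes the writer's store, the release fence
 * synchronizes-with the acquire fence and fence_payload is visible. */
int fence_consume(int *out)
{
    assert(out != 0);
    if (atomic_load_explicit(&fence_ready, memory_order_relaxed) == 0)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = fence_payload;
    return 1;
}
```

If `fence_ready` were a plain `int`, the two fences would have nothing to pair on in the language of the standard, even though compilers may well emit the same instructions.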
On Thu, Oct 2, 2014 at 10:34 AM, Andres Freund <andres@2ndquadrant.com> wrote: > It's actually more complex than that :( > > Simple things first: > > Oracle's definition seems pretty iron clad: > http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html > __machine_acq_barrier is a clear superset of __machine_r_barrier and > __machine_rel_barrier is a clear superset of __machine_w_barrier > > And that's what we're essentially discussing, no? That said, there seems > to be no reason to avoid using __machine_r/w_barrier(). So let's use those, then. > But for the reason why I defined pg_read_barrier/write_barrier to > __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE): > > The C11/C++11 definition it's made for is hellishly hard to > understand. There's very subtle differences between acquire/release > operation and acquire/release fences. 29.8.2/7.17.4 seems to be the relevant > parts of the standards. I think it essentially guarantees the mapping > we're talking about, but it's not entirely clear. > > The way acquire/release fences are defined is that they form a > 'synchronizes-with' relationship with each other. Which would, I think, > be sufficient given that without a release like operation on the other > thread a read/wrie barrier isn't worth much. But there's a rub in that > it requires a atomic operation involved somehere to give that guarantee. > > I *did* check that the emitted code on relevant architectures is sane, > but that doesn't guarantee anything for the future. > > Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is > definitely guaranteeing what we need, even if superflously heavy on some > platforms. It still is significantly more efficient than > __sync_synchronize() which is what was used before. I.e. it generates no > code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync > otherwise, although I don't know why) and similar on ia64. A fully barrier on x86 should be an mfence, right? 
With only a compiler barrier, you have loads ordered with respect to loads and stores ordered with respect to stores, but the load/store ordering isn't fully defined. > Which is why these acquire/release fences, in contrast to > acquire/release operations, have more guarantees... You put your finger > right onto the spot. But, uh, we still don't seem to know what those guarantees actually ARE. >> Say I want to appear to only change things while flag is 1, so I >> write this code: >> >> flag = 1 >> acquire barrier >> things++ >> release barrier >> flag = 0 >> >> With the definition you (and Oracle) propose >> this won't work, because >> there's nothing to keep the modification of things from being >> reordered before flag = 1. What good is that? Apparently, I don't >> have any idea! > > As written above, I don't think that applies to oracle's definition? Oracle's definition doesn't look sufficient there. The acquire barrier guarantees that the load operations before the barrier will be completed before the load and store operations after the barrier, but the only operation before the barrier is a store, not a load, so it guarantees nothing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-10-02 10:55:06 -0400, Robert Haas wrote: > On Thu, Oct 2, 2014 at 10:34 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > It's actually more complex than that :( > > > > Simple things first: > > > > Oracle's definition seems pretty iron clad: > > http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html > > __machine_acq_barrier is a clear superset of __machine_r_barrier and > > __machine_rel_barrier is a clear superset of __machine_w_barrier > > > > And that's what we're essentially discussing, no? That said, there seems > > to be no reason to avoid using __machine_r/w_barrier(). > > So let's use those, then. Right, I've never contended that. > > But for the reason why I defined pg_read_barrier/write_barrier to > > __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE): > > > > The C11/C++11 definition it's made for is hellishly hard to > > understand. There's very subtle differences between acquire/release > > operation and acquire/release fences. 29.8.2/7.17.4 seems to be the relevant > > parts of the standards. I think it essentially guarantees the mapping > > we're talking about, but it's not entirely clear. > > > > The way acquire/release fences are defined is that they form a > > 'synchronizes-with' relationship with each other. Which would, I think, > > be sufficient given that without a release like operation on the other > > thread a read/wrie barrier isn't worth much. But there's a rub in that > > it requires a atomic operation involved somehere to give that guarantee. > > > > I *did* check that the emitted code on relevant architectures is sane, > > but that doesn't guarantee anything for the future. > > > > Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is > > definitely guaranteeing what we need, even if superflously heavy on some > > platforms. It still is significantly more efficient than > > __sync_synchronize() which is what was used before. I.e. 
it generates no > > code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync > > otherwise, although I don't know why) and similar on ia64. > > A fully barrier on x86 should be an mfence, right? Right. I've not talked about changing full barrier semantics. What I was referring to is that until the atomics patch we always redefine read/write barriers to be full barriers when using gcc intrinsics. > With only a compiler barrier, you have loads ordered with respect to > loads and stores ordered with respect to stores, but the load/store > ordering isn't fully defined. Yes. > > Which is why these acquire/release fences, in contrast to > > acquire/release operations, have more guarantees... You put your finger > > right onto the spot. > > But, uh, we still don't seem to know what those guarantees actually ARE. Paired together they form a synchronized-with relationship. Problem #1 is that the standard's language isn't, to me at least, clear if there's not some case where that's not the case. Problem #2 is that our current README.barrier definition doesn't actually require barriers to be paired. Which imo is bad, but still a fact. The definition of ACQ_REL is pretty clearly sufficient imo: "Full barrier in both directions and synchronizes with acquire loads and release stores in another thread.". > >> Say I want to appear to only change things while flag is 1, so I > >> write this code: > >> > >> flag = 1 > >> acquire barrier > >> things++ > >> release barrier > >> flag = 0 > >> > >> With the definition you (and Oracle) propose > >> this won't work, because > >> there's nothing to keep the modification of things from being > >> reordered before flag = 1. What good is that? Apparently, I don't > >> have any idea! > > > > As written above, I don't think that applies to oracle's definition? > > Oracle's definition doesn't look sufficient there. Perhaps I'm just not understanding what you want to show with this example. 
This started as a discussion of comparing acquire/release with read/write barriers, right? Or are you generally wondering about the point of acquire/release barriers? > The acquire > barrier guarantees that the load operations before the barrier will be > completed before the load and store operations after the barrier, but > the only operation before the barrier is a store, not a load, so it > guarantees nothing. Well, 'acquire' operations always have to be related to a load. That's why standalone 'acquire fences' or 'acquire barriers' are more heavyweight than just an acquiring read. And realistically, in the above example, you'd have to read flag to see that it's not already 1, right? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Oct 2, 2014 at 11:18 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> So let's use those, then. > > Right, I've never contended that. OK, cool. >> A fully barrier on x86 should be an mfence, right? > > Right. I've not talked about changing full barrier semantics. What I was > referring to is that until the atomics patch we always redefine > read/write barriers to be full barriers when using gcc intrinsics. OK, got it. If there's a cheaper way to tell gcc "loads before loads" or "stores before stores", I'm fine with doing that for those cases. >> > Which is why these acquire/release fences, in contrast to >> > acquire/release operations, have more guarantees... You put your finger >> > right onto the spot. >> >> But, uh, we still don't seem to know what those guarantees actually ARE. > > Paired together they form a synchronized-with relationship. Problem #1 > is that the standard's language isn't, to me at least, clear if there's > not some case where that's not the case. Problem #2 is that our current > README.barrier definition doesn't actually require barriers to be > paired. Which imo is bad, but still a fact. I don't know what a "synchronized-with relationship" means. Also, I pretty much designed those definitions to match what Linux does. And it doesn't require that either, though it says that in most cases it will work out that way. > The definition of ACQ_REL is pretty clearly sufficient imo: "Full > barrier in both directions and synchronizes with acquire loads and > release stores in another thread.". I dunno. What's an acquire load? What's a release store? I know what loads and stores are; I don't know what the adjectives mean. >> The acquire >> barrier guarantees that the load operations before the barrier will be >> completed before the load and store operations after the barrier, but >> the only operation before the barrier is a store, not a load, so it >> guarantees nothing.
> > Well, 'acquire' operations always have to related to a load. That's why > standalone 'acquire fences' or 'acquire barriers' are more heavyweight > than just a acquiring read. Again, I can't judge any of this, because you haven't defined the terms anywhere. > And realistically, in the above example, you'd have to read flag to see > that it's not already 1, right? Not necessarily. You could be the only writer. Think about the way the backend entries in the stats system work. The point of setting the flag may be for other people to know whether the data is in the middle of being modified. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-10-02 11:35:32 -0400, Robert Haas wrote: > On Thu, Oct 2, 2014 at 11:18 AM, Andres Freund <andres@2ndquadrant.com> wrote: > >> > Which is why these acquire/release fences, in contrast to > >> > acquire/release operations, have more guarantees... You put your finger > >> > right onto the spot. > >> > >> But, uh, we still don't seem to know what those guarantees actually ARE. > > > > Paired together they form a synchronized-with relationship. Problem #1 > > is that the standard's language isn't, to me at least, clear if there's > > not some case where that's not the case. Problem #2 is that our current > > README.barrier definition doesn't actually require barriers to be > > paired. Which imo is bad, but still a fact. > > I don't know what a "synchronized-with relationship" means. I'm using the standard's language here, given that I'm trying to reason about its behaviour... What it means is that if you have a matching pair of acquire/release operations or barriers/fences everything that happened *before* the last release fence will be visible *after* executing the next acquire operation in a different thread-of-execution. And 'after' is defined in the way that is true if the 'acquiring' thread can see the result of the 'releasing' operation. I.e. no loads after the acquire can see values from before the release. My problem with the definition in the standard is that it's not particularly clear how acquire fences *without* a underlying explicit atomic operation are defined in the standard. I checked gcc's current code and it's fine in that regard. 
Also other popular concurrent open source stuff like http://git.qemu.org/?p=qemu.git;a=blob;f=include/qemu/atomic.h;hb=HEAD does precisely what I'm talking about:
    #ifndef smp_wmb
    #ifdef __ATOMIC_RELEASE
    #define smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
    #else
    #define smp_wmb() __sync_synchronize()
    #endif
    #endif

    #ifndef smp_rmb
    #ifdef __ATOMIC_ACQUIRE
    #define smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
    #else
    #define smp_rmb() __sync_synchronize()
    #endif
    #endif
The commit that added it http://git.qemu.org/?p=qemu.git;a=commitdiff;h=5444e768ee1abe6e021bece19a9a932351f88c88 was written by one gcc guy and reviewed by another one... So I think we can be pretty sure that gcc's __atomic_thread_fence() behaves like we want. We probably have to be a bit more careful about extending that definition (by including stdatomic.h and doing atomic_thread_fence(memory_order_acquire)) to use general C11. Which is probably a couple years away anyway. > Also, I pretty much designed those definitions to match what Linux > does. And it doesn't require that either, though it says that in most > cases it will work out that way. My point is that read barriers aren't particularly meaningful without a defined store order from another thread/process. Without any form of pairing you don't have that. The writing side could just have reordered the writes in a way you didn't want them. And the kernel docs do say "A lack of appropriate pairing is almost certainly an error". But since read barriers also pair with lock release operations, that's normally not a big problem. > > The definition of ACQ_REL is pretty clearly sufficient imo: "Full > > barrier in both directions and synchronizes with acquire loads and > > release stores in another thread.". > > I dunno. What's an acquire load? What's a release store? I know > what loads and stores are; I don't know what the adjectives mean.
An acquire load is either an explicit atomic load (tas, cmpxchg, etc also count) or a normal load combined with an acquire barrier. The symmetric definition is true for a release store. (So, on x86 every load/store that prevents compiler reordering is essentially an acquire load/release store.) > > And realistically, in the above example, you'd have to read flag to see > > that it's not already 1, right? > > Not necessarily. You could be the only writer. Think about the way > the backend entries in the stats system work. The point of setting > the flag may be for other people to know whether the data is in the > middle of being modified. So you're thinking about something seqlock-like... Isn't the problem then that you actually don't want acquire semantics, but release or write barrier semantics on that store? The acquire/read barrier part would be on the reader side, no? I'm still unsure what you want to show with that example? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
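The seqlock-like pattern mentioned here, with release/write-barrier semantics on the writer side and acquire/read-barrier semantics on the reader side, might be sketched like this in C11. It is a hedged illustration rather than how the stats system is actually implemented: the `stats_*` names are hypothetical, a single writer is assumed, and the payload fields are relaxed atomics so that the sketch stays within the C11 memory model.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_uint stats_seq;           /* odd while an update is in progress */
static atomic_int stats_a, stats_b;     /* payload, accessed with relaxed ops */

/* Single writer assumed. */
void stats_write(int a, int b)
{
    unsigned s = atomic_load_explicit(&stats_seq, memory_order_relaxed);

    assert((s & 1u) == 0);              /* no update may already be running */
    atomic_store_explicit(&stats_seq, s + 1, memory_order_relaxed);
    /* Write-barrier role: the counter bump is ordered before the data stores. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&stats_a, a, memory_order_relaxed);
    atomic_store_explicit(&stats_b, b, memory_order_relaxed);
    /* Release store: the data stores are ordered before the final bump. */
    atomic_store_explicit(&stats_seq, s + 2, memory_order_release);
}

/* Reader: retries until it sees the same even counter before and after. */
void stats_read(int *a, int *b)
{
    unsigned before, after;

    assert(a != 0 && b != 0);
    do {
        before = atomic_load_explicit(&stats_seq, memory_order_acquire);
        *a = atomic_load_explicit(&stats_a, memory_order_relaxed);
        *b = atomic_load_explicit(&stats_b, memory_order_relaxed);
        /* Read-barrier role: the data loads are ordered before the re-check. */
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&stats_seq, memory_order_relaxed);
    } while (before != after || (before & 1u));
}
```

Note that the writer never needs an acquire barrier on its own flag store, which matches the point that the acquire half belongs to the reader.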
On Thu, Oct 2, 2014 at 2:06 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> Also, I pretty much designed those definitions to match what Linux >> does. And it doesn't require that either, though it says that in most >> cases it will work out that way. > > My point is that that read barriers aren't particularly meaningful > without a defined store order from another thread/process. Without any > form of pairing you don't have that. The writing side could just have > reordered the writes in a way you didn't want them. And the kernel docs > do say "A lack of appropriate pairing is almost certainly an error". But > since read barriers also pair with lock releases operations, that's > normally not a big problem. Agreed, but it's possible to have a read-fence where an atomic operation provides the ordering on the other side, or something like that. > I'm still unsure what you want to show with that example? Me, too. I think we've drifted off in the weeds. Do we know what we need to know to fix $SUBJECT? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-10-06 11:38:47 -0400, Robert Haas wrote: > On Thu, Oct 2, 2014 at 2:06 PM, Andres Freund <andres@2ndquadrant.com> wrote: > >> Also, I pretty much designed those definitions to match what Linux > >> does. And it doesn't require that either, though it says that in most > >> cases it will work out that way. > > > > My point is that that read barriers aren't particularly meaningful > > without a defined store order from another thread/process. Without any > > form of pairing you don't have that. The writing side could just have > > reordered the writes in a way you didn't want them. And the kernel docs > > do say "A lack of appropriate pairing is almost certainly an error". But > > since read barriers also pair with lock releases operations, that's > > normally not a big problem. > > Agreed, but it's possible to have a read-fence where an atomic > operation provides the ordering on the other side, or something like > that. Sure, that's one of the possible pairings. Most atomics have barrier semantics... > > I'm still unsure what you want to show with that example? > > Me, too. I think we've drifted off in the weeds. Do we know what we > need to know to fix $SUBJECT? I think we can pretty much apply Oskari's patch after replacing acquire/release with read/write intrinsics. I'm opening a bug with the gcc folks about clarifying the docs on their intrinsics. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
06.10.2014, 17:42, Andres Freund wrote: > I think we can pretty much apply Oskari's patch after replacing > acquire/release with read/write intrinsics. Attached is a patch rebased to current master using read & write barriers. / Oskari
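The attachment itself is not reproduced in this thread, so the following is only a guess at the general shape of such a mapping, not the patch. The `sketch_*` names are hypothetical. The `__compiler_barrier`, `__machine_r_barrier` and `__machine_w_barrier` intrinsics are the ones named in the discussion; `<mbarrier.h>` as their header and `__machine_rw_barrier` as the full barrier are assumptions taken from the linked Oracle documentation. The non-Sun branch exists only so the sketch also builds with gcc or clang.

```c
#include <assert.h>

#if defined(__SUNPRO_C)
/* Assumption: Solaris Studio provides these intrinsics via <mbarrier.h>. */
#include <mbarrier.h>
#define sketch_compiler_barrier() __compiler_barrier()
#define sketch_memory_barrier()   __machine_rw_barrier()
#define sketch_read_barrier()     __machine_r_barrier()
#define sketch_write_barrier()    __machine_w_barrier()
#else
/* Fallback for gcc/clang so the sketch can be compiled elsewhere. */
#define sketch_compiler_barrier() __asm__ __volatile__("" ::: "memory")
#define sketch_memory_barrier()   __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define sketch_read_barrier()     __atomic_thread_fence(__ATOMIC_ACQUIRE)
#define sketch_write_barrier()    __atomic_thread_fence(__ATOMIC_RELEASE)
#endif

/* Writer: make *slot visible before *flag is seen as set. */
void sketch_publish(int *slot, volatile int *flag, int value)
{
    assert(slot != 0 && flag != 0);
    *slot = value;
    sketch_write_barrier();
    *flag = 1;
}

/* Reader: returns 1 and copies *slot into *out once *flag is set. */
int sketch_fetch(const int *slot, const volatile int *flag, int *out)
{
    assert(slot != 0 && flag != 0 && out != 0);
    if (!*flag)
        return 0;
    sketch_read_barrier();
    *out = *slot;
    return 1;
}
```

Using the plain read/write intrinsics rather than the acquire/release ones asks for nothing beyond load/load and store/store ordering, which is all pg_read_barrier and pg_write_barrier are said to require earlier in the thread.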