Thread: optimize atomic exchanges
On Fri, Nov 10, 2023 at 08:55:29PM -0600, Nathan Bossart wrote: > On Fri, Nov 10, 2023 at 06:48:39PM -0800, Andres Freund wrote: >> Yes. We should optimize pg_atomic_exchange_u32() one of these days - it can be >> done *far* faster than a cmpxchg. When I was adding the atomic abstraction >> there was concern with utilizing too many different atomic instructions. I >> didn't really agree back then, but these days I really don't see a reason to >> not use a few more intrinsics. > > I might give this a try, if for no other reason than it'd force me to > improve my mental model of this stuff. :) Here's a first draft. I haven't attempted to add implementations for PowerPC, and I only added the __atomic version for gcc since __sync_lock_test_and_set() only supports setting the value to 1 on some platforms. Otherwise, I tried to add specialized atomic exchange implementations wherever there existed other specialized atomic implementations. I haven't done any sort of performance testing on this yet. Some preliminary web searches suggest that there is unlikely to be much difference between cmpxchg and xchg, but presumably there's some difference between xchg and doing cmpxchg in a while loop (as is done in atomics/generic.h today). I'll report back once I've had a chance to do some testing... -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Wed, Nov 29, 2023 at 03:29:05PM -0600, Nathan Bossart wrote: > I haven't done any sort of performance testing on this yet. Some > preliminary web searches suggest that there is unlikely to be much > difference between cmpxchg and xchg, but presumably there's some difference > between xchg and doing cmpxchg in a while loop (as is done in > atomics/generic.h today). I'll report back once I've had a chance to do > some testing... Some rudimentary tests show a >40% speedup with this patch on x86_64. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2023-11-30 21:18:15 -0600, Nathan Bossart wrote: > On Wed, Nov 29, 2023 at 03:29:05PM -0600, Nathan Bossart wrote: > > I haven't done any sort of performance testing on this yet. Some > > preliminary web searches suggest that there is unlikely to be much > > difference between cmpxchg and xchg, but presumably there's some difference > > between xchg and doing cmpxchg in a while loop (as is done in > > atomics/generic.h today). I'll report back once I've had a chance to do > > some testing... > > Some rudimentary tests show a >40% speedup with this patch on x86_64. On bigger machines, with contention, the wins are likely much higher. I see two orders of magnitude higher throughput in a test program that I had around, on a two socket cascade lake machine. Of course it's also much less powerfull... Greetings, Andres Freund
On Thu, Nov 30, 2023 at 07:56:27PM -0800, Andres Freund wrote: > On 2023-11-30 21:18:15 -0600, Nathan Bossart wrote: >> Some rudimentary tests show a >40% speedup with this patch on x86_64. > > On bigger machines, with contention, the wins are likely much higher. I see > two orders of magnitude higher throughput in a test program that I had around, > on a two socket cascade lake machine. Of course it's also much less > powerfull... Nice. Thanks for trying it out. One thing on my mind is whether we should bother with the inline assembly versions. It looks like gcc has had __atomic since 4.7.0 (2012), so I'm not sure we gain much from them. OTOH they are pretty simple and seem unlikely to cause too much trouble. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, Nov 30, 2023 at 10:35:22PM -0600, Nathan Bossart wrote: > One thing on my mind is whether we should bother with the inline assembly > versions. It looks like gcc has had __atomic since 4.7.0 (2012), so I'm > not sure we gain much from them. OTOH they are pretty simple and seem > unlikely to cause too much trouble. Barring objections or additional feedback, I think I'm inclined to press forward with this one and commit it in the next week or two. I'm currently planning to keep the inline assembly, but I'm considering removing the configuration checks for __atomic_exchange_n() if the availability of __atomic_compare_exchange_n() seems like a reliable indicator of its presence. Thoughts? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Mon, Dec 04, 2023 at 12:18:05PM -0600, Nathan Bossart wrote: > Barring objections or additional feedback, I think I'm inclined to press > forward with this one and commit it in the next week or two. I'm currently > planning to keep the inline assembly, but I'm considering removing the > configuration checks for __atomic_exchange_n() if the availability of > __atomic_compare_exchange_n() seems like a reliable indicator of its > presence. Thoughts? Concretely, like this. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-12-04 15:08:57 -0600, Nathan Bossart wrote: > On Mon, Dec 04, 2023 at 12:18:05PM -0600, Nathan Bossart wrote: > > Barring objections or additional feedback, I think I'm inclined to press > > forward with this one and commit it in the next week or two. I'm currently > > planning to keep the inline assembly, but I'm considering removing the > > configuration checks for __atomic_exchange_n() if the availability of > > __atomic_compare_exchange_n() seems like a reliable indicator of its > > presence. Thoughts? I don't think we need the inline asm. Otherwise looks good. Greetings, Andres Freund
On Fri, Dec 15, 2023 at 04:56:27AM -0800, Andres Freund wrote: > I don't think we need the inline asm. Otherwise looks good. Committed with that change. Thanks for reviewing! I am going to watch the buildfarm especially closely for this one. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com