Thread: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
[PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
From
"Zidenberg, Tsahi"
Date:
Outline-atomics is a gcc compilation flag that adds runtime detection of weather or not the cpu supports atomic instructions.CPUs that don't support atomic instructions will use the old load-exclusive/store-exclusive instructions. Ifa different compilation flag defined an architecture that unconditionally supports atomic instructions (e.g. -march=armv8.2),the outline-atomic flag will have no effect. The patch was tested to improve pgbench simple-update by 10% and sysbench write-only by 3% on a 64-core armv8.2 machine (AWSm6g.16xlarge). Select-only and read-only benchmarks were not significantly affected, and neither was performance on a16-core armv8.0 machine that does not support atomic instructions (AWS a1.4xlarge). The patch uses an existing configure.in macro to detect compiler support of the flag. Checking for aarch64 machine is notstrictly necessary, but was added for readability. Thank you! Tsahi
Attachment
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
From
"Zidenberg, Tsahi"
Date:
On 01/07/2020, 18:40, "Zidenberg, Tsahi" <tsahee@amazon.com> wrote: > Outline-atomics is a gcc compilation flag that adds runtime detection of weather or not the cpu > supports atomic instructions. CPUs that don't support atomic instructions will use the old > load-exclusive/store-exclusive instructions. If a different compilation flag defined an architecture > that unconditionally supports atomic instructions (e.g. -march=armv8.2), the outline-atomic flag > will have no effect. > > The patch was tested to improve pgbench simple-update by 10% and sysbench write-only by 3% > on a 64-core armv8.2 machine (AWS m6g.16xlarge). Select-only and read-only benchmarks were > not significantly affected, and neither was performance on a 16-core armv8.0 machine that does > not support atomic instructions (AWS a1.4xlarge). > > The patch uses an existing configure.in macro to detect compiler support of the flag. Checking for > aarch64 machine is not strictly necessary, but was added for readability. Added a commitfest entry: https://commitfest.postgresql.org/29/2637/ Thank you! Tsahi
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
From
Andres Freund
Date:
Hi, On 2020-07-01 15:40:38 +0000, Zidenberg, Tsahi wrote: > Outline-atomics is a gcc compilation flag that adds runtime detection > of weather or not the cpu supports atomic instructions. CPUs that > don't support atomic instructions will use the old > load-exclusive/store-exclusive instructions. If a different > compilation flag defined an architecture that unconditionally supports > atomic instructions (e.g. -march=armv8.2), the outline-atomic flag > will have no effect. Sounds attractive. > The patch was tested to improve pgbench simple-update by 10% and > sysbench write-only by 3% on a 64-core armv8.2 machine (AWS > m6g.16xlarge). Select-only and read-only benchmarks were not > significantly affected, and neither was performance on a 16-core > armv8.0 machine that does not support atomic instructions (AWS > a1.4xlarge). What does "not significantly affected" exactly mean? Could you post the raw numbers? I'm a bit concerned that the additional conditional branches on platforms without non ll/sc atomics could hurt noticably. I'm surprised that read-only didn't benefit - with ll/sc that ought to have pretty high contention on a few lwlocks. Could you post the numbers? Greetings, Andres Freund
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
From
"Zidenberg, Tsahi"
Date:
Hello! First, I apologize for taking so long to answer. This e-mail regretfully got lost in my inbox. On 24/07/2020, 4:17, "Andres Freund" <andres@anarazel.de> wrote: > What does "not significantly affected" exactly mean? Could you post the > raw numbers? The following tests show benchmark behavior on m6g.8xl instance (32-core with LSE support) and a1.4xlarge (16-core, no LSE support) with and without the patch, based on postgresql 12.4. Tests are pgbench select-only/simple-update, and sysbench read-only/write only. . select-only. simple-update. read-only. write-only m6g.8xlarge/vanila. 482130. 56275. 273327. 33364 m6g.8xlarge/patch. 493748. 59681. 262702. 33024 a1.4xlarge/vanila. 82437. 13978. 62489. 2928 a1.4xlarge/patch. 79499. 13932. 62796. 2945 Results obviously change with OS / parameters /etc. I have attempted ensure a fair comparison, But I don't think these numbers should be taken as absolute. As reference points, m6g instance compiled with -march=native flag, and m5g (x86) instances: m6g.8xlarge/native. 522771. 60354. 261366. 33582 m5.8xlarge. 362908. 58732. 147730. 32750 > I'm a bit concerned that the additional conditional > branches on platforms without non ll/sc atomics could hurt noticably. As can be seen in a1 results - the difference for CPUSs with no LSE atomic support is low. Locks have one branch added, which is always taken the same way and thus easy to predict. > I'm surprised that read-only didn't benefit - with ll/sc that ought to > have pretty high contention on a few lwlocks. These results show only about 6% performance increase in simple-update, and very close performance in other results, most of which could be attributed to benchmark result jitter. These results from "well behaved" benchmarks do not show the full importance of using outline-atomics. I have observed in some experiments with other values and larger systems a crush of performance including read-only tests, which was caused by continuously failing to commit strx instructions. In such cases, outline-atomics improved performance by more than 2x factor. These cases are not always easy to replicate. Thank you! and sorry again for the delay Tsahi Zidenberg
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
From
Michael Paquier
Date:
On Sun, Sep 06, 2020 at 09:00:02PM +0000, Zidenberg, Tsahi wrote: > These results show only about 6% performance increase in simple-update, and very close > performance in other results, most of which could be attributed to benchmark result jitter. > These results from "well behaved" benchmarks do not show the full importance of using > outline-atomics. I have observed in some experiments with other values and larger systems > a crush of performance including read-only tests, which was caused by continuously failing to > commit strx instructions. In such cases, outline-atomics improved performance by more > than 2x factor. These cases are not always easy to replicate. Interesting stuff. ARM-related otimizations is not something you see a lot around here. Please note that your latest patch fails to apply, could you provide a rebase? I am marking the patch as waiting on author for the time being. -- Michael
Attachment
"Zidenberg, Tsahi" <tsahee@amazon.com> writes: > Outline-atomics is a gcc compilation flag that adds runtime detection of weather or not the cpu supports atomic instructions.CPUs that don't support atomic instructions will use the old load-exclusive/store-exclusive instructions. Ifa different compilation flag defined an architecture that unconditionally supports atomic instructions (e.g. -march=armv8.2),the outline-atomic flag will have no effect. I wonder what version of gcc you intend this for. AFAICS, older gcc versions lack this flag at all, while newer ones have it on by default. Docs I can find on the net suggest that it would only help to supply the flag when using gcc 10.0.x. Is there a sufficient population of production systems using such gcc releases to make it worth expending configure cycles on? (That's sort of a trick question, because the GCC docs make it look like 10.0.x was never considered to be production ready.) regards, tom lane
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
From
"Zidenberg, Tsahi"
Date:
On 07/09/2020, 6:11, "Michael Paquier" <michael@paquier.xyz> wrote: > Interesting stuff. ARM-related otimizations is not something you see > a lot around here. Let's hope that will change :) > could you provide a rebase? I am marking the patch as waiting on > author for the time being. Of course. Attached. -- Thank you! Tsahi.
Attachment
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
From
"Zidenberg, Tsahi"
Date:
On 08/09/2020, 1:01, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > I wonder what version of gcc you intend this for. AFAICS, older > gcc versions lack this flag at all, while newer ones have it on > by default. (previously sent private reply, sorry) The moutline-atomics flag showed substantial enough improvements that it has been backported to GCC 9, 8 and there is a gcc-7 branch in the works. Ubuntu has integrated this in 20.04, Amazon Linux 2 supports it, with other distributions including Ubuntu 18.04 and Debian on the way. all distributions, including the upcoming Ubuntu with GCC-10, have moutline-atomics turned off by default.
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
From
Heikki Linnakangas
Date:
On 10/09/2020 09:37, Zidenberg, Tsahi wrote: > On 08/09/2020, 1:01, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > > > I wonder what version of gcc you intend this for. AFAICS, older > > gcc versions lack this flag at all, while newer ones have it on > > by default. > > > (previously sent private reply, sorry) > > The moutline-atomics flag showed substantial enough improvements > that it has been backported to GCC 9, 8 and there is a gcc-7 branch in > the works. > Ubuntu has integrated this in 20.04, Amazon Linux 2 supports it, > with other distributions including Ubuntu 18.04 and Debian on the way. > all distributions, including the upcoming Ubuntu with GCC-10, have > moutline-atomics turned off by default. If it's a good idea to use -moutline-atomics, I would expect the compiler or distribution to enable it by default. And as you pointed out, many have. For the others, there are probably reasons they haven't, like begin conservative in general. Whatever the reasons, IMHO we should not second-guess them. I'm marking this as Rejected in the commitfest. But thanks for the benchmarking, that is valuable information nevertheless. - Heikki
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
From
"Zidenberg, Tsahi"
Date:
On 29/09/2020, 10:21, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote: > If it's a good idea to use -moutline-atomics, I would expect the > compiler or distribution to enable it by default. And as you pointed > out, many have. -moutline-atomics is only enabled by default on the gcc-10 branch where it was originally developed. It was important enough to be backported to previous versions and picked up by e.g. ubuntu and amazon-linux. However, outline-atomics is not enabled by default in any backports that I'm aware of. Ubuntu 20.04 even turned it off by default for gcc-10, which seems like a compatibility step with the main gcc-9 compiler. Always-enabled outline-atomic is, sadly, many years in the future for release systems. > For the others, there are probably reasons they haven't, > like begin conservative in general. Whatever the reasons, IMHO we should > not second-guess them. I assume GCC chose conservatively not to add code by default that won't help old CPUs when increasing minor versions (although I see no performance degradation in real software). On the other hand, the feature was important enough to be back-ported to allow software to take advantage of it. Postgresql should be the most advanced open source database. As I understand it, it should be able to handle as well as possible large workloads on large modern machines like Graviton2, and that means using LSE. > I'm marking this as Rejected in the commitfest. But thanks for the > benchmarking, that is valuable information nevertheless. Could additional data change your mind?
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
From
Heikki Linnakangas
Date:
On 30/09/2020 19:04, Zidenberg, Tsahi wrote: > Ubuntu 20.04 even turned it off by default for gcc-10, which seems > like a compatibility step with the main gcc-9 compiler. Ok, I definitely don't want to override that decision. >> I'm marking this as Rejected in the commitfest. But thanks for the >> benchmarking, that is valuable information nevertheless. > > Could additional data change your mind? I doubt it. IMO we shouldn't second-guess decisions made by compiler and distribution vendors, and more performance data won't change that principle. For comparison, we also don't set -O or -funroll-loops or any other flags to enable/disable specific optimizations. (Except for a few specific source files, like checksum.c, but those are exceptions the rule.) If some other committer thinks differently, I won't object, but I'm not going to commit this. Maybe you should speak to the distribution vendors or the folk packaging PostgreSQL for those distributions, instead. - Heikki
"Zidenberg, Tsahi" <tsahee@amazon.com> writes: > On 29/09/2020, 10:21, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote: >>>>>> If it's a good idea to use -moutline-atomics, I would expect the >>>>>> compiler or distribution to enable it by default. And as you pointed >>>>>> out, many have. > -moutline-atomics is only enabled by default on the gcc-10 branch where > it was originally developed. It was important enough to be backported > to previous versions and picked up by e.g. ubuntu and amazon-linux. > However, outline-atomics is not enabled by default in any backports that > I'm aware of. Ubuntu 20.04 even turned it off by default for gcc-10, > which seems like a compatibility step with the main gcc-9 compiler. > Always-enabled outline-atomic is, sadly, many years in the > future for release systems. I don't find this argument terribly convincing. I agree that it'll be a couple years before gcc 10 is in use in "stable production" systems. But it seems to me that big-iron aarch64 is also some way off from appearing in stable production systems. By the time this actually matters to any measurable fraction of our users, distros will have converged on reasonable default settings for this option. In the meantime, you are asking that we more or less permanently expend configure cycles on checking for an option that seems to have a pretty short expected useful life, even on the small minority of builds where it'll do anything at all. The cost/benefit ratio doesn't seem very attractive. None of this prevents somebody from applying the switch in their own builds, of course. But I concur with Heikki's reasoning that it's probably not a great idea for us to, by default, override the distro's default on this. regards, tom lane
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
From
Alexander Korotkov
Date:
Hi! On Mon, Sep 7, 2020 at 1:12 AM Zidenberg, Tsahi <tsahee@amazon.com> wrote: > First, I apologize for taking so long to answer. This e-mail regretfully got lost in my inbox. > > On 24/07/2020, 4:17, "Andres Freund" <andres@anarazel.de> wrote: > > > What does "not significantly affected" exactly mean? Could you post the > > raw numbers? > > The following tests show benchmark behavior on m6g.8xl instance (32-core with LSE support) > and a1.4xlarge (16-core, no LSE support) with and without the patch, based on postgresql 12.4. > Tests are pgbench select-only/simple-update, and sysbench read-only/write only. > > . select-only. simple-update. read-only. write-only > m6g.8xlarge/vanila. 482130. 56275. 273327. 33364 > m6g.8xlarge/patch. 493748. 59681. 262702. 33024 > a1.4xlarge/vanila. 82437. 13978. 62489. 2928 > a1.4xlarge/patch. 79499. 13932. 62796. 2945 > > Results obviously change with OS / parameters /etc. I have attempted ensure a fair comparison, > But I don't think these numbers should be taken as absolute. > As reference points, m6g instance compiled with -march=native flag, and m5g (x86) instances: > > m6g.8xlarge/native. 522771. 60354. 261366. 33582 > m5.8xlarge. 362908. 58732. 147730. 32750 I'd like to resurrect this thread, if there is still interest from your side. What number of clients and jobs did you use with pgbench? I've noticed far more dramatic effect on high number of clients. Could you verify that? https://akorotkov.github.io/blog/2021/04/30/arm/ ------ Regards, Alexander Korotkov
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
From
Thomas Munro
Date:
Hi, FYI, people interested in this thread might be interested in pgsql-bugs #18610. There are two related issues here: 1. Some people want to use LSE on modern ARM servers so they want to use -moutline-atomics, which IIUC adds auto-detection logic with fallback code so it can still run on the first generation of aarch64 ARMv8 (without .1) hardware. That was being discussed as a feature proposal for master. (People could already access that if they compile from source by using -march=something_modern, but the big distributions are in charge of what they target and AFAIK mostly still choose ARMv8, so this outline atomics idea is a nice workaround to make everyone happy, I haven't studied exactly how it works.) 2. Clang has started assuming -moutline-atomics in some version, so it's already compiling .bc files that way, so it breaks if our JIT system decides to inline SQL-callable functions, so we'll need to decide what to do about that and back-patch something. Conservative choice would be to stop it from doing that with -mno-outline-atomics, until this thread makes progress, but perhaps people closer to the subject have another idea... [1] https://www.postgresql.org/message-id/flat/18610-37bf303f904fede3%40postgresql.org