Thread: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

[PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

From
"Zidenberg, Tsahi"
Date:
Outline-atomics is a gcc compilation flag that adds runtime detection of weather or not the cpu supports atomic
instructions.CPUs that don't support atomic instructions will use the old load-exclusive/store-exclusive instructions.
Ifa different compilation flag defined an architecture that unconditionally supports atomic instructions (e.g.
-march=armv8.2),the outline-atomic flag will have no effect.
 

The patch was tested to improve pgbench simple-update by 10% and sysbench write-only by 3% on a 64-core armv8.2 machine
(AWSm6g.16xlarge). Select-only and read-only benchmarks were not significantly affected, and neither was performance on
a16-core armv8.0 machine that does not support atomic instructions (AWS a1.4xlarge).
 

The patch uses an existing configure.in macro to detect compiler support of the flag. Checking for aarch64 machine is
notstrictly necessary, but was added for readability.
 

Thank you!
Tsahi




Attachment

Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

From
"Zidenberg, Tsahi"
Date:
On 01/07/2020, 18:40, "Zidenberg, Tsahi" <tsahee@amazon.com> wrote:

> Outline-atomics is a gcc compilation flag that adds runtime detection of weather or not the cpu
> supports atomic instructions. CPUs that don't support atomic instructions will use the old 
> load-exclusive/store-exclusive instructions. If a different compilation flag defined an architecture
> that unconditionally supports atomic instructions (e.g. -march=armv8.2), the outline-atomic flag
> will have no effect.
>
> The patch was tested to improve pgbench simple-update by 10% and sysbench write-only by 3%
> on a 64-core armv8.2 machine (AWS m6g.16xlarge). Select-only and read-only benchmarks were
> not significantly affected, and neither was performance on a 16-core armv8.0 machine that does
> not support atomic instructions (AWS a1.4xlarge).
>
> The patch uses an existing configure.in macro to detect compiler support of the flag. Checking for
> aarch64 machine is not strictly necessary, but was added for readability.

Added a commitfest entry:
https://commitfest.postgresql.org/29/2637/

Thank you!
Tsahi





Hi,

On 2020-07-01 15:40:38 +0000, Zidenberg, Tsahi wrote:
> Outline-atomics is a gcc compilation flag that adds runtime detection
> of weather or not the cpu supports atomic instructions. CPUs that
> don't support atomic instructions will use the old
> load-exclusive/store-exclusive instructions. If a different
> compilation flag defined an architecture that unconditionally supports
> atomic instructions (e.g. -march=armv8.2), the outline-atomic flag
> will have no effect.

Sounds attractive.


> The patch was tested to improve pgbench simple-update by 10% and
> sysbench write-only by 3% on a 64-core armv8.2 machine (AWS
> m6g.16xlarge). Select-only and read-only benchmarks were not
> significantly affected, and neither was performance on a 16-core
> armv8.0 machine that does not support atomic instructions (AWS
> a1.4xlarge).

What does "not significantly affected" exactly mean? Could you post the
raw numbers?  I'm a bit concerned that the additional conditional
branches on platforms without non ll/sc atomics could hurt noticably.

I'm surprised that read-only didn't benefit - with ll/sc that ought to
have pretty high contention on a few lwlocks.

Could you post the numbers?

Greetings,

Andres Freund



Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

From
"Zidenberg, Tsahi"
Date:
Hello!

First, I apologize for taking so long to answer. This e-mail regretfully got lost in my inbox.

On 24/07/2020, 4:17, "Andres Freund" <andres@anarazel.de> wrote:

    > What does "not significantly affected" exactly mean? Could you post the
    > raw numbers?

The following tests show benchmark behavior on m6g.8xl instance (32-core with LSE support)
and a1.4xlarge (16-core, no LSE support) with and without the patch, based on postgresql 12.4.
Tests are pgbench select-only/simple-update, and sysbench read-only/write only.

.                      select-only.     simple-update.    read-only.           write-only
m6g.8xlarge/vanila.      482130.         56275.              273327.               33364
m6g.8xlarge/patch.       493748.         59681.              262702.               33024
a1.4xlarge/vanila.        82437.         13978.               62489.                2928
a1.4xlarge/patch.         79499.         13932.               62796.                2945

Results obviously change with OS / parameters /etc. I have attempted ensure a fair comparison,
But I don't think these numbers should be taken as absolute.
As reference points, m6g instance compiled with -march=native flag, and m5g (x86) instances:

m6g.8xlarge/native.       522771.        60354.               261366.              33582
m5.8xlarge.               362908.        58732.               147730.              32750

    > I'm a bit concerned that the additional conditional
    > branches on platforms without non ll/sc atomics could hurt noticably.

As can be seen in a1 results - the difference for CPUSs with no LSE atomic support is low.
Locks have one branch added, which is always taken the same way and thus easy to predict.

    > I'm surprised that read-only didn't benefit - with ll/sc that ought to
    > have pretty high contention on a few lwlocks.

These results show only about 6% performance increase in simple-update, and very close
performance in other results, most of which could be attributed to benchmark result jitter.
These results from "well behaved" benchmarks do not show the full importance of using 
outline-atomics. I have observed in some experiments with other values and larger systems
a crush of performance including read-only tests, which was caused by continuously failing to
commit strx instructions. In such cases, outline-atomics improved performance by more
than 2x factor. These cases are not always easy to replicate.

Thank you!
and sorry again for the delay
Tsahi Zidenberg


Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

From
Michael Paquier
Date:
On Sun, Sep 06, 2020 at 09:00:02PM +0000, Zidenberg, Tsahi wrote:
> These results show only about 6% performance increase in simple-update, and very close
> performance in other results, most of which could be attributed to benchmark result jitter.
> These results from "well behaved" benchmarks do not show the full importance of using
> outline-atomics. I have observed in some experiments with other values and larger systems
> a crush of performance including read-only tests, which was caused by continuously failing to
> commit strx instructions. In such cases, outline-atomics improved performance by more
> than 2x factor. These cases are not always easy to replicate.

Interesting stuff.  ARM-related otimizations is not something you see
a lot around here.  Please note that your latest patch fails to apply,
could you provide a rebase?  I am marking the patch as waiting on
author for the time being.
--
Michael

Attachment
"Zidenberg, Tsahi" <tsahee@amazon.com> writes:
> Outline-atomics is a gcc compilation flag that adds runtime detection of weather or not the cpu supports atomic
instructions.CPUs that don't support atomic instructions will use the old load-exclusive/store-exclusive instructions.
Ifa different compilation flag defined an architecture that unconditionally supports atomic instructions (e.g.
-march=armv8.2),the outline-atomic flag will have no effect. 

I wonder what version of gcc you intend this for.  AFAICS, older
gcc versions lack this flag at all, while newer ones have it on
by default.  Docs I can find on the net suggest that it would only
help to supply the flag when using gcc 10.0.x.  Is there a sufficient
population of production systems using such gcc releases to make it
worth expending configure cycles on?  (That's sort of a trick question,
because the GCC docs make it look like 10.0.x was never considered
to be production ready.)

            regards, tom lane



Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

From
"Zidenberg, Tsahi"
Date:
  On 07/09/2020, 6:11, "Michael Paquier" <michael@paquier.xyz> wrote:
    > Interesting stuff.  ARM-related otimizations is not something you see
    > a lot around here.

Let's hope that will change :)

   >  could you provide a rebase?  I am marking the patch as waiting on
   >  author for the time being.

Of course. Attached.

--
Thank you!
Tsahi.


Attachment

Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

From
"Zidenberg, Tsahi"
Date:
On 08/09/2020, 1:01, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

    > I wonder what version of gcc you intend this for.  AFAICS, older
    > gcc versions lack this flag at all, while newer ones have it on
    > by default.


(previously sent private reply, sorry)

The moutline-atomics flag showed substantial enough improvements
that it has been backported to GCC 9, 8 and there is a gcc-7 branch in
the works.
Ubuntu has integrated this in 20.04, Amazon Linux 2 supports it,
with other distributions including Ubuntu 18.04 and Debian on the way.
all distributions, including the upcoming Ubuntu with GCC-10, have
moutline-atomics turned off by default.


Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

From
Heikki Linnakangas
Date:
On 10/09/2020 09:37, Zidenberg, Tsahi wrote:
> On 08/09/2020, 1:01, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> 
>      > I wonder what version of gcc you intend this for.  AFAICS, older
>      > gcc versions lack this flag at all, while newer ones have it on
>      > by default.
> 
> 
> (previously sent private reply, sorry)
> 
> The moutline-atomics flag showed substantial enough improvements
> that it has been backported to GCC 9, 8 and there is a gcc-7 branch in
> the works.
> Ubuntu has integrated this in 20.04, Amazon Linux 2 supports it,
> with other distributions including Ubuntu 18.04 and Debian on the way.
> all distributions, including the upcoming Ubuntu with GCC-10, have
> moutline-atomics turned off by default.

If it's a good idea to use -moutline-atomics, I would expect the 
compiler or distribution to enable it by default. And as you pointed 
out, many have. For the others, there are probably reasons they haven't, 
like begin conservative in general. Whatever the reasons, IMHO we should 
not second-guess them.

I'm marking this as Rejected in the commitfest. But thanks for the 
benchmarking, that is valuable information nevertheless.

- Heikki



Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

From
"Zidenberg, Tsahi"
Date:

On 29/09/2020, 10:21, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote:
    > If it's a good idea to use -moutline-atomics, I would expect the
    > compiler or distribution to enable it by default. And as you pointed
    > out, many have.

-moutline-atomics is only enabled by default on the gcc-10 branch where
it was originally developed. It was important enough to be backported
to previous versions and picked up by e.g. ubuntu and amazon-linux.
However, outline-atomics is not enabled by default in any backports that
I'm aware of. Ubuntu 20.04 even turned it off by default for gcc-10,
which seems like a compatibility step with the main gcc-9 compiler.
Always-enabled outline-atomic is, sadly, many years in the
future for release systems.

    > For the others, there are probably reasons they haven't,
    > like begin conservative in general. Whatever the reasons, IMHO we should
    > not second-guess them.

I assume GCC chose conservatively not to add code by default that
won't help old CPUs when increasing minor versions (although I see
no performance degradation in real software).
On the other hand, the feature was important enough to be
back-ported to allow software to take advantage of it.
Postgresql should be the most advanced open source database.
As I understand it, it should be able to handle as well as possible
large workloads on large modern machines like Graviton2, and
that means using LSE.

    > I'm marking this as Rejected in the commitfest. But thanks for the
    > benchmarking, that is valuable information nevertheless.

Could additional data change your mind?




Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

From
Heikki Linnakangas
Date:
On 30/09/2020 19:04, Zidenberg, Tsahi wrote:
> Ubuntu 20.04 even turned it off by default for gcc-10, which seems
> like a compatibility step with the main gcc-9 compiler.
Ok, I definitely don't want to override that decision.

>> I'm marking this as Rejected in the commitfest. But thanks for the
>> benchmarking, that is valuable information nevertheless.
> 
> Could additional data change your mind?

I doubt it. IMO we shouldn't second-guess decisions made by compiler and 
distribution vendors, and more performance data won't change that 
principle. For comparison, we also don't set -O or -funroll-loops or any 
other flags to enable/disable specific optimizations. (Except for a few 
specific source files, like checksum.c, but those are exceptions the rule.)

If some other committer thinks differently, I won't object, but I'm not 
going to commit this. Maybe you should speak to the distribution vendors 
or the folk packaging PostgreSQL for those distributions, instead.

- Heikki



"Zidenberg, Tsahi" <tsahee@amazon.com> writes:
> On 29/09/2020, 10:21, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote:
>>>>>> If it's a good idea to use -moutline-atomics, I would expect the
>>>>>> compiler or distribution to enable it by default. And as you pointed
>>>>>> out, many have.

> -moutline-atomics is only enabled by default on the gcc-10 branch where
> it was originally developed. It was important enough to be backported
> to previous versions and picked up by e.g. ubuntu and amazon-linux.
> However, outline-atomics is not enabled by default in any backports that
> I'm aware of. Ubuntu 20.04 even turned it off by default for gcc-10,
> which seems like a compatibility step with the main gcc-9 compiler.
> Always-enabled outline-atomic is, sadly, many years in the
> future for release systems.

I don't find this argument terribly convincing.  I agree that it'll be
a couple years before gcc 10 is in use in "stable production" systems.
But it seems to me that big-iron aarch64 is also some way off from
appearing in stable production systems.  By the time this actually
matters to any measurable fraction of our users, distros will have
converged on reasonable default settings for this option.

In the meantime, you are asking that we more or less permanently expend
configure cycles on checking for an option that seems to have a pretty
short expected useful life, even on the small minority of builds where
it'll do anything at all.  The cost/benefit ratio doesn't seem very
attractive.

None of this prevents somebody from applying the switch in their own
builds, of course.  But I concur with Heikki's reasoning that it's
probably not a great idea for us to, by default, override the distro's
default on this.

            regards, tom lane



Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

From
Alexander Korotkov
Date:
Hi!

On Mon, Sep 7, 2020 at 1:12 AM Zidenberg, Tsahi <tsahee@amazon.com> wrote:
> First, I apologize for taking so long to answer. This e-mail regretfully got lost in my inbox.
>
> On 24/07/2020, 4:17, "Andres Freund" <andres@anarazel.de> wrote:
>
>     > What does "not significantly affected" exactly mean? Could you post the
>     > raw numbers?
>
> The following tests show benchmark behavior on m6g.8xl instance (32-core with LSE support)
> and a1.4xlarge (16-core, no LSE support) with and without the patch, based on postgresql 12.4.
> Tests are pgbench select-only/simple-update, and sysbench read-only/write only.
>
> .                      select-only.     simple-update.    read-only.           write-only
> m6g.8xlarge/vanila.      482130.         56275.              273327.               33364
> m6g.8xlarge/patch.       493748.         59681.              262702.               33024
> a1.4xlarge/vanila.        82437.         13978.               62489.                2928
> a1.4xlarge/patch.         79499.         13932.               62796.                2945
>
> Results obviously change with OS / parameters /etc. I have attempted ensure a fair comparison,
> But I don't think these numbers should be taken as absolute.
> As reference points, m6g instance compiled with -march=native flag, and m5g (x86) instances:
>
> m6g.8xlarge/native.       522771.        60354.               261366.              33582
> m5.8xlarge.               362908.        58732.               147730.              32750

I'd like to resurrect this thread, if there is still interest from
your side.  What number of clients and jobs did you use with pgbench?

I've noticed far more dramatic effect on high number of clients.
Could you verify that?
https://akorotkov.github.io/blog/2021/04/30/arm/

------
Regards,
Alexander Korotkov



Hi,

FYI, people interested in this thread might be interested in
pgsql-bugs #18610.  There are two related issues here:

1.  Some people want to use LSE on modern ARM servers so they want to
use -moutline-atomics, which IIUC adds auto-detection logic with
fallback code so it can still run on the first generation of aarch64
ARMv8 (without .1) hardware.  That was being discussed as a feature
proposal for master.  (People could already access that if they
compile from source by using -march=something_modern, but the big
distributions are in charge of what they target and AFAIK mostly still
choose ARMv8, so this outline atomics idea is a nice workaround to
make everyone happy, I haven't studied exactly how it works.)

2.  Clang has started assuming -moutline-atomics in some version, so
it's already compiling .bc files that way, so it breaks if our JIT
system decides to inline SQL-callable functions, so we'll need to
decide what to do about that and back-patch something.  Conservative
choice would be to stop it from doing that with -mno-outline-atomics,
until this thread makes progress, but perhaps people closer to the
subject have another idea...

[1] https://www.postgresql.org/message-id/flat/18610-37bf303f904fede3%40postgresql.org