Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64 - Mailing list pgsql-hackers

From Zidenberg, Tsahi
Subject Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
Date
Msg-id 1C8D0E58-FB33-4105-AC00-8FA07621F5DD@amazon.com
In response to Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64  (Andres Freund <andres@anarazel.de>)
Responses Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64  (Michael Paquier <michael@paquier.xyz>)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
Hello!

First, I apologize for taking so long to answer. This e-mail regrettably got lost in my inbox.

On 24/07/2020, 4:17, "Andres Freund" <andres@anarazel.de> wrote:

    > What does "not significantly affected" exactly mean? Could you post the
    > raw numbers?

The following tests show benchmark behavior on an m6g.8xlarge instance (32 cores, with LSE support)
and an a1.4xlarge instance (16 cores, no LSE support), with and without the patch, based on PostgreSQL 12.4.
The tests are pgbench select-only/simple-update and sysbench read-only/write-only.

instance/build          select-only  simple-update  read-only  write-only
m6g.8xlarge/vanilla          482130          56275     273327       33364
m6g.8xlarge/patch            493748          59681     262702       33024
a1.4xlarge/vanilla            82437          13978      62489        2928
a1.4xlarge/patch              79499          13932      62796        2945
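
For readers who want the relative differences rather than the raw numbers, the following is a minimal sketch that recomputes the percent change of the patched build over the unpatched one from the table above. The helper name `relative_change` and the dictionary layout are illustrative only, and the sketch assumes the figures are throughput values where higher is better (the message does not state the unit).

```python
# Figures copied from the table above. Assumption: these are throughput
# numbers (higher is better); the original message does not state the unit.
COLUMNS = ("select-only", "simple-update", "read-only", "write-only")

RESULTS = {
    "m6g.8xlarge": {
        "vanilla": (482130, 56275, 273327, 33364),
        "patch": (493748, 59681, 262702, 33024),
    },
    "a1.4xlarge": {
        "vanilla": (82437, 13978, 62489, 2928),
        "patch": (79499, 13932, 62796, 2945),
    },
}


def relative_change(results=RESULTS):
    """Return {instance: {benchmark: percent change of patch vs. vanilla}}."""
    changes = {}
    for instance, builds in results.items():
        changes[instance] = {
            name: 100.0 * (patched - base) / base
            for name, base, patched in zip(
                COLUMNS, builds["vanilla"], builds["patch"]
            )
        }
    return changes


if __name__ == "__main__":
    # Print one line per instance/benchmark pair, signed, two decimals.
    for instance, per_benchmark in relative_change().items():
        for name, pct in per_benchmark.items():
            print(f"{instance:12s} {name:14s} {pct:+.2f}%")
```

Positive values mean the patched build scored higher. A single run per configuration says nothing about run-to-run variance, so small differences in either direction should be read with that in mind.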

Results obviously vary with OS, parameters, etc. I have attempted to ensure a fair comparison,
but I don't think these numbers should be taken as absolute.
As reference points, here are an m6g instance compiled with the -march=native flag and an m5 (x86) instance:

m6g.8xlarge/native           522771          60354     261366       33582
m5.8xlarge                   362908          58732     147730       32750

    > I'm a bit concerned that the additional conditional
    > branches on platforms without non ll/sc atomics could hurt noticably.

As the a1 results show, the difference for CPUs with no LSE atomics support is small.
Locks have one branch added, which is always taken the same way and is thus easy to predict.

    > I'm surprised that read-only didn't benefit - with ll/sc that ought to
    > have pretty high contention on a few lwlocks.

These results show only about a 6% performance increase in simple-update, and very close
performance in the other tests, most of which could be attributed to benchmark jitter.
These results from "well behaved" benchmarks do not show the full importance of using
outline-atomics. In some experiments with other values and larger systems, I have observed
a collapse in performance, including in read-only tests, caused by continuously failing to
commit stxr instructions. In such cases, outline-atomics improved performance by more
than a factor of 2. These cases are not always easy to replicate.

Thank you, and sorry again for the delay.
Tsahi Zidenberg

