Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture
Date
Msg-id aBPSjaY3Oki0pxHZ@nathan
Whole thread Raw
In response to Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture
List pgsql-hackers
On Thu, May 01, 2025 at 02:48:59PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Apr 30, 2025 at 4:53 AM Salvatore Dipietro
>> <dipietro.salvatore@gmail.com> wrote:
>>> we would like to propose the removal of the Instruction
>>> Synchronization Barrier (isb) for aarch64 architectures. Based on our
>>> testing on Graviton instances (m7g.16xlarge), we can see on average
>>> over multiple iterations up to 12% better performance using PGBench
>>> select-only and up to 9% with Sysbench oltp_read_only workloads. On
>>> Graviton4 (m8g.24xlarge) results are up to 8% better using PGBench
>>> select-only and up to 6% with Sysbench oltp_read_only workloads.
>>> We have also tested it putting more pressure on the spin_delay
>>> function, enabling pg_stat_statements.track_planning with PGBench
>>> read-only [0] and, on average, the patch shows up to 27% better
>>> performance on m6g.16xlarge and up to 37% on m7g.16xlarge.
> 
>> I think you should make some kind of argument about why the previous
>> conclusion was wrong, or why something's changed between then and now.
> 
> TBH, those numbers are large enough that I flat out don't believe
> them.  As noted in the previous thread, we've managed to squeeze out
> a lot of our dependencies on spinlock performance, via algorithmic
> changes, migration to atomic ops, etc.  So I think ten-percent-ish
> improvement on a plain pgbench test just isn't very plausible.
> We certainly didn't see that kind of effect when we were doing that
> earlier round of testing --- we had to use a custom testing lashup
> to get numbers that were outside the noise at all.

I'm not sure there's actually any hot spinlock involved in a regular
select-only pgbench (unless perhaps pg_stat_statements was enabled or
something).  But I do know pg_stat_statements.track_planning stresses the
spinlock code quite a bit, and commit 3d0b4b1 recently added a non-locking
initial test in AArch64's TAS_SPIN, so I wonder if the ISB is still
appropriate.  It'd be interesting to see the performance difference of
removing the ISB with and without commit 3d0b4b1 applied.

-- 
nathan



pgsql-hackers by date:

Previous
From: Dmitry Dolgov
Date:
Subject: Re: queryId constant squashing does not support prepared statements
Next
From: Tom Lane
Date:
Subject: Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture