Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB barriers - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB barriers
Msg-id eja4ly2wd3kpsjmumx3qhqqttxsxk3fqmubyuqe4ge2wkfmzrv@4zbjq3lk27rh
In response to [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB barriers (Greg Burd <greg@burd.me>)
Responses Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB barriers
List pgsql-hackers
Hi,

On 2025-11-20 15:45:22 -0500, Greg Burd wrote:
> Dave and I have been working together to get ARM64 with MSVC functional.
>  The attached patches accomplish that. Dave is the author of the first
> which addresses some build issues and fixes the spin_delay() semantics,
> I did the second which fixes some atomics in this combination.

Thanks for working on this!


> This pointed a finger at the atomics, so I started there.  We used a few
> tools, but worth noting is https://godbolt.org/ where we were able to
> quickly see that the MSVC assembly was missing the "dmb" barriers on
> this platform.  I'm not sure how long this link will be valid, but in
> the short term here's our investigation: https://godbolt.org/z/PPqfxe1bn
> 
> 
> PROBLEM DESCRIPTION
> 
> PostgreSQL test failures occur intermittently on MSVC ARM64 builds,
> manifesting as timing-dependent failures in critical sections
> protected by spinlocks and atomic variables. The failures are
> reproducible when the test suite is compiled with optimization flags
> (/O2), particularly in the recovery/027_stream_regress test which
> involves WAL replication and standby recovery.
> 
> The root cause has two components:
> 
> 1. Atomic operations lack memory barriers on ARM64
> 2. MSVC spinlock implementation lacks memory barriers on ARM64
> 
> TECHNICAL ANALYSIS
> 
> PART 1: ATOMIC OPERATIONS MEMORY BARRIERS
> 
> GCC's __atomic_compare_exchange_n() with __ATOMIC_SEQ_CST semantics
> generates a call to __aarch64_cas4_acq_rel(), which is a library
> function that provides explicit acquire-release memory ordering
> semantics through either:
> 
> * LSE path (modern ARM64): Using CASAL instruction with built-in
>   memory ordering [1][2]
> 
> * Legacy path (older ARM64): Using LDAXR/STLXR instructions with
>   explicit dmb sy instruction [3]
> 
> MSVC's _InterlockedCompareExchange() intrinsic on ARM64 performs the
> atomic operation but does NOT emit the necessary Data Memory Barrier
> (DMB) instructions [4][5].

I couldn't reproduce this result when playing around on godbolt. By specifying
/arch:armv9.4, MSVC can be convinced to emit the code for the intrinsics inline
(at least for most of them).  And that makes it visible that
_InterlockedCompareExchange() results in a "casal" instruction. Looking that
up shows:

https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions/CASA--CASAL--CAS--CASL--CASAL--CAS--CASL--A64-
which includes these two statements:
"CASA and CASAL load from memory with acquire semantics."
"CASL and CASAL store to memory with release semantics."
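
For comparison, here is a minimal sketch of a sequentially consistent
compare-and-swap in portable C11, not in MSVC intrinsics. The names
demo_cas_u32 and demo_cas_selfcheck are hypothetical and are not part of
PostgreSQL's atomics API. When LSE instructions are enabled, compilers
typically lower this kind of operation on AArch64 to a single casal, which
is the instruction quoted above; the exact codegen depends on compiler and
flags, so treat that as an expectation to check on godbolt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compare-and-swap with sequentially consistent
 * ordering on both success and failure. On failure, *expected is updated
 * with the value actually found in *ptr.
 */
static bool
demo_cas_u32(_Atomic uint32_t *ptr, uint32_t *expected, uint32_t newval)
{
    return atomic_compare_exchange_strong_explicit(ptr, expected, newval,
                                                   memory_order_seq_cst,
                                                   memory_order_seq_cst);
}

/* Single-threaded usage example; it only checks the CAS semantics. */
static void
demo_cas_selfcheck(void)
{
    _Atomic uint32_t value = 5;
    uint32_t    expected = 5;
    bool        ok;

    /* Matches the current value, so the swap succeeds. */
    ok = demo_cas_u32(&value, &expected, 7);
    assert(ok);

    /* Stale expectation: the swap fails and expected is refreshed to 7. */
    expected = 5;
    ok = demo_cas_u32(&value, &expected, 9);
    assert(!ok);
    assert(expected == 7);
    (void) ok;
}
```

Compiling this for AArch64 with and without LSE enabled is one way to
compare the output against what MSVC emits for the intrinsic.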


> Issue 2: S_UNLOCK() uses only a compiler barrier
> 
> _ReadWriteBarrier() is a compiler barrier, NOT a hardware memory
> barrier [6].  It prevents the compiler from reordering operations, but
> the CPU can still reorder memory operations. This is fundamentally
> insufficient for ARM64's weaker memory model.

Yea, that seems broken on a non-TSO architecture.  Is the problem fixed if you
change just this to include a proper barrier?
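
To illustrate what a "proper barrier" on unlock means, here is a rough
sketch in portable C11, assuming a simple test-and-set lock. The names
demo_slock_t, demo_tas, and demo_unlock are hypothetical; this is not the
s_lock.h code and not the proposed patch. The point is only that the store
which releases the lock needs release ordering enforced by the hardware,
which a compiler-only barrier does not provide on a weakly ordered CPU.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type: 0 = free, 1 = held. */
typedef atomic_int demo_slock_t;

/*
 * Test-and-set with acquire ordering. Returns the previous value, so 0
 * means the caller now holds the lock.
 */
static int
demo_tas(demo_slock_t *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/*
 * Unlock with a release store: earlier loads and stores in the critical
 * section cannot be reordered past it, by the compiler or by the CPU.
 * A compiler-only barrier followed by a plain store would constrain only
 * the compiler.
 */
static void
demo_unlock(demo_slock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}

/* Single-threaded usage example; it checks lock state, not ordering. */
static void
demo_lock_selfcheck(void)
{
    demo_slock_t lock = 0;
    int         prev;

    prev = demo_tas(&lock);     /* lock was free, now held */
    assert(prev == 0);
    prev = demo_tas(&lock);     /* already held */
    assert(prev == 1);
    demo_unlock(&lock);
    prev = demo_tas(&lock);     /* free again after unlock */
    assert(prev == 0);
    (void) prev;
}
```

On AArch64 a release store like this is usually compiled to stlr. With
MSVC the equivalent would presumably be a hardware barrier intrinsic ahead
of the store, but that is an assumption here and would need to be checked
against the generated assembly.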

Greetings,

Andres Freund


