Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB barriers - Mailing list pgsql-hackers

From Greg Burd
Subject Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB barriers
Date
Msg-id 1D2B3555-359B-47A2-B291-B23E8F4CAF39@greg.burd.me
In response to Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB barriers (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Nov 20 2025, at 7:03 pm, Andres Freund <andres@anarazel.de> wrote:

> Hi,
> 
> On 2025-11-20 15:45:22 -0500, Greg Burd wrote:
>> Dave and I have been working together to get ARM64 with MSVC functional.
>>  The attached patches accomplish that. Dave is the author of the first
>> which addresses some build issues and fixes the spin_delay() semantics,
>> I did the second which fixes some atomics in this combination.
> 
> Thanks for working on this!

You're welcome, thanks for reviewing it. :)

>> 
>> MSVC's _InterlockedCompareExchange() intrinsic on ARM64 performs the
>> atomic operation but does NOT emit the necessary Data Memory Barrier
>> (DMB) instructions [4][5].
> 
> I couldn't reproduce this result when playing around on godbolt. By specifying
> /arch:armv9.4 msvc can be convinced to emit the code for the intrinsics inline
> (at least for most of them).  And that makes it visible that
> _InterlockedCompareExchange() results in a "casal" instruction.  Looking that
> up shows:
> https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions/CASA--CASAL--CAS--CASL--CASAL--CAS--CASL--A64-
> which includes these two statements:
> "CASA and CASAL load from memory with acquire semantics."
> "CASL and CASAL store to memory with release semantics."

I didn't even think to check for a compiler flag for the architecture,
nice call!  If this emits the correct instructions it is a much better
approach.  I'll give it a try, thanks for the nudge.
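
For anyone following along without an MSVC/ARM64 setup, here is a minimal
portable sketch of the semantics under discussion, written with C11 atomics
rather than the MSVC intrinsic. The helper name sketch_compare_exchange_u32()
is hypothetical and this is not the code in the patch. It only illustrates a
compare-and-swap that is an acquire on its load and a release on its store,
which is what the quoted CASAL description promises. My understanding is that
GCC and Clang typically lower this to casal when targeting AArch64 with LSE
atomics, but treat that as an assumption to check in the generated assembly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL code): atomically replace *ptr with
 * newval if it currently equals *expected.  On failure, *expected is
 * updated to the value actually found.
 *
 * memory_order_acq_rel on success: the load half is an acquire and the
 * store half is a release.  On failure only a load happened, so acquire
 * is the strongest ordering that applies.
 */
static inline bool
sketch_compare_exchange_u32(_Atomic uint32_t *ptr, uint32_t *expected,
                            uint32_t newval)
{
    assert(ptr != NULL && expected != NULL);
    return atomic_compare_exchange_strong_explicit(ptr, expected, newval,
                                                   memory_order_acq_rel,
                                                   memory_order_acquire);
}
```

Whether acquire plus release ordering is enough for every caller that expects
a full barrier is a separate question that this sketch does not answer.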

>> Issue 2: S_UNLOCK() uses only a compiler barrier
>> 
>> _ReadWriteBarrier() is a compiler barrier, NOT a hardware memory
>> barrier [6].  It prevents the compiler from reordering operations, but
>> the CPU can still reorder memory operations. This is fundamentally
>> insufficient for ARM64's weaker memory model.
> 
> Yea, that seems broken on a non-TSO architecture.  Is the problem
> fixed if you change just this to include a proper barrier?

Using the flag from above, the _ReadWriteBarrier() does (in godbolt) turn
into a casal which (AFAIK) is going to do the trick.  I'll see if I can
update meson.build and get this to work as intended.
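
To make the compiler-barrier versus hardware-barrier distinction from Issue 2
concrete, here is a rough C11 sketch. All names (sketch_slock_t and friends)
are hypothetical and this is not the s_lock.h code. atomic_signal_fence()
stands in for a compiler-only barrier such as _ReadWriteBarrier(), on the
assumption that the two are comparable in that neither emits a CPU barrier
instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef atomic_int sketch_slock_t;  /* hypothetical: 0 = free, 1 = held */

/* Acquire the lock if it is free; acquire ordering on success. */
static inline bool
sketch_try_lock(sketch_slock_t *lock)
{
    int expected = 0;

    assert(lock != NULL);
    return atomic_compare_exchange_strong_explicit(lock, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

/*
 * Compiler barrier only: the compiler may not move earlier accesses past
 * the fence, but no hardware barrier is emitted.  On a weakly ordered CPU
 * such as ARM64, other cores may still observe the lock as free before
 * they observe the writes made inside the critical section.
 */
static inline void
sketch_unlock_compiler_only(sketch_slock_t *lock)
{
    assert(lock != NULL);
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(lock, 0, memory_order_relaxed);
}

/*
 * Release store: constrains both the compiler and the CPU, so writes made
 * while holding the lock are visible to whoever acquires it next.
 */
static inline void
sketch_unlock_release(sketch_slock_t *lock)
{
    assert(lock != NULL);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Both unlock variants behave the same in a single-threaded test; the
difference only shows up as an ordering bug under contention on non-TSO
hardware, which is why it is easy to miss.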

> Greetings,
> 
> Andres Freund

best.

-greg


