On Nov 20 2025, at 7:07 pm, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2025-11-20 19:03:47 -0500, Andres Freund wrote:
>> > MSVC's _InterlockedCompareExchange() intrinsic on ARM64 performs the
>> > atomic operation but does NOT emit the necessary Data Memory Barrier
>> > (DMB) instructions [4][5].
>>
>> I couldn't reproduce this result when playing around on godbolt. By specifying
>> /arch:armv9.4 msvc can be convinced to emit the code for the intrinsics inline
>> (at least for most of them). And that makes it visible that
>> _InterlockedCompareExchange() results in a "casal" instruction. Looking that
>> up shows:
>>
>> https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions/CASA--CASAL--CAS--CASL--CASAL--CAS--CASL--A64-
>> which includes these two statements:
>> "CASA and CASAL load from memory with acquire semantics."
>> "CASL and CASAL store to memory with release semantics."
>
> Further evidence for that is that
> https://learn.microsoft.com/en-us/windows/win32/api/winnt/nf-winnt-interlockedcompareexchange
> states:
> "This function generates a full memory barrier (or fence) to ensure
> that memory operations are completed in order."
>
> (note that we are using the function, not the intrinsic for TAS())

Got it, thanks.

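A minimal sketch of the semantics described above, written against portable
C11 atomics rather than the MSVC function (which needs windows.h). The names
demo_lock_t, demo_tas() and demo_unlock() are hypothetical, and this is not
PostgreSQL's actual TAS() implementation. The sequentially consistent
compare-exchange only stands in for the acquire-plus-release behavior quoted
above for casal:

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration only; not PostgreSQL's slock_t. */
typedef atomic_long demo_lock_t;

/*
 * Try to take the lock. Returns 0 if we acquired it and nonzero if it was
 * already held (return convention assumed here for the sketch).
 *
 * atomic_compare_exchange_strong() defaults to memory_order_seq_cst, so the
 * operation is both an acquire and a release, which is the ordering the
 * quoted ARM documentation attributes to CASAL.
 */
static int
demo_tas(demo_lock_t *lock)
{
    long expected = 0;

    return atomic_compare_exchange_strong(lock, &expected, 1L) ? 0 : 1;
}

/* Release the lock; a release store orders prior writes before the unlock. */
static void
demo_unlock(demo_lock_t *lock)
{
    atomic_store_explicit(lock, 0L, memory_order_release);
}
```

With a lock initialized to 0, the first demo_tas() call succeeds, a second
call reports the lock as held, and it can be taken again after demo_unlock().
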
> Greetings,
>
> Andres
best.
-greg