Hi,
On 2025-11-20 19:03:47 -0500, Andres Freund wrote:
> > MSVC's _InterlockedCompareExchange() intrinsic on ARM64 performs the
> > atomic operation but does NOT emit the necessary Data Memory Barrier
> > (DMB) instructions [4][5].
>
> I couldn't reproduce this result when playing around on godbolt. By specifying
> /arch:armv9.4 msvc can be convinced to emit the code for the intrinsics inline
> (at least for most of them). And that makes it visible that
> _InterlockedCompareExchange() results in a "casal" instruction. Looking that
> up shows:
>
https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions/CASA--CASAL--CAS--CASL--CASAL--CAS--CASL--A64-
> which includes these two statements:
> "CASA and CASAL load from memory with acquire semantics."
> "CASL and CASAL store to memory with release semantics."
Further evidence for that is that
https://learn.microsoft.com/en-us/windows/win32/api/winnt/nf-winnt-interlockedcompareexchange
states:
"This function generates a full memory barrier (or fence) to ensure that memory operations are completed in order."
(note that we are using the function, not the intrinsic for TAS())
Greetings,
Andres