Upon closer inspection, I noticed that we don't implement a custom TAS_SPIN() for this architecture, so I quickly hacked together the attached patch and ran a couple of benchmarks that stressed the spinlock code. I found no discussion about TAS_SPIN() on ARM in the archives, but I did notice that the initial AArch64 support was added [0] before x86_64 started using a non-locking test [1].
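For context on what a custom TAS_SPIN() buys: the spin-wait loop uses TAS_SPIN() instead of TAS(). On x86_64 it first does a plain, non-locking read and only issues the locking TAS() when the lock looks free (test-and-test-and-set). Below is a minimal C11 sketch of that idea. It is not the attached patch and not PostgreSQL's s_lock.h code. The names slock_t (as atomic_int), tas(), tas_spin(), s_lock() and s_unlock() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type for this sketch (0 = free, 1 = held); the real
 * slock_t is architecture-specific. */
typedef atomic_int slock_t;

/* Locking test-and-set: atomically store 1 and return the previous value.
 * A nonzero result means the lock was already held. */
static int tas(slock_t *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/* Spin variant: do a plain (non-locking) read first and only attempt the
 * atomic exchange when the lock looks free. Waiters then mostly read a
 * shared cache line instead of repeatedly requesting exclusive ownership. */
static int tas_spin(slock_t *lock)
{
    return atomic_load_explicit(lock, memory_order_relaxed) ? 1 : tas(lock);
}

static void s_unlock(slock_t *lock)
{
    /* Releasing a lock that is not held indicates a caller bug. */
    assert(atomic_load_explicit(lock, memory_order_relaxed) != 0);
    atomic_store_explicit(lock, 0, memory_order_release);
}

/* Acquire: try the locking TAS once, then spin using tas_spin(). */
static void s_lock(slock_t *lock)
{
    if (tas(lock) == 0)
        return;
    while (tas_spin(lock) != 0)
        ; /* a real implementation would add a CPU pause/yield and backoff */
}
```

Without a custom definition, TAS_SPIN() falls back to plain TAS(). Every spin iteration is then an atomic read-modify-write, which is the cost a non-locking pre-check aims to avoid under contention.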
This reminds me of a 2020 discussion about improving spinlock performance on ARM [0], although that discussion was about CAS and TAS rather than TAS_SPIN() itself.
The results look great, but the discussion in [0] shows that results may
vary among different ARM chips. Could you share the chip model used for this
test, so that we can cross-validate this patch? I'm not sure whether the
compiler version matters as well. I'm willing to test it on Alibaba Cloud
Yitian 710 if I have time.