Heikki Linnakangas <hlinnaka@iki.fi> writes:
> But let's go back to why we're considering this. The idea was to
> optimize this block:
> ...
> One trick that we could do is to replace that with a 128-bit atomic
> compare-and-swap instruction. Modern 64-bit Intel systems have that,
> it's called CMPXCHG16B. Don't know about other architectures. An atomic
> fetch-and-add, as envisioned in the comment above, would presumably be
> better, but I suspect that a compare-and-swap would be good enough to
> move the bottleneck elsewhere again.

+1 for taking a look at that. A bit of experimentation shows that
recent gcc and clang can generate that instruction using
__sync_bool_compare_and_swap or __sync_val_compare_and_swap
on an __int128 value.
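
For illustration, here is a minimal sketch of what such a 128-bit
compare-and-swap loop could look like. It is not taken from any patch:
the type pair128 and the functions pair128_cas and pair128_advance are
hypothetical names, and the two fields are placeholders for whatever
pair of 64-bit values has to change together. On x86-64, gcc needs
-mcx16 to emit CMPXCHG16B inline; the sketch tests
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 and otherwise falls back to a crude
spinlock so that it still builds without that flag.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical pair of 64-bit fields that must be updated together.
 * The __int128 member gives the union the 16-byte alignment that
 * CMPXCHG16B requires.
 */
typedef union
{
    struct
    {
        uint64_t lo;
        uint64_t hi;
    } parts;
    unsigned __int128 whole;
} pair128;

/*
 * Replace *ptr with newval if it still equals expected.
 * Returns true if the swap happened.
 */
static bool
pair128_cas(volatile pair128 *ptr, pair128 expected, pair128 newval)
{
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
    /* The compiler can do a native 16-byte compare-and-swap here. */
    return __sync_bool_compare_and_swap(&ptr->whole,
                                        expected.whole, newval.whole);
#else
    /* Assumed fallback: emulate the CAS under a simple spinlock. */
    static volatile int fallback_lock = 0;
    bool ok;

    while (__sync_lock_test_and_set(&fallback_lock, 1))
        ;                       /* spin until the lock is free */
    ok = (ptr->whole == expected.whole);
    if (ok)
        ptr->whole = newval.whole;
    __sync_lock_release(&fallback_lock);
    return ok;
#endif
}

/*
 * Add delta to the low half and bump the high half by one, as a single
 * atomic step.  Returns the value the pair had before the update.
 */
static pair128
pair128_advance(volatile pair128 *ptr, uint64_t delta)
{
    pair128 oldval;
    pair128 newval;

    do
    {
        /*
         * This plain read may be torn by a concurrent writer; in that
         * case the compare-and-swap fails and we simply retry.
         */
        oldval.whole = ptr->whole;
        newval.parts.lo = oldval.parts.lo + delta;
        newval.parts.hi = oldval.parts.hi + 1;
    } while (!pair128_cas(ptr, oldval, newval));

    return oldval;
}
```

The retry loop is what makes a compare-and-swap usable where one would
really like a fetch-and-add: under contention a backend may have to go
around more than once, but nobody ever waits on a lock.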

regards, tom lane