pg_atomic_compare_exchange_*() and memory barriers

From Alexander Korotkov
Subject pg_atomic_compare_exchange_*() and memory barriers
Date
Msg-id CAPpHfdudejddHLLw=Uk_7pZ=1g2SZXfwubSgmWUhJ7fBUtR9rg@mail.gmail.com
List pgsql-hackers
Hi all,

While investigating a bug in the patch to get rid of WALBufMappingLock, I found that a surrounding pg_atomic_compare_exchange_u64() call fixes the problem for me.  That doesn't feel right, because according to the comments, both pg_atomic_compare_exchange_u32() and pg_atomic_compare_exchange_u64() should already provide full memory barrier semantics.  So, I decided to investigate this further.

In my case, the implementation of pg_atomic_compare_exchange_u64() is based on __atomic_compare_exchange_n().

static inline bool
pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
                                    uint64 *expected, uint64 newval)
{
    AssertPointerAlignment(expected, 8);
    return __atomic_compare_exchange_n(&ptr->value, expected, newval, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

According to the docs, __ATOMIC_SEQ_CST enforces total ordering with all other __ATOMIC_SEQ_CST operations.  That guarantee covers only other __ATOMIC_SEQ_CST operations; nothing is said about ordinary reads and writes.  This is quite far from our comment promising full barrier semantics, which, as I understand it, means the equivalent of both pg_read_barrier() and pg_write_barrier(), i.e. it should fence all other reads and writes.
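
For illustration, here is roughly what full barrier semantics promises (an untested sketch with made-up variables, assuming port/atomics.h is included; this is not code from the patch):

/* hypothetical shared variables */
static int some_var = 0;
static int some_other_var = 0;
static pg_atomic_uint64 atomic_var;

static int
full_barrier_example(uint64 expected, uint64 newval)
{
    some_var = 1;               /* plain store before the CAS */

    /*
     * Promised to act like pg_memory_barrier(): no read or write may be
     * reordered across it, whether the exchange succeeds or fails.
     */
    (void) pg_atomic_compare_exchange_u64(&atomic_var, &expected, newval);

    return some_other_var;      /* plain load after the CAS */
}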

I've found that __atomic_compare_exchange_n() ends up calling __aarch64_cas8_acq_rel, which disassembles as follows.

0000000000000860 <__aarch64_cas8_acq_rel>:
 860:   d503245f    bti c
 864:   90000110    adrp    x16, 20000 <__data_start>
 868:   3940a210    ldrb    w16, [x16, #40]
 86c:   34000070    cbz w16, 878 <__aarch64_cas8_acq_rel+0x18>
 870:   c8e0fc41    casal   x0, x1, [x2]
 874:   d65f03c0    ret
 878:   aa0003f0    mov x16, x0
 87c:   c85ffc40    ldaxr   x0, [x2]
 880:   eb10001f    cmp x0, x16
 884:   54000061    b.ne    890 <__aarch64_cas8_acq_rel+0x30>  // b.any
 888:   c811fc41    stlxr   w17, x1, [x2]
 88c:   35ffff91    cbnz    w17, 87c <__aarch64_cas8_acq_rel+0x1c>
 890:   d65f03c0    ret

This function first checks whether LSE instructions are available.  If so, the LSE instruction casal is used.  If not (as in my case), a loop of ldaxr/stlxr is used instead.  According to the documentation, ldaxr and stlxr each provide a one-way barrier: reads/writes after ldaxr cannot be reordered before it, and reads/writes before stlxr cannot be reordered after it.  So, a pair of these instructions provides a full memory barrier.  However, if we don't observe the expected value, only ldaxr gets executed.  Thus, an unsuccessful pg_atomic_compare_exchange_u64() attempt leaves us with just a one-way barrier, and that is what caused the bug in my case.  In contrast, __sync_val_compare_and_swap() uses __aarch64_cas8_sync, which is much the same as __aarch64_cas8_acq_rel but ends with an explicit memory barrier.
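
To make the failure-path hazard concrete, here is a minimal sketch (made-up variables again, not the actual patch code):

static int  flag = 0;
static int  data = 0;
static pg_atomic_uint64 state;

static int
failure_path_example(uint64 newval)
{
    uint64  expected = 1;   /* suppose state holds another value, so the CAS fails */

    flag = 1;               /* plain store before the CAS */

    if (!pg_atomic_compare_exchange_u64(&state, &expected, newval))
    {
        /*
         * Failure path: only ldaxr was executed, i.e. an acquire barrier.
         * The load of data below cannot move up past it, but the store to
         * flag above can still move down past it, so the store may become
         * visible to other CPUs only after the load has completed.
         */
        return data;        /* plain load after the CAS */
    }
    return 0;
}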

I have a couple of ideas on how to fix this problem.
1. Provide full barrier semantics for pg_atomic_compare_exchange_*().  In particular, for __atomic_compare_exchange_n(), we should probably use __ATOMIC_ACQ_REL for success and run an explicit memory barrier in the case of failure (see the sketch after this list).
2. Document that pg_atomic_compare_exchange_*() doesn't necessarily provide a memory barrier on failure.  But then we need to carefully check whether existing use cases now need explicit memory barriers.
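
For idea 1, a rough and untested sketch of the generic-gcc implementation could look like this:

static inline bool
pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
                                    uint64 *expected, uint64 newval)
{
    bool        ret;

    AssertPointerAlignment(expected, 8);

    /*
     * The failure ordering may not be stronger than the success ordering
     * and cannot be __ATOMIC_ACQ_REL, so use __ATOMIC_ACQUIRE there and
     * compensate with an explicit full fence when the exchange fails.
     */
    ret = __atomic_compare_exchange_n(&ptr->value, expected, newval, false,
                                      __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
    if (!ret)
        __atomic_thread_fence(__ATOMIC_SEQ_CST);
    return ret;
}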

Any thoughts?
