On Sun, Jun 07, 2020 at 12:23:35AM -0400, Tom Lane wrote:
> I experimented with running "make check" on ARM64 under a reasonably
> bleeding-edge valgrind (3.16.0). One thing I ran into is that
> regress.c's test_atomic_ops fails; valgrind shows the stack trace
>
> fun:__aarch64_cas8_acq_rel
> fun:pg_atomic_compare_exchange_u64_impl
> fun:pg_atomic_exchange_u64_impl
> fun:pg_atomic_write_u64_impl
> fun:pg_atomic_init_u64_impl
> fun:pg_atomic_init_u64
> fun:test_atomic_uint64
> fun:test_atomic_ops
> fun:ExecInterpExpr
>
> Now, this is basically the same thing as is already memorialized in
> src/tools/valgrind.supp:
>
> # Atomic writes to 64bit atomic vars uses compare/exchange to
> # guarantee atomic writes of 64bit variables. pg_atomic_write is used
> # during initialization of the atomic variable; that leads to an
> # initial read of the old, undefined, memory value. But that's just to
> # make sure the swap works correctly.
> {
> uninitialized_atomic_init_u64
> Memcheck:Cond
> fun:pg_atomic_exchange_u64_impl
> fun:pg_atomic_write_u64_impl
> fun:pg_atomic_init_u64_impl
> }
>
> so my first thought was that we just needed an architecture-specific
> variant of that. But on thinking more about this, it seems like
> generic.h's version of pg_atomic_init_u64_impl is just fundamentally
> misguided. Why isn't it simply assigning the value with an ordinary
> unlocked write? By definition, we had better not be using this function
> in any circumstance where there might be conflicting accesses
Does something guarantee the write will be globally visible by the time the
first concurrent accessor shows up? (If not, one could (a) do an unlocked
ptr->value = 0 and then the atomic write, or (b) revert and improve the
suppression.) I don't doubt it's fine for the ways PostgreSQL uses atomics
today, which generally initialize an atomic before the concurrent-accessor
processes even exist.
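
For concreteness, (a) in generic.h's fallback would look roughly like this
(sketch only; I'm going from memory on the member and helper names, so treat
them as approximate):

static inline void
pg_atomic_init_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 val_)
{
	/*
	 * Plain store first, so the compare/exchange hiding inside
	 * pg_atomic_write_u64_impl() never reads uninitialized memory,
	 * which is the thing Valgrind complains about ...
	 */
	ptr->value = 0;
	/* ... then keep the atomic write for whatever visibility it buys. */
	pg_atomic_write_u64_impl(ptr, val_);
}

That keeps the suppression unnecessary without having to decide whether a
lone unlocked store is enough for the first concurrent accessor.
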
> , so I don't
> see why we should need to tolerate a valgrind exception here. Moreover,
> if a simple assignment *isn't* good enough, then surely the spinlock
> version in atomics.c is 100% broken.
Are you saying it would imply a bug in atomics.c's pg_atomic_init_u64_impl(),
pg_atomic_compare_exchange_u64_impl(), or pg_atomic_fetch_add_u64_impl()? Can
you explain that further? If you were referring to the unlocked "*(lock) = 0",
that is a different situation, since it's safe for the change from the locked
state to the unlocked state to propagate with a delay.
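
To make sure we're talking about the same shape of code, the spinlock-emulated
path I have in mind is roughly the following (illustrative only, with made-up
names, not the literal atomics.c/fallback code; the real S_UNLOCK also adds
whatever release barrier the platform needs):

#include <stdint.h>

typedef struct
{
	volatile int		lock;	/* 0 = free, 1 = held */
	volatile uint64_t	value;	/* protected by "lock" */
} emulated_atomic_u64;

static void
emulated_write_u64(emulated_atomic_u64 *a, uint64_t val)
{
	/* test-and-set acquire; spin while someone else holds the lock */
	while (__sync_lock_test_and_set(&a->lock, 1) != 0)
		;
	a->value = val;

	/*
	 * Unlocked store to release, i.e. the "*(lock) = 0" case.  A delay in
	 * this store becoming visible is harmless: a backend that still sees
	 * lock == 1 merely spins a bit longer, and it can never read a->value
	 * without first winning the lock itself.  That's why the plain-store
	 * unlock is fine here, yet it says nothing about how quickly an
	 * initialization store becomes visible to the first concurrent accessor.
	 */
	a->lock = 0;
}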