Re: pg_atomic_compare_exchange_*() and memory barriers - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: pg_atomic_compare_exchange_*() and memory barriers
Msg-id CAPpHfdvp=_1NRF4YFFp9Oii7mRR5V4b-C2aukbuLQjWMjynYrw@mail.gmail.com
In response to Re: pg_atomic_compare_exchange_*() and memory barriers  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Fri, Mar 7, 2025 at 7:07 PM Andres Freund <andres@anarazel.de> wrote:
> On 2025-03-07 18:39:42 +0200, Alexander Korotkov wrote:
> > On Fri, Mar 7, 2025 at 6:02 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > On 2025-03-07 17:47:08 +0200, Alexander Korotkov wrote:
> > > > While investigating a bug in the patch to get rid of WALBufMappingLock, I
> > > > found that surrounding pg_atomic_compare_exchange_u64() with explicit
> > > > memory barriers fixes the problem for me.
> > >
> > > That sounds more likely to be due to slowing down things enough to make a race
> > > less likely to be hit. Or a compiler bug.
> > >
> > >
> > > > That doesn't feel right because, according to the
> > > > comments, both pg_atomic_compare_exchange_u32() and
> > > > pg_atomic_compare_exchange_u64() should provide full memory barrier
> > > > semantics.  So, I decided to investigate this further.
> > > >
> > > > In my case, the implementation of pg_atomic_compare_exchange_u64() is based
> > > > on __atomic_compare_exchange_n().
> > > >
> > > > static inline bool
> > > > pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
> > > >                                     uint64 *expected, uint64 newval)
> > > > {
> > > >     AssertPointerAlignment(expected, 8);
> > > >     return __atomic_compare_exchange_n(&ptr->value, expected, newval, false,
> > > >                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
> > > > }
> > > >
> > > > According to the docs, __ATOMIC_SEQ_CST enforces total ordering with *all
> > > > other __ATOMIC_SEQ_CST operations*.  It only mentions other
> > > > __ATOMIC_SEQ_CST operations; nothing is said about regular reads/writes.
> > > > This sounds quite far from our comment promising full barrier semantics,
> > > > which, in my understanding, means the equivalent of both
> > > > pg_read_barrier()/pg_write_barrier(), which should order *all other
> > > > reads/writes*.
> > >
> > > A memory barrier in one process/thread always needs to be paired with a
> > > barrier in another process/thread.  It's not enough to have a memory
> > > barrier on one side but not the other.
> >
> > Yes, there surely should be a memory barrier on the other side.  But
> > does __ATOMIC_SEQ_CST work as a barrier for regular read/write
> > operations on the same side?
>
> Yes, if it's paired with another barrier on the other side. The regular
> read/write operations are basically protected transitively, due to
>
> a) An acquire barrier preventing non-atomic reads/writes in the same thread
>    from appearing to have been moved to before the barrier
>
> b) A release barrier preventing non-atomic reads/writes in the same thread
>    from appearing to have been moved to after the barrier
>
> c) The other thread being guaranteed a) and b) for the other threads'
>    non-atomic reads/writes depending on the type of barrier
>
> d) The atomic value itself being guaranteed to be, well, atomic

Yes, looks good to me.
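
To make sure I understand, here is a minimal standalone sketch (a
hypothetical example using GCC's __atomic builtins, not PostgreSQL code) of
the pairing described above: the plain variable "payload" is ordered
transitively through the atomic "flag", because the release store in the
writer pairs with the acquire load in the reader.

#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static int payload;        /* regular, non-atomic variable */
static uint64_t flag;      /* accessed only through __atomic builtins */

static void *
writer(void *arg)
{
    payload = 42;          /* plain write ... */
    /* ... cannot appear to move past this release store (your point b) */
    __atomic_store_n(&flag, 1, __ATOMIC_RELEASE);
    return NULL;
}

static void *
reader(void *arg)
{
    /* reads below cannot appear to move before this acquire load (point a) */
    while (__atomic_load_n(&flag, __ATOMIC_ACQUIRE) == 0)
        ;                  /* spin until the writer publishes */
    assert(payload == 42); /* guaranteed by the paired barriers */
    return NULL;
}

int
main(void)
{
    pthread_t w, r;

    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r, NULL, reader, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}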

> > > > This function first checks if LSE instructions are present.  If so,
> > > > the LSE instruction casal is used.  If not (in my case), a loop of
> > > > ldaxr/stlxr is used instead.  The documentation says both ldaxr and
> > > > stlxr provide one-way barriers.  Reads/writes after ldaxr will be
> > > > observed after it, and reads/writes before stlxr will be observed
> > > > before it.  So, a pair of these instructions provides a full memory
> > > > barrier.  However, if we don't observe the expected value, only ldaxr
> > > > gets executed.  So, an unsuccessful pg_atomic_compare_exchange_u64()
> > > > attempt will leave us with only a one-way barrier, and that caused a
> > > > bug in my case.
> > >
> > > That has to be a compiler bug.  We specify __ATOMIC_SEQ_CST for both
> > > success_memorder *and* failure_memorder.
> > >
> > > What compiler & version is this?
> >
> > I've tried and got the same results with two compilers.
> > gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
> > Ubuntu clang version 19.1.1 (1ubuntu1~24.04.2)
>
> Thinking more about it I wonder if the behaviour of not doing a release in
> case the atomic fails *might* arguably actually be correct - if the
> compare-exchange fails, there's nothing that the non-atomic values could be
> ordered against.
>
> What is the access pattern and the observed problems with it that made you
> look at the disassembly?

Check this code.

l1:     pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);

        /*
         * Try to advance XLogCtl->InitializedUpTo.
         *
         * If the CAS operation failed, then some of the previous pages are
         * not initialized yet, and this backend gives up.
         *
         * Since the initializer of the next page might give up on advancing
         * InitializedUpTo, this backend has to keep attempting to advance it
         * until it finds a page "in the past" or a concurrent backend
         * succeeds at advancing.  When we finish advancing
         * XLogCtl->InitializedUpTo, we notify all the waiters via
         * XLogCtl->InitializedUpToCondVar.
         */
l2:     while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo, &NewPageBeginPtr, NewPageEndPtr))
        {
            NewPageBeginPtr = NewPageEndPtr;
            NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
            nextidx = XLogRecPtrToBufIdx(NewPageBeginPtr);

l3:         if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) != NewPageEndPtr)
            {
            /*
             * The page at nextidx wasn't initialized yet, so we can't move
             * InitializedUpTo further.  It will be moved by the backend that
             * initializes nextidx.
             */
                ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar);
                break;
            }
        }

Consider the following execution order with process 1 (p1) and process 2 (p2).

1. p1 executes l1
2. p2 executes l2 with success
3. p1 executes l2 with failure
4. p2 executes l3, but doesn't see the result of step 1, because step 3 didn't provide a strong enough memory barrier
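
If an unsuccessful pg_atomic_compare_exchange_u64() indeed gives only the
one-way ldaxr barrier here, one way to make the required ordering explicit
would be a full fence between l1 and l2, so that p1's write at l1 is globally
visible before its CAS attempt regardless of whether the CAS succeeds.  This
is just a sketch of the ordering requirement, not a tested fix:

l1:     pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);

        /*
         * Hypothetical: full barrier to ensure the write above is visible
         * before the CAS below reads InitializedUpTo, even when the CAS
         * fails and executes only ldaxr.
         */
        pg_memory_barrier();

l2:     while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo, &NewPageBeginPtr, NewPageEndPtr))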

------
Regards,
Alexander Korotkov
Supabase
