Hi,
On 2025-03-07 19:15:23 +0200, Alexander Korotkov wrote:
> On Fri, Mar 7, 2025 at 7:07 PM Andres Freund <andres@anarazel.de> wrote:
> > What is the access pattern and the observed problems with it that made you
> > look at the disassembly?
>
> Check this code.
>
> l1: pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);
> /*
> * Try to advance XLogCtl->InitializedUpTo.
> *
> * If the CAS operation failed, then some of previous pages are not
> * initialized yet, and this backend gives up.
> *
> * Since initializer of next page might give up on advancing of
> * InitializedUpTo, this backend have to attempt advancing until it
> * find page "in the past" or concurrent backend succeeded at
> * advancing. When we finish advancing XLogCtl->InitializedUpTo, we
> * notify all the waiters with XLogCtl->InitializedUpToCondVar.
> */
> l2: while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo,
> &NewPageBeginPtr, NewPageEndPtr))
> {
> NewPageBeginPtr = NewPageEndPtr;
> NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
> nextidx = XLogRecPtrToBufIdx(NewPageBeginPtr);
>
> l3: if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) !=
> NewPageEndPtr)
> {
> /*
> * Page at nextidx wasn't initialized yet, so we cann't move
> * InitializedUpto further. It will be moved by backend
> which
> * will initialize nextidx.
> */
>
> ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar);
> break;
> }
> }
>
> Consider the following execution order with process 1 (p1) and process 2
> (p2).
On 2025-03-07 19:24:39 +0200, Alexander Korotkov wrote:
> Sorry, I messed this up.
> The correct sequence is following.
>
> 1. p1 executes l1
> 2. p1 executes l2 with failure
> 3. p2 executes l2 with success
> 4. p2 execute l3, but doesn't see the results of step 1, because 3
> didn't provide enough of memory barrier
Did you mean because 2) didn't provide enough of a memory barrier? Because 3)
does, right?
You could get in exactly same the situation if the p1 is scheduled out by the
OS after step 1, no?
Greetings,
Andres Freund