Re: LogwrtResult contended spinlock - Mailing list pgsql-hackers

From Andres Freund
Subject Re: LogwrtResult contended spinlock
Date
Msg-id 20210130023011.n545o54j65t4kgxn@alap3.anarazel.de
Whole thread Raw
In response to Re: LogwrtResult contended spinlock  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: LogwrtResult contended spinlock  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 2021-01-29 22:47:56 -0300, Alvaro Herrera wrote:
> So I tried this, but -- perhaps not suprisingly -- I can't get it to
> work properly; the synchronization fails.

What do you mean by "synchronization fails"?


> Strangely, all tests work for me, but the pg_upgrade one in particular
> fails.

It's one of the few tests using fsync=on.


> +static inline void
> +pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
> +{
> +#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
> +    AssertPointerAlignment(ptr, 8);
> +#endif
> +    /* FIXME is this algorithm correct if we have u64 simulation? */

I don't see a problem.


> +    while (true)
> +    {
> +        uint64        currval;
> +
> +        currval = pg_atomic_read_u64(ptr);
> +        if (currval > target_)
> +            break;    /* already done by somebody else */
> +        if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
> +            break;    /* successfully done */
> +    }
> +}

I suggest writing this as

    currval = pg_atomic_read_u64(ptr);
    while (currval < target_)
    {
        if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
            break;
    }

>  /*
>   * Inserting to WAL is protected by a small fixed number of WAL insertion
> @@ -596,8 +599,10 @@ typedef struct XLogCtlData
>  {
>      XLogCtlInsert Insert;
>  
> +    XLogwrtAtomic LogwrtRqst;
> +    XLogwrtAtomic LogwrtResult;
> +
>      /* Protected by info_lck: */
> -    XLogwrtRqst LogwrtRqst;

Not sure putting these into the same cacheline is a good idea.


> @@ -2166,12 +2163,11 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
>              if (opportunistic)
>                  break;
>  
> -            /* Before waiting, get info_lck and update LogwrtResult */
> -            SpinLockAcquire(&XLogCtl->info_lck);
> -            if (XLogCtl->LogwrtRqst.Write < OldPageRqstPtr)
> -                XLogCtl->LogwrtRqst.Write = OldPageRqstPtr;
> -            LogwrtResult = XLogCtl->LogwrtResult;
> -            SpinLockRelease(&XLogCtl->info_lck);
> +            /* Before waiting, update LogwrtResult */
> +            pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, OldPageRqstPtr);
> +
> +            LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> +            LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);

I don't think it's quite as easy as this. Write/Flush now aren't
guaranteed to be coherent with each other - previously they were. And
because it's in a global variable used everywhere, we can't just be more
careful about update protocols in one place...

We also shouldn't re-read a variable that we just did via the
pg_atomic_monotonic_advance_u64().

I think we should stop updating both the Write/Flush position at the
same time. That way we don't have an expectation of them being coherent
with each other. Most places really don't need both anyway.


>      {
> -        SpinLockAcquire(&XLogCtl->info_lck);
> -        XLogCtl->LogwrtResult = LogwrtResult;
> -        if (XLogCtl->LogwrtRqst.Write < LogwrtResult.Write)
> -            XLogCtl->LogwrtRqst.Write = LogwrtResult.Write;
> -        if (XLogCtl->LogwrtRqst.Flush < LogwrtResult.Flush)
> -            XLogCtl->LogwrtRqst.Flush = LogwrtResult.Flush;
> -        SpinLockRelease(&XLogCtl->info_lck);
> +        pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Write, LogwrtResult.Write);
> +        pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Flush, LogwrtResult.Flush);
> +
> +        pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, LogwrtResult.Write);
> +        pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Flush, LogwrtResult.Flush);

Hm. We went from one cheap atomic operation (SpinLockAcquire) to four
expensive ones in the happy path. That's not free...

I don't think we need to manipulate XLogCtl->LogwrtResult.* using atomic
ops - they can only be updated with WALWriteLock held, right?

XLogCtl->LogwrtResult was updated with plain assignment before, why did
you change it to pg_atomic_monotonic_advance_u64()?


> @@ -1169,6 +1169,8 @@ XLogInsertRecord(XLogRecData *rdata,
>      {
>          /* advance global request to include new block(s) */
>          pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, EndPos);
> +        pg_memory_barrier();
> +

That's not really useful - the path that actually updates already
implies a barrier. It'd probably be better to add a barrier to a "never
executed cmpxchg" fastpath.



> @@ -2905,6 +2909,7 @@ XLogFlush(XLogRecPtr record)
>          WriteRqstPtr = pg_atomic_read_u64(&XLogCtl->LogwrtRqst.Write);
>          LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
>          LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
> +        pg_read_barrier();

I'm not sure you really can get away with just a read barrier in these
places. We can't e.g. have later updates to other shared memory
variables be "moved" to before the barrier (which a read barrier
allows).

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Thoughts on "killed tuples" index hint bits support on standby
Next
From: Michael Paquier
Date:
Subject: Re: Should we make Bitmapsets a kind of Node?