Re: WAL Insertion Lock Improvements - Mailing list pgsql-hackers

From: Bharath Rupireddy
Subject: Re: WAL Insertion Lock Improvements
Msg-id: CALj2ACWv=YsOTvSo9WnkLQa3Fj=rzgzxULtV2T9-N2AShJigPw@mail.gmail.com
In response to: Re: WAL Insertion Lock Improvements (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Fri, Jul 14, 2023 at 4:17 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-07-13 14:04:31 -0700, Andres Freund wrote:
> > From b74b6e953cb5a7e7ea1a89719893f6ce9e231bba Mon Sep 17 00:00:00 2001
> > From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> > Date: Fri, 19 May 2023 15:00:21 +0000
> > Subject: [PATCH v8] Optimize WAL insertion lock acquisition and release
> >
> > This commit optimizes WAL insertion lock acquisition and release
> > in the following way:
>
> I think this commit does too many things at once.

I've split the patch into three: 0001 makes insertingAt a 64-bit
atomic, 0002 adds comments explaining why there's no memory barrier or
spinlock in and around the LWLockWaitForVar call sites, and 0003 adds a
quick exit to LWLockUpdateVar.
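
For 0003, the fast path is along these lines (only a sketch to show the
idea; the LW_FLAG_HAS_WAITERS check and the use of pg_atomic_exchange_u64
as the ordering point are how I read lwlock.c, the patch may differ in
details):

void
LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
{
	/*
	 * Update the variable first. The exchange is a full memory barrier,
	 * so the new value is published before we look for waiters.
	 */
	pg_atomic_exchange_u64(valptr, val);

	/*
	 * Quick exit: if nobody is queued on this lock, skip taking the
	 * wait list lock altogether.
	 */
	if ((pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS) == 0)
		return;

	/* ... otherwise fall through to the existing wake-up path ... */
}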

> > 1. WAL insertion lock's variable insertingAt is currently read and
> > written with the help of lwlock's wait list lock to avoid
> > torn-free reads/writes. This wait list lock can become a point of
> > contention on a highly concurrent write workloads. Therefore, make
> > insertingAt a 64-bit atomic which inherently provides torn-free
> > reads/writes.
>
> "inherently" is a bit strong, given that we emulate 64bit atomics where not
> available...

Modified.
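
For reference, the core of 0001 is just this change to the insertion
lock struct in xlog.c (sketch):

typedef struct
{
	LWLock		lock;

	/*
	 * Was "XLogRecPtr insertingAt;", read/written under the lwlock's
	 * wait list lock. As a 64-bit atomic, reads and writes can't be
	 * torn, even on platforms where the atomic is emulated with a
	 * spinlock.
	 */
	pg_atomic_uint64 insertingAt;

	XLogRecPtr	lastImportantAt;
} WALInsertLock;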

> > 2. LWLockUpdateVar currently acquires lwlock's wait list lock even when
> > there are no waiters at all. Add a fastpath exit to LWLockUpdateVar when
> > there are no waiters to avoid unnecessary locking.
>
> I don't think there's enough of an explanation for why this isn't racy.
>
> The reason it's, I think, safe, is that anyone using LWLockConflictsWithVar()
> will do so twice in a row, with a barrier inbetween. But that really relies on
> what I explain in the paragraph below:

The twice-in-a-row lock acquisition protocol used by LWLockWaitForVar
is what makes the quick exit in LWLockUpdateVar safe: a waiter adds
itself to the wait queue and then re-checks the variable, so even if
LWLockUpdateVar momentarily thinks there are no waiters, the waiter
cannot miss the update. Is my understanding correct here?
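
To spell out why I believe that's race-free, here's the waiter side as
I read lwlock.c, heavily simplified (pseudocode, not the exact control
flow):

	/* LWLockWaitForVar(), simplified */
	for (;;)
	{
		/* first check; may see a stale value */
		if (!LWLockConflictsWithVar(lock, valptr, oldval, &value, &result))
			break;			/* lock is free, or the value moved past oldval */

		/*
		 * Queue ourselves; this sets LW_FLAG_HAS_WAITERS, which is what
		 * LWLockUpdateVar's quick exit tests.
		 */
		LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE);

		/* second check, after queueing, with a barrier in between */
		if (!LWLockConflictsWithVar(lock, valptr, oldval, &value, &result))
		{
			LWLockDequeueSelf(lock);
			break;
		}

		/* sleep until woken by LWLockUpdateVar/LWLockReleaseClearVar */
		PGSemaphoreLock(proc->sem);
	}

So either LWLockUpdateVar sees LW_FLAG_HAS_WAITERS and walks the wait
queue, or the waiter's second check sees the already-updated value and
never sleeps.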

> > It also adds notes on why LWLockConflictsWithVar doesn't need a
> > memory barrier as far as its current usage is concerned.
>
> I don't think:
>                          * NB: LWLockConflictsWithVar (which is called from
>                          * LWLockWaitForVar) relies on the spinlock used above in this
>                          * function and doesn't use a memory barrier.
>
> helps to understand why any of this is safe to a meaningful degree.
>
> The existing comments obviously aren't sufficient to explain this, but
> the reason it's, I think, safe today, is that we are only waiting for
> insertions that started before WaitXLogInsertionsToFinish() was called.  The
> lack of memory barriers in the loop means that we might see locks as "unused"
> that have *since* become used, which is fine, because they only can be for
> later insertions that we wouldn't want to wait on anyway.

Right.

> Not taking a lock to acquire the current insertingAt value means that we might
> see older insertingAt value. Which should also be fine, because if we read a
> too old value, we'll add ourselves to the queue, which contains atomic
> operations.

Right. If we read an older value, we add ourselves to the queue in
LWLockWaitForVar and will eventually be woken up by LWLockUpdateVar.

This matches my understanding. I used more or less your wording above
in the 0002 patch.

> >       /*
> > -      * Read value using the lwlock's wait list lock, as we can't generally
> > -      * rely on atomic 64 bit reads/stores.  TODO: On platforms with a way to
> > -      * do atomic 64 bit reads/writes the spinlock should be optimized away.
> > +      * Reading the value atomically ensures that we don't need any explicit
> > +      * locking. Note that in general, 64 bit atomic APIs in postgres inherently
> > +      * provide explicit locking for the platforms without atomics support.
> >        */
>
> This comment seems off to me. Using atomics doesn't guarantee not needing
> locking. It just guarantees that we are reading a non-torn value.

Modified the comment.
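
Concretely, with 0001 in place the read in LWLockConflictsWithVar
reduces to something like this (sketch, with the reworded comment):

	/*
	 * Reading the value atomically guarantees a non-torn value. It does
	 * not provide any memory ordering by itself; a stale value is fine
	 * here because the caller either re-checks after queueing itself
	 * (see LWLockWaitForVar) or only cares about insertions that started
	 * before it began waiting.
	 */
	value = pg_atomic_read_u64(valptr);

	if (value != oldval)
	{
		*newval = value;
		return false;		/* value has moved on, no need to wait */
	}

	return true;			/* still at oldval, caller must wait */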

> > @@ -1605,9 +1602,14 @@ LWLockConflictsWithVar(LWLock *lock,
> >   *
> >   * Note: this function ignores shared lock holders; if the lock is held
> >   * in shared mode, returns 'true'.
> > + *
> > + * Be careful that LWLockConflictsWithVar() does not include a memory barrier,
> > + * hence the caller of this function may want to rely on an explicit barrier or
> > + * a spinlock to avoid memory ordering issues.
> >   */
>
> s/careful/aware/?
>
> s/spinlock/implied barrier via spinlock or lwlock/?

Done.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
