Re: WAL Insertion Lock Improvements (was: Re: Avoid LWLockWaitForVar() for currently held WAL insertion lock in WaitXLogInsertionsToFinish()) - Mailing list pgsql-hackers

From: Bharath Rupireddy
Subject: Re: WAL Insertion Lock Improvements (was: Re: Avoid LWLockWaitForVar() for currently held WAL insertion lock in WaitXLogInsertionsToFinish())
Msg-id: CALj2ACUc=Hcd3X+mBNAF2o9Y3r=cxVoz4JBayBTz=jw74KopHQ@mail.gmail.com
In response to: Re: WAL Insertion Lock Improvements (was: Re: Avoid LWLockWaitForVar() for currently held WAL insertion lock in WaitXLogInsertionsToFinish()) (Andres Freund <andres@anarazel.de>)
Responses: Re: WAL Insertion Lock Improvements (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List: pgsql-hackers
On Tue, Dec 6, 2022 at 12:00 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi

Thanks for reviewing.

> FWIW, I don't see an advantage in 0003. If it allows us to make something else
> simpler / faster, cool, but on its own it doesn't seem worthwhile.

I've discarded this change.

> On 2022-12-02 16:31:58 -0800, Nathan Bossart wrote:
> > On Fri, Dec 02, 2022 at 04:32:38PM +0530, Bharath Rupireddy wrote:
> > > On Fri, Dec 2, 2022 at 6:10 AM Andres Freund <andres@anarazel.de> wrote:
> > >> I'm not sure this is quite right - don't we need a memory barrier. But I don't
> > >> see a reason to not just leave this code as-is. I think this should be
> > >> optimized entirely in lwlock.c
> > >
> > > Actually, we don't need that at all as LWLockWaitForVar() will return
> > > immediately if the lock is free. So, I removed it.
> >
> > I briefly looked at the latest patch set, and I'm curious how this change
> > avoids introducing memory ordering bugs.  Perhaps I am missing something
> > obvious.
>
> I'm a bit confused too - the comment above talks about LWLockWaitForVar(), but
> the patches seem to optimize LWLockUpdateVar().
>
> I think it'd be safe to optimize LWLockConflictsWithVar(), due to some
> pre-existing, quite crufty, code. LWLockConflictsWithVar() says:
>
>          * Test first to see if it the slot is free right now.
>          *
>          * XXX: the caller uses a spinlock before this, so we don't need a memory
>          * barrier here as far as the current usage is concerned.  But that might
>          * not be safe in general.
>
> which happens to be true in the single, indirect, caller:
>
>         /* Read the current insert position */
>         SpinLockAcquire(&Insert->insertpos_lck);
>         bytepos = Insert->CurrBytePos;
>         SpinLockRelease(&Insert->insertpos_lck);
>         reservedUpto = XLogBytePosToEndRecPtr(bytepos);
>
> I think at the very least we ought to have a comment in
> WaitXLogInsertionsToFinish() highlighting this.

Done.
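
For reference, the new note is along the following lines (a sketch only; the
exact comment wording in the attached v3 patch may differ):

    /*
     * Read the current insert position.  NB: acquiring and releasing
     * insertpos_lck acts as a memory barrier here, which the
     * LWLockWaitForVar() calls below (via LWLockConflictsWithVar())
     * currently rely on instead of issuing a barrier of their own.
     */
    SpinLockAcquire(&Insert->insertpos_lck);
    bytepos = Insert->CurrBytePos;
    SpinLockRelease(&Insert->insertpos_lck);
    reservedUpto = XLogBytePosToEndRecPtr(bytepos);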

> It's not at all clear to me that the proposed fast-path for LWLockUpdateVar()
> is safe. I think at the very least we could end up missing waiters that we
> should have woken up.
>
> I think it ought to be safe to do something like
>
> pg_atomic_exchange_u64()..
> if (!(pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS))
>   return;
>
> because the pg_atomic_exchange_u64() will provide the necessary memory
> barrier.

Done.
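
Concretely, the fast path follows the shape suggested above. A minimal
sketch, assuming the insertion lock's insertingAt variable has been
converted to a pg_atomic_uint64 (the exact layout in the v3 patch may
differ):

    void
    LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
    {
        /*
         * pg_atomic_exchange_u64() is a full memory barrier, so the read
         * of lock->state below cannot be reordered ahead of the variable
         * update.
         */
        pg_atomic_exchange_u64(valptr, val);

        /*
         * If nobody is sleeping on this lock, there is no wait queue to
         * scan, so return without taking the wait list lock.
         */
        if (!(pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS))
            return;

        /* ... otherwise fall through to the existing wakeup path ... */
    }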

I'm attaching the v3 patch with the above review comments addressed;
hopefully there are no memory ordering issues now. FWIW, I've added it to
the commitfest: https://commitfest.postgresql.org/42/4141/.

Test results with the v3 patch and the insert workload are in line with the
earlier run - as expected, TPS scales better at higher client counts beyond
512 clients, and the gain over HEAD reaches about 2X at 2048 and 4096
clients.

HEAD (clients TPS):
1 1380.411086
2 1358.378988
4 2701.974332
8 5925.380744
16 10956.501237
32 20877.513953
64 40838.046774
128 70251.744161
256 108114.321299
512 120478.988268
768 99140.425209
1024 93645.984364
2048 70111.159909
4096 55541.804826

v3 PATCHED (clients TPS):
1 1493.800209
2 1569.414953
4 3154.186605
8 5965.578904
16 11912.587645
32 22720.964908
64 42001.094528
128 78361.158983
256 110457.926232
512 148941.378393
768 167256.590308
1024 155510.675372
2048 147499.376882
4096 119375.457779

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
