Re: Get rid of WALBufMappingLock - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Get rid of WALBufMappingLock
Date
Msg-id CAPpHfduc9YMmGcBocD=03F+rbMV0XpA2KoHQC9bQN1QEpPjvPw@mail.gmail.com
Whole thread Raw
List pgsql-hackers
On Fri, Feb 7, 2025 at 1:24 PM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> 07.02.2025 14:02, Yura Sokolov пишет:
> > 07.02.2025 01:26, Alexander Korotkov пишет:
> >> Hi!
> >>
> >> On Sun, Jan 19, 2025 at 2:11 AM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> >>>
> >>> During discussion of Increasing NUM_XLOGINSERT_LOCKS [1], Andres Freund
> >>> used benchmark which creates WAL records very intensively. While I this
> >>> it is not completely fair (1MB log records are really rare), it pushed
> >>> me to analyze write-side waiting of XLog machinery.
> >>>
> >>> First I tried to optimize WaitXLogInsertionsToFinish, but without great
> >>> success (yet).
> >>>
> >>> While profiling, I found a lot of time is spend in the memory clearing
> >>> under global WALBufMappingLock:
> >>>
> >>>      MemSet((char *) NewPage, 0, XLOG_BLCKSZ);
> >>>
> >>> It is obvious scalability bottleneck.
> >>>
> >>> So "challenge was accepted".
> >>>
> >>> Certainly, backend should initialize pages without exclusive lock. But
> >>> which way to ensure pages were initialized? In other words, how to
> >>> ensure XLogCtl->InitializedUpTo is correct.
> >>>
> >>> I've tried to play around WALBufMappingLock with holding it for a short
> >>> time and spinning on XLogCtl->xlblocks[nextidx]. But in the end I found
> >>> WALBufMappingLock is useless at all.
> >>>
> >>> Instead of holding lock, it is better to allow backends to cooperate:
> >>> - I bound ConditionVariable to each xlblocks entry,
> >>> - every backend now checks every required block pointed by
> >>> InitializedUpto was successfully initialized or sleeps on its condvar,
> >>> - when backend sure block is initialized, it tries to update
> >>> InitializedUpTo with conditional variable.
> >>
> >> Looks reasonable for me, but having ConditionVariable per xlog buffer
> >> seems overkill for me.  Find an attached revision, where I've
> >> implemented advancing InitializedUpTo without ConditionVariable.
> >> After initialization of each buffer there is attempt to do CAS for
> >> InitializedUpTo in a loop.  So, multiple processes will try to advance
> >> InitializedUpTo, they could hijack initiative from each other, but
> >> there is always a leader which will finish the work.
> >>
> >> There is only one ConditionVariable to wait for InitializedUpTo being advanced.
> >>
> >> I didn't benchmark my version, just checked that tests passed.
> >
> > Good day, Alexander.
>
> Seems I was mistaken twice.
>
> > I've got mixed but quite close result for both approaches (single or many
> > ConditionVariable) on the notebook. Since I have no access to larger
> > machine, I can't prove "many" is way better (or discover it worse).
>
> 1. "many condvars" (my variant) is strictly worse with num locks = 64 and
> when pg_logical_emit_message emits just 1kB instead of 1MB.
>
> Therefore, "single condvar" is strictly better.
>
> > Given patch after cleanup looks a bit smaller and clearer, I agree to keep
> > just single condition variable.
> >
> > Cleaned version is attached.
> >
> > I've changed condition for broadcast a bit ("less" instead "not equal"):
> > - buffer's border may already go into future,
> > - and then other backend will reach not yet initialized buffer and will
> > broadcast.
>
> 2. I've inserted abort if "buffer's border went into future", and wasn't
> able to trigger it.
>
> So I returned update-loop's body to your variant.

Good, thank you.  I think 0001 patch is generally good, but needs some
further polishing, e.g. more comments explaining how does it work.

Regarding 0002 patch, it looks generally reasonable.  But are 2
attempts always optimal?  Are there cases of regression, or cases when
more attempts are even better?  Could we have there some
self-adjusting mechanism like what we have for spinlocks?

------
Regards,
Alexander Korotkov
Supabase



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
Next
From: Peter Eisentraut
Date:
Subject: Re: Virtual generated columns