Re: Get rid of WALBufMappingLock - Mailing list pgsql-hackers
From: Alexander Korotkov
Subject: Re: Get rid of WALBufMappingLock
Msg-id: CAPpHfduc9YMmGcBocD=03F+rbMV0XpA2KoHQC9bQN1QEpPjvPw@mail.gmail.com
List: pgsql-hackers

On Fri, Feb 7, 2025 at 1:24 PM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> On 07.02.2025 14:02, Yura Sokolov wrote:
> > On 07.02.2025 01:26, Alexander Korotkov wrote:
> >> Hi!
> >>
> >> On Sun, Jan 19, 2025 at 2:11 AM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> >>>
> >>> During discussion of Increasing NUM_XLOGINSERT_LOCKS [1], Andres Freund
> >>> used a benchmark which creates WAL records very intensively. While I
> >>> think it is not completely fair (1MB log records are really rare), it
> >>> pushed me to analyze the write-side waiting of the XLog machinery.
> >>>
> >>> First I tried to optimize WaitXLogInsertionsToFinish, but without great
> >>> success (yet).
> >>>
> >>> While profiling, I found that a lot of time is spent in the memory
> >>> clearing under the global WALBufMappingLock:
> >>>
> >>> MemSet((char *) NewPage, 0, XLOG_BLCKSZ);
> >>>
> >>> It is an obvious scalability bottleneck.
> >>>
> >>> So "challenge was accepted".
> >>>
> >>> Certainly, a backend should initialize pages without the exclusive
> >>> lock. But how to ensure pages were initialized? In other words, how to
> >>> ensure XLogCtl->InitializedUpTo is correct.
> >>>
> >>> I've tried to play around with WALBufMappingLock, holding it for a
> >>> short time and spinning on XLogCtl->xlblocks[nextidx]. But in the end I
> >>> found WALBufMappingLock is not needed at all.
> >>>
> >>> Instead of holding the lock, it is better to allow backends to cooperate:
> >>> - I bound a ConditionVariable to each xlblocks entry,
> >>> - every backend now checks that every required block pointed to by
> >>>   InitializedUpTo was successfully initialized, or sleeps on its condvar,
> >>> - when a backend is sure the block is initialized, it tries to update
> >>>   InitializedUpTo and signals the condition variable.
> >>
> >> Looks reasonable to me, but having a ConditionVariable per xlog buffer
> >> seems like overkill to me. See the attached revision, where I've
> >> implemented advancing InitializedUpTo without a per-buffer
> >> ConditionVariable. After initialization of each buffer there is an
> >> attempt to CAS InitializedUpTo in a loop. So, multiple processes will
> >> try to advance InitializedUpTo; they could hijack the initiative from
> >> each other, but there is always a leader which will finish the work.
> >>
> >> There is only one ConditionVariable to wait on for InitializedUpTo to
> >> be advanced.
> >>
> >> I didn't benchmark my version, just checked that tests passed.
> >
> > Good day, Alexander.
>
> Seems I was mistaken twice.
>
> > I've got mixed but quite close results for both approaches (single or
> > many ConditionVariables) on the notebook. Since I have no access to a
> > larger machine, I can't prove "many" is way better (or discover it is
> > worse).
>
> 1. "many condvars" (my variant) is strictly worse with num locks = 64 and
> when pg_logical_emit_message emits just 1kB instead of 1MB.
>
> Therefore, "single condvar" is strictly better.
>
> > Given the patch after cleanup looks a bit smaller and clearer, I agree
> > to keep just a single condition variable.
> >
> > The cleaned version is attached.
> >
> > I've changed the condition for broadcast a bit ("less" instead of "not
> > equal"):
> > - the buffer's border may already have gone into the future,
> > - and then another backend will reach a not-yet-initialized buffer and
> >   will broadcast.
>
> 2. I've inserted an abort for the case "buffer's border went into the
> future", and wasn't able to trigger it.
>
> So I returned the update loop's body to your variant.

Good, thank you. I think the 0001 patch is generally good, but needs some
further polishing, e.g. more comments explaining how it works.
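To make that concrete, here is a rough sketch of such a CAS advancement
loop. It is illustrative only, not the actual patch: treating
InitializedUpTo and xlblocks[] as pg_atomic_uint64, and the name
InitializedUpToCondVar, are assumptions based on this thread, while
pg_atomic_read_u64(), pg_atomic_compare_exchange_u64(),
ConditionVariableBroadcast(), and XLogRecPtrToBufIdx() are existing
PostgreSQL primitives:

    /*
     * Illustrative sketch, not the actual patch.  After initializing its
     * own buffer, a backend tries to advance XLogCtl->InitializedUpTo in a
     * CAS loop.  A backend that loses the CAS retries from the value the
     * winner installed, so backends may hijack the initiative from each
     * other, but some leader always finishes the work.
     */
    XLogRecPtr  expected = pg_atomic_read_u64(&XLogCtl->InitializedUpTo);

    for (;;)
    {
        int         nextidx = XLogRecPtrToBufIdx(expected);

        /* Stop once the page right past InitializedUpTo isn't initialized. */
        if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) !=
            expected + XLOG_BLCKSZ)
            break;

        if (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo,
                                           &expected,
                                           expected + XLOG_BLCKSZ))
        {
            /* Advanced: wake waiters on the single condition variable. */
            ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar);
            expected += XLOG_BLCKSZ;
        }
        /* On CAS failure, 'expected' was reloaded with the current value. */
    }

The key property is that even a failed CAS makes progress visible: the
loser re-reads the winner's value and either keeps advancing or bails out.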
Regarding the 0002 patch, it looks generally reasonable. But are 2 attempts
always optimal? Are there cases of regression, or cases when more attempts
are even better? Could we have some self-adjusting mechanism there, like
the one we have for spinlocks?
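For comparison, the spinlock self-adjustment lives in finish_spin_delay(),
which tunes spins_per_delay up when the lock was obtained without sleeping
and down otherwise. A hypothetical analogue for the initialization attempts
could follow the same shape; nothing like this is in the patches, and
init_attempts, MIN_INIT_ATTEMPTS, MAX_INIT_ATTEMPTS, and had_to_sleep are
made-up names:

    /*
     * Hypothetical self-tuning attempt budget, modeled on how
     * finish_spin_delay() adjusts spins_per_delay: grow the budget while
     * we avoid sleeping on the condition variable, shrink it once we end
     * up sleeping anyway.  Purely illustrative; not part of the patches.
     */
    #define MIN_INIT_ATTEMPTS 1
    #define MAX_INIT_ATTEMPTS 16

    static int  init_attempts = 2;      /* per-backend budget */

    if (had_to_sleep)
        init_attempts = Max(init_attempts - 1, MIN_INIT_ATTEMPTS);
    else
        init_attempts = Min(init_attempts + 1, MAX_INIT_ATTEMPTS);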
------
Regards,
Alexander Korotkov
Supabase