Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
Msg-id: 20171018232801.r2idhipqm36hdfww@alap3.anarazel.de
In response to: [HACKERS] Fix performance degradation of contended LWLock on NUMA (Sokolov Yura <funny.falcon@postgrespro.ru>)
List: pgsql-hackers
Hi,

On 2017-06-05 16:22:58 +0300, Sokolov Yura wrote:
> Patch changes the way LWLock is acquired.
>
> Before the patch, the lock was acquired using the following procedure:
> 1. First, LWLock->state is checked for the ability to take the lock,
> and a CAS is performed (either the lock can be acquired or not). This
> is retried if the state changed.
> 2. If the lock is not acquired at step 1, the Proc is put into the
> wait list, using the LW_FLAG_LOCKED bit of LWLock->state as a
> spinlock for the wait list.
> In fact, profiles show that LWLockWaitListLock spends a lot of CPU on
> a contended LWLock (up to 20%).
> An additional CAS loop is inside pg_atomic_fetch_or_u32 for setting
> LW_FLAG_HAS_WAITERS, and releasing the wait list lock is another CAS
> loop on the same LWLock->state.
> 3. After that, the state is checked again, because the lock could have
> been released before the Proc was queued into the wait list.
> 4. If the lock was acquired at step 3, the Proc has to be dequeued from
> the wait list (another spinlock and CAS loop). This operation can be
> quite heavy, because the whole list has to be traversed.
>
> The patch performs lock acquisition in a single CAS loop:
> 1. LWLock->state is read, and the ability to acquire the lock is
> checked. If the lock can be taken, a CAS is attempted. If the CAS
> succeeds, the lock is acquired (same as the original version).
> 2. But if the lock is currently held by another backend, we check
> whether the wait list lock can be taken. If the wait list lock is not
> held by anyone, a CAS is performed that takes the wait list lock and
> sets LW_FLAG_HAS_WAITERS at once. If that CAS succeeds, the LWLock
> was still held at the moment the wait list lock was taken - this
> proves the correctness of the new algorithm. The Proc is then queued
> to the wait list.
> 3. Otherwise a spin_delay is performed, and the loop returns to step 1.

Right, something like this didn't use to be possible because we had both
LWLock->state and a separate LWLock->mutex. But after 008608b9d5106
that's not a concern anymore.
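To make the difference concrete, here's a rough sketch of the pre-patch
path, with C11 atomics standing in for the pg_atomic_* wrappers. The
struct and function names are made up, many details (shared mode,
dequeue-on-race, spin delays) are omitted, and only the flag layout
mirrors lwlock.c:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LW_FLAG_HAS_WAITERS ((uint32_t) 1 << 30)
#define LW_FLAG_LOCKED      ((uint32_t) 1 << 28)  /* wait list spinlock bit */
#define LW_VAL_EXCLUSIVE    ((uint32_t) 1 << 24)

typedef struct { _Atomic uint32_t state; } lwlock_sketch;

/* Pre-patch step 1: a CAS loop that only tries to take the lock itself. */
static bool sketch_try_exclusive(lwlock_sketch *lock)
{
    uint32_t old = atomic_load(&lock->state);

    while ((old & LW_VAL_EXCLUSIVE) == 0)
    {
        /* On failure the CAS refreshes "old", so the loop re-checks. */
        if (atomic_compare_exchange_weak(&lock->state, &old,
                                         old | LW_VAL_EXCLUSIVE))
            return true;        /* lock acquired */
    }
    return false;               /* lock held by someone else */
}

/* Pre-patch step 2: a second, independent CAS loop (LWLockWaitListLock)
 * just to lock the wait list, plus more atomic ops on the same word. */
static void sketch_queue_self(lwlock_sketch *lock)
{
    /* Spin until we own the wait list lock - the profile hot spot. */
    while (atomic_fetch_or(&lock->state, LW_FLAG_LOCKED) & LW_FLAG_LOCKED)
        ;
    atomic_fetch_or(&lock->state, LW_FLAG_HAS_WAITERS);
    /* ... link our PGPROC into the wait list ... */
    atomic_fetch_and(&lock->state, ~LW_FLAG_LOCKED);  /* unlock wait list */
}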
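And a sketch of the patched idea, building on the definitions above -
one loop in which a single CAS either takes the lock, or takes the wait
list lock and sets LW_FLAG_HAS_WAITERS atomically (again illustrative
only; the real code also handles shared mode and uses
perform_spin_delay):

typedef enum { SKETCH_GOT_LOCK, SKETCH_QUEUED } sketch_result;

static sketch_result sketch_acquire_or_queue(lwlock_sketch *lock)
{
    uint32_t old = atomic_load(&lock->state);

    for (;;)
    {
        if ((old & LW_VAL_EXCLUSIVE) == 0)
        {
            /* Step 1: the lock looks free - try to grab it. */
            if (atomic_compare_exchange_weak(&lock->state, &old,
                                             old | LW_VAL_EXCLUSIVE))
                return SKETCH_GOT_LOCK;
        }
        else if ((old & LW_FLAG_LOCKED) == 0)
        {
            /*
             * Step 2: the lock is held and the wait list lock is free -
             * take the wait list lock *and* set HAS_WAITERS in one CAS.
             * Success proves the lock was still held at that instant,
             * so queueing ourselves is safe.
             */
            if (atomic_compare_exchange_weak(&lock->state, &old,
                                             old | LW_FLAG_LOCKED
                                                 | LW_FLAG_HAS_WAITERS))
            {
                /* ... link our PGPROC in, clear LW_FLAG_LOCKED, sleep ... */
                return SKETCH_QUEUED;
            }
        }
        else
        {
            /* Step 3: wait list lock is busy - back off and retry
             * (perform_spin_delay() in the real code). */
            old = atomic_load(&lock->state);
        }
    }
}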
> The algorithm for LWLockWaitForVar is also refactored. The new version
> is:
> 1. If the lock is not held by anyone, exit immediately.
> 2. Otherwise, check whether the wait list lock can be taken, because
> the variable is protected by it. If so, a CAS is performed, and if it
> succeeds, the loop breaks to step 4.
> 3. Otherwise a spin_delay is performed, and the loop returns to step 1.
> 4. The var is checked for change.
> If it changed, we unlock the wait list lock and exit.
> Note: it cannot change during the following steps because we are
> holding the wait list lock.
> 5. Otherwise a CAS setting the necessary flags is performed. If it
> succeeds, we queue the Proc to the wait list and exit.
> 6. If the CAS failed, the LWLock may already have been released - if
> so, we should unlock the wait list and exit.
> 7. Otherwise the loop returns to step 5.
>
> So the invariant is preserved:
> - either we found the LWLock free,
> - or we found the variable changed,
> - or we set the flags and queued ourselves while the LWLock was held.
>
> No spin_delay is performed at step 7, because we should release the
> wait list lock as soon as possible.

That seems unconvincing - by not delaying you're more likely to
*increase* the time till the current locker that holds the lock can
release the lock.

> Additionally, taking the wait list lock is skipped in both algorithms
> if SpinDelayStatus.spins is less than some fraction of spins_per_delay
> loop iterations (i.e. during the initial iterations, and for some
> iterations after pg_usleep, because SpinDelayStatus.spins is reset
> after sleep).

I quite strongly think it's a bad idea to change this at the same time
as the other changes you're proposing here. I think it's a worthwhile
thing to pursue, but that change also has quite the potential for
regressing in some cases.
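For reference, that skip heuristic amounts to something like the check
below (the divisor is illustrative, not necessarily the patch's actual
constant):

/* Hypothetical sketch: during the first part of the spin budget, don't
 * compete for the wait list lock at all, just keep spinning on state. */
static bool sketch_should_try_waitlist_lock(int spins, int spins_per_delay)
{
    return spins >= spins_per_delay / 8;  /* divisor is illustrative */
}

- Andres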