Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA - Mailing list pgsql-hackers
From: Sokolov Yura
Subject: Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
Msg-id: cace5dc9bfec2ae206db56968fe3575e@postgrespro.ru
In response to: Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA (Andres Freund <andres@anarazel.de>)
Responses: Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
List: pgsql-hackers
Hello,

On 2017-10-19 19:46, Andres Freund wrote:
> On 2017-10-19 14:36:56 +0300, Sokolov Yura wrote:
>> > > +	init_local_spin_delay(&delayStatus);
>> >
>> > The way you moved this around has the disadvantage that we now do this -
>> > a number of writes - even in the very common case where the lwlock can
>> > be acquired directly.
>>
>> Excuse me, I don't quite understand.
>> Do you complain about init_local_spin_delay being placed here?
>
> Yes.

I could place it before perform_spin_delay under `if (!spin_inited)`, if you
think that is absolutely necessary.

>> Placing it anywhere else will complicate the code.
>
>> Or do you complain about setting `mask` and `add`?
>
> That seems right.
>
>> In both cases, I think the simpler version should be accepted first. It
>> acts as the definition of the algorithm, and it already gives a measurable
>> improvement.
>
> Well, in scalability. I'm less sure about uncontended performance.
>
>> > > +	 * We intentionally do not call finish_spin_delay here, because the loop
>> > > +	 * above usually finished by queuing into the wait list on contention, and
>> > > +	 * doesn't reach spins_per_delay thereby doesn't sleep inside of
>> > > +	 * perform_spin_delay. Also, different LWLocks has very different
>> > > +	 * contention pattern, and it is wrong to update spin-lock statistic based
>> > > +	 * on LWLock contention.
>> > > +	 */
>> >
>> > Huh? This seems entirely unconvincing. Without adjusting this here we'll
>> > just spin the same way every iteration. Especially for the case where
>> > somebody else holds LW_FLAG_LOCKED that's quite bad.
>>
>> LWLocks are very different. Some of them are always short-term
>> (BufferLock), others are always locked for a long time.
>
> That seems not particularly relevant. The same is true for spinlocks. The
> relevant question isn't how long the lwlock is held, it's how long
> LW_FLAG_LOCKED is held - and that should only depend on contention
> (i.e. bus speed, amount of times put into sleep while holding the lock,
> etc), not on how long the lock is held.
>
>> I've tried to place this delay into the lock itself (it has 2 free bytes),
>> but that attempt performed worse.
>
> That seems unsurprising - there's a *lot* of locks, and we'd have to tune
> all of them. Additionally there's a bunch of platforms where we do *not*
> have free bytes (consider the embedding in BufferTag).
>
>> Now I understand that the delays should be stored in an array indexed by
>> tranche. But I have no time to test this idea, and I doubt it will give
>> radically better results (i.e. > 5%), so I think it is better to accept
>> the patch in this form and then experiment with per-tranche delays.
>
> I don't think tranches have any decent predictive power here.

Look at "Make acquiring LWLock to look more like spinlock": for the first
`skip_wait_list` iterations there is no attempt to queue into the wait list,
and that is where most of the improvement comes from. (Without that part
there is simply no degradation at high client counts, but a small decline at
low client counts, because the current algorithm is also a little "spinny":
it attempts to acquire the lock again after queueing itself into the wait
list.)

`skip_wait_list` depends very much on the tranche. A per-tranche
`skip_wait_list` should be calculated from the ability to acquire the lock
without any sleep, i.e. without either queueing into the wait list or falling
into `pg_usleep` inside `perform_spin_delay`. I suppose it is obvious that
`spins_per_delay` has to be proportional to `skip_wait_list` (it should be at
least greater than `skip_wait_list`). Therefore `spins_per_delay` should also
depend at runtime on the lock's tranche. Am I wrong?

Without that per-tranche auto-tuning, I think it is better not to touch
`spins_per_delay` inside of `LWLockAttemptLockOrQueue`.
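To make that more concrete, below is a rough sketch of the shape I have in
mind. It is not the code from the patch: it handles only exclusive mode,
elides the actual queueing and error handling, copies two of lwlock.c's
private #defines so it is readable on its own, and the per-tranche array and
the function name are purely illustrative.

#include "postgres.h"
#include "storage/lwlock.h"
#include "storage/s_lock.h"

/* Copied from lwlock.c, where these are private. */
#define LW_VAL_EXCLUSIVE    ((uint32) 1 << 24)
#define LW_LOCK_MASK        ((uint32) ((1 << 25) - 1))

/* Hypothetical per-tranche knob; user-defined tranches ignored for brevity. */
static int  skip_wait_list_by_tranche[LWTRANCHE_FIRST_USER_DEFINED];

/*
 * Try to take 'lock' exclusively, spinning for at most skip_wait_list
 * iterations.  Returns true if the lock was acquired; false means "give up
 * spinning and fall back to queueing into the wait list" (not shown here).
 */
static bool
LWLockAttemptExclusiveWithSkip(LWLock *lock)
{
    SpinDelayStatus delayStatus;
    bool        spin_inited = false;
    int         skip_wait_list = skip_wait_list_by_tranche[lock->tranche];
    int         iteration;

    for (iteration = 0; iteration < skip_wait_list; iteration++)
    {
        uint32      old_state = pg_atomic_read_u32(&lock->state);

        if ((old_state & LW_LOCK_MASK) == 0)
        {
            /*
             * Fast path: the lock looks free, and we have not written to
             * delayStatus at all, which addresses the concern about the
             * uncontended case.
             */
            if (pg_atomic_compare_exchange_u32(&lock->state, &old_state,
                                               old_state | LW_VAL_EXCLUSIVE))
                return true;
            continue;           /* CAS lost a race, re-read the state */
        }

        /* Contended: initialize the spin-delay state lazily, only now. */
        if (!spin_inited)
        {
            init_local_spin_delay(&delayStatus);
            spin_inited = true;
        }

        /*
         * perform_spin_delay still uses the global spins_per_delay; making
         * it per-tranche, as argued above, would need a further change.
         * finish_spin_delay is intentionally not called, per the thread.
         */
        perform_spin_delay(&delayStatus);
    }

    return false;               /* spun long enough: caller should queue */
}

The only points the sketch is meant to show are that the uncontended path
writes nothing extra, that the spin-delay state is set up lazily on the first
contended iteration, and that both knobs could be looked up by lock->tranche.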
With regards,

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers