Re: why roll-your-own s_lock? / improving scalability - Mailing list pgsql-hackers
From | Merlin Moncure
Subject | Re: why roll-your-own s_lock? / improving scalability
Date |
Msg-id | CAHyXU0zKwJGbGw_oDUKimxYZoNc7q5+fbtiOynM3Cq5ZqzKH9A@mail.gmail.com
In response to | why roll-your-own s_lock? / improving scalability (Nils Goroll <slink@schokola.de>)
Responses | Re: why roll-your-own s_lock? / improving scalability; Re: why roll-your-own s_lock? / improving scalability
List | pgsql-hackers
On Tue, Jun 26, 2012 at 12:02 PM, Nils Goroll <slink@schokola.de> wrote:
> Hi,
>
> I am currently trying to understand what looks like really bad scalability of
> 9.1.3 on a 64core 512GB RAM system: the system runs OK when at 30% usr, but only
> marginal amounts of additional load seem to push it to 70% and the application
> becomes highly unresponsive.
>
> My current understanding basically matches the issues being addressed by various
> 9.2 improvements, well summarized in
> http://wiki.postgresql.org/images/e/e8/FOSDEM2012-Multi-CPU-performance-in-9.2.pdf
>
> An additional aspect is that, in order to address the latent risk of data loss &
> corruption with WBCs and async replication, we have deliberately moved the db
> from a similar system with WB cached storage to ssd based storage without a WBC,
> which, by design, has (in the best WBC case) approx. 100x higher latencies, but
> much higher sustained throughput.
>
> On the new system, even with 30% user "acceptable" load, oprofile makes apparent
> significant lock contention:
>
> opreport --symbols --merge tgid -l /mnt/db1/hdd/pgsql-9.1/bin/postgres
>
> Profiling through timer interrupt
> samples  %        image name    symbol name
> 30240    27.9720  postgres      s_lock
> 5069      4.6888  postgres      GetSnapshotData
> 3743      3.4623  postgres      AllocSetAlloc
> 3167      2.9295  libc-2.12.so  strcoll_l
> 2662      2.4624  postgres      SearchCatCache
> 2495      2.3079  postgres      hash_search_with_hash_value
> 2143      1.9823  postgres      nocachegetattr
> 1860      1.7205  postgres      LWLockAcquire
> 1642      1.5189  postgres      base_yyparse
> 1604      1.4837  libc-2.12.so  __strcmp_sse42
> 1543      1.4273  libc-2.12.so  __strlen_sse42
> 1156      1.0693  libc-2.12.so  memcpy
>
> Unfortunately I don't have profiling data for the high-load / contention
> condition yet, but I fear the picture will be worse and pointing in the same
> direction.
>
> <pure speculation>
> In particular, the _impression_ is that lock contention could also be related to
> I/O latencies, making me fear that cases could exist where spin locks are being
> held while blocking on I/O.
> </pure speculation>
>
> Looking at the code, it appears to me that the roll-your-own s_lock code cannot
> handle a couple of cases; for instance, it will also spin when the lock holder is
> not running at all or is blocking on I/O (which could even be implicit, e.g. for a
> page flush). These issues have long been addressed by adaptive mutexes and futexes.
>
> Also, the s_lock code tries to be somewhat adaptive using spins_per_delay (when
> it has spun for a long time but not blocked, it spins even longer in the future),
> which appears to me to have the potential of becoming highly counter-productive.
>
> Now that the scene is set, here's the simple question: Why all this? Why not
> simply use posix mutexes which, on modern platforms, will map to efficient
> implementations like adaptive mutexes or futexes?

Well, that would introduce a backend dependency on pthreads, which is unpleasant. Also, you'd need to feature-test via _POSIX_THREAD_PROCESS_SHARED to make sure you can mutex between processes (and configure your mutexes as such when you do). There are probably other reasons why this can't be done, but I personally don't know of any.

Also, it's forbidden to do things like invoke i/o in the backend while holding only a spinlock.

As to your larger point, it's an interesting assertion -- some data to back it up would help.

merlin
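As a minimal sketch of the feature test and setup Merlin describes (the init_shared_mutex helper, its shared-memory placement, and the glibc adaptive-type hint are assumptions for illustration, not PostgreSQL code):

/*
 * Sketch: configuring a POSIX mutex so it can be shared between backend
 * processes.  The mutex must live in memory mapped into every backend
 * (e.g. the existing shared memory segment); error handling omitted.
 */
#include <unistd.h>
#include <pthread.h>

/* compile-time feature test; a value of 0 would additionally require a
 * sysconf(_SC_THREAD_PROCESS_SHARED) check at runtime */
#if !defined(_POSIX_THREAD_PROCESS_SHARED) || _POSIX_THREAD_PROCESS_SHARED < 0
#error "process-shared POSIX mutexes are not available on this platform"
#endif

static void
init_shared_mutex(pthread_mutex_t *m)   /* m must point into shared memory */
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* allow the mutex to be used across processes, not only threads */
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
#ifdef PTHREAD_MUTEX_ADAPTIVE_NP
    /* glibc extension: spin briefly before sleeping, similar in spirit to s_lock */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
#endif
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}

A backend would then take the lock with pthread_mutex_lock()/pthread_mutex_unlock() on the shared mutex, letting the pthreads implementation decide when to spin and when to sleep, instead of the hand-rolled test-and-set loop.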