Re: Update on the spinlock->pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux - Mailing list pgsql-hackers
From: Jeff Janes
Subject: Re: Update on the spinlock->pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
Date:
Msg-id: CAMkU=1wJpgDz4Zj0+N+ZFX4B2q5aPULiaTMgeNbaadpafmKtqA@mail.gmail.com
In response to: Re: Update on the spinlock->pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux (Nils Goroll <slink@schokola.de>)
Responses:
  Re: Update on the spinlock->pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
  Re: Update on the spinlock->pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
  spinlock->pthread_mutex : first results with Jeff's pgbench+plsql
List: pgsql-hackers
On Sun, Jul 1, 2012 at 2:28 PM, Nils Goroll <slink@schokola.de> wrote:
> Hi Jeff,
>
>>>> It looks like the hacked code is slower than the original. That
>>>> doesn't seem so good to me. Am I misreading this?
>>>
>>> No, you are right - in a way. This is not about maximizing tps, this is
>>> about maximizing efficiency under load situations
>>
>> But why wouldn't this maximized efficiency present itself as increased TPS?
>
> Because the latency of lock acquisition influences TPS, but this is only
> marginally related to the cost in terms of cpu cycles to acquire the locks.
>
> See my posting as of Sun, 01 Jul 2012 21:02:05 +0200 for an overview of my
> understanding.

I still don't see how improving that could fail to improve TPS. But let's
focus on reproducing the problem first, otherwise it is all just talking in
the dark.

> But I don't understand yet how to best provoke high spinlock concurrency
> with pgbench. Or are there any other test tools out there for this case?

Use pgbench -S, or apply my patch from "pgbench--new transaction type" and
then run pgbench -P. Make sure that the scale is such that all of your data
fits in shared_buffers (I find on 64 bit that pgbench takes about
15MB * scale).

>> Anyway, your current benchmark speed of around 600 TPS over such a
>> short time period suggests you are limited by fsyncs.
>
> Definitely. I described the setup in my initial posting ("why roll-your-own
> s_lock? / improving scalability" - Tue, 26 Jun 2012 19:02:31 +0200)

OK. It looks like several things changed simultaneously. How likely do you
think it is that turning off the write cache caused the problem?

>> pgbench does as long as that is the case. You could turn fsync off,
>> or just change your benchmark to a read-only one like -S, or better
>> the -P option I've been trying to get into pgbench.
>
> I don't like to make assumptions which I haven't validated. The system
> showing the behavior is designed to write to persistent SSD storage in
> order to reduce the risk of data loss by a (BBU) cache failure. Running a
> test with fsync=off would divert even further from reality.

I think you can't get much farther from reality than your current benchmarks
are, I'm afraid. If your goal is to get pgbench closer to being limited by
spinlock contention, then fsync=off, or using -S or -P, will certainly do
that. So if you have high confidence that spinlock contention is really the
problem, fsync=off will get you closer to the thing you want to focus on,
even if it takes you further away from the holistic big-picture production
environment. And since you went to the trouble of making patches for
spinlocks, I assume you are fairly confident that that is the problem.

If you are not confident that spinlocks are really the problem, then I agree
it would be a mistake to craft a simple pgbench run which focuses in on one
tiny area which might not actually be the correct area. In that case, you
would instead want to either create a very complicated workload that closely
simulates your production load (a huge undertaking) or find a way to capture
an oprofile of the production server while it is actually in distress.

Also, it would help if you could get oprofile to do a call graph so you can
see which call sites the contended spinlocks are coming from (sorry, I don't
know how to do this successfully with oprofile).

>> Does your production server have fast fsyncs (BBU) while your test
>> server does not?
>
> No, we're writing directly to SSDs (ref: initial posting).

OK.

So it seems like the pgbench workload you are doing is limited by fsyncs,
and the CPU is basically idle because of that limit. Your real workload, on
the other hand, needs a much larger amount of processing power per fsync, so
it is closer to both limits at the same time. But, since the stats you
posted were for the normal rather than the distressed state, maybe I'm way
off here. Anyway, the easiest way to increase pgbench's "CPU per fsync" need
is to turn off fsync or synchronous_commit, or to switch to read-only
queries.
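For example, purely as a sketch for a throwaway test instance (not something
I'd run anywhere near production data, and the exact settings are of course
yours to pick):

  # in postgresql.conf of the test instance, then reload:
  synchronous_commit = off   # commits return before the WAL flush hits the SSD
  # or, only if the cluster is disposable:
  fsync = off

or just sidestep the write path entirely with the read-only -S workload.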
>>> 2 54.4s 2 27.18 SELECT ...
>>
>> That is interesting. Maybe those two queries are hammering everything
>> else to death.
>
> With 64 cores?

Maybe. That is the nature of spinlocks: the more cores you have, the more
other things each one interferes with. Except that the duration is not long
enough to cover the entire run period. But then again, maybe in the
distressed state those same queries did cover the entire duration. But yeah,
now that I think about it, this would not be my top hypothesis.

>> In other words, how many query-seconds worth of time transpired during
>> the 137 wall seconds? That would give an estimate of how many
>> simultaneously active connections the production server has.
>
> Sorry, I should have given you the stats from pgFouine:
>
> Number of unique normalized queries: 507
> Number of queries: 295,949
> Total query duration: 8m38s
> First query: 2012-06-23 14:51:01
> Last query: 2012-06-23 14:53:17
> Query peak: 6,532 queries/s at 2012-06-23 14:51:33

A total duration of 518 seconds over 136 seconds of wall time suggests there
is not all that much concurrent activity going on. But maybe time spent in
commit is not counted by pgFouine? And again, these stats are for the normal
state, not the distressed state.

> Thank you very much, Jeff! The one question remains: Do we really have all
> we need to provoke very high lock contention?

I think you do. (I don't have 64 cores...) Lots of cores, running
pgbench -c64 -j64 -P -T60 on a scale that fits in shared_buffers.

Cheers,

Jeff
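P.S. To be concrete, the kind of run I mean is roughly the following (scale
100 is just an illustration - pick whatever keeps ~15MB * scale comfortably
under your shared_buffers, and substitute -P for -S only if you have applied
my pgbench patch):

  # build a data set small enough to stay resident in shared_buffers
  # (scale 100 is roughly 1.5GB, so shared_buffers should be larger than that)
  pgbench -i -s 100 pgbench

  # read-only run across all 64 cores; no fsyncs, so lock contention and
  # CPU cost dominate instead of the SSD
  pgbench -S -c 64 -j 64 -T 60 pgbench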