Re: Fixed xloginsert_locks for 9.4 - Mailing list pgsql-hackers
From | Andres Freund
---|---
Subject | Re: Fixed xloginsert_locks for 9.4
Date | 
Msg-id | 20141003122617.GQ7158@awork2.anarazel.de
In response to | Fixed xloginsert_locks for 9.4 (Greg Smith <greg.smith@crunchydatasolutions.com>)
Responses | Re: Fixed xloginsert_locks for 9.4
List | pgsql-hackers
On 2014-10-02 20:08:33 -0400, Greg Smith wrote:
> I did a fair dive into double-checking the decision to just leave
> xloginsert_locks fixed at 8 for 9.4. My conclusion: good call, move along.
> Further improvements beyond what the 8-way split gives are certainly
> possible. But my guess from chasing them a little is that additional
> places will pop up as things that must also be tweaked before you'll see
> those gains turn significant.

Thanks for doing this.

> I'd like to see that box re-opened at some point. But if we do that, I'm
> comfortable it could end with an xloginsert_locks that tunes itself
> reasonably on large servers, similar to wal_buffers. There's nothing
> about this that makes me feel like it needs a GUC. I barely needed an
> exposed knob to do this evaluation.
>
> = Baseline =
>
> I rolled back a few commits to just before the GUC was removed and tested
> against that point in git history. Starting with the 4 client test case
> Heikki provided, the fastest runs on my 24 core server looked like this:
>
> tps = 56.691855 (including connections establishing)
>
> Repeat runs do need to drop the table and rebuild, because eventually
> autovacuum kicks in on things in a big way, and then your test is toast
> until it's done. Attached is what I settled on for a test harness.
> Nothing here was so subtle that I felt a more complicated harness was
> needed.
>
> Standard practice for me is to give pgbench more workers for any
> scalability test. That gives a tiny improvement, to where this is
> typical with 4 clients and 4 workers:
>
> tps = 60.942537 (including connections establishing)
>
> Increasing to 24 clients plus 24 workers gives roughly the same numbers,
> suggesting that the bottleneck here is certainly not the client count,
> and that the suggested value of 4 was high enough:
>
> tps = 56.731581 (including connections establishing)
>
> Decreasing xloginsert_locks to 1, so back to the original problem, the
> rate normally looks like this instead:
>
> tps = 25.384708 (including connections establishing)
>
> So you get the big speedup just fine with the default tuning; great. I'm
> happy to see it ship like this as good enough for 9.4.
>
> = More locks =
>
> For the next phase, I stuck to 24 clients and 24 workers. If I then bump
> up xloginsert_locks to something much larger, there is an additional
> small gain to be had. With 24 locks, so basically every client has its
> own, instead of 57-60 TPS I managed to get as high as this:
>
> tps = 66.790968 (including connections establishing)
>
> However, the minute I get into this territory, there's an obvious
> bottleneck shift going on in there too. As one example, the rate of
> creating new checkpoint segments becomes troublesome, with messages
> like this:
>
> LOG: checkpoints are occurring too frequently (1 second apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
>
> When 9.4 is already giving a more than 100% gain on this targeted test
> case, I can't see that chasing after maybe an extra 10% is worth having
> yet another GUC around. Especially when it will probably take multiple
> tuning steps before you're done; we don't really know the rest of those
> steps yet, and when we do, we probably won't need a GUC to cope with
> them anyway.

I've modified the test slightly, by having the different backends insert
into different relations. Even on my measly 5-year-old workstation I *do*
see quite a bit more than 10%.
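(The prepare/insert scripts invoked below are not included in the archived
text, so here is a minimal sketch of what they might look like. The foo_N
table names, the single int column, and the use of \setrandom to spread
clients across relations are all assumptions; 9.4-era pgbench has no
built-in :client_id variable.)

    -- prepare.sql (hypothetical reconstruction): one relation per pgbench
    -- client, so concurrent backends never insert into the same table.
    DO $$
    BEGIN
        FOR i IN 0..63 LOOP
            EXECUTE format('DROP TABLE IF EXISTS foo_%s', i);
            EXECUTE format('CREATE TABLE foo_%s (val int)', i);
        END LOOP;
    END
    $$;

    -- fooinsert.sql (hypothetical): pick one of the 64 relations at
    -- random for each transaction; pgbench substitutes :tid into the SQL.
    \setrandom tid 0 63
    INSERT INTO foo_:tid (val) VALUES (1);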
psql -f /tmp/prepare.sql && pgbench -n -f /tmp/fooinsert.sql -c 64 -j 64 -T 10

on a 2x E5520 server (2 sockets, each with 4 cores and 2 threads per core),
with the following configuration:

-c shared_buffers=2GB
-c wal_level=hot_standby
-c full_page_writes=off
-c checkpoint_segments=400
-c fsync=off (the IO system here is abysmally bad)
-c synchronous_commit=off

#define NUM_XLOGINSERT_LOCKS 1
tps = 52.711939 (including connections establishing)

#define NUM_XLOGINSERT_LOCKS 8
tps = 286.496054 (including connections establishing)

#define NUM_XLOGINSERT_LOCKS 16
tps = 346.113313 (including connections establishing)

#define NUM_XLOGINSERT_LOCKS 24
tps = 363.242111 (including connections establishing)

I'd not be surprised at all if you saw a bigger influence on a system with
4 sockets.
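(For context on why recompiling with a larger NUM_XLOGINSERT_LOCKS helps:
the constant is compile-time in src/backend/access/transam/xlog.c, and each
backend starts at a lock slot derived from its PGPROC number, migrating to
another slot only when it hits contention. The following is a paraphrased
sketch of the 9.4-era acquisition logic, not the verbatim source; MyLockNo,
WALInsertLocks, and MyProc are file- and process-global state in the real
code.)

    #define NUM_XLOGINSERT_LOCKS  8

    static void
    WALInsertLockAcquire(void)
    {
        static int  lockToTry = -1;
        bool        immed;

        /* First time through in this backend: pick a starting slot from
         * the backend's PGPROC number, spreading backends across locks. */
        if (lockToTry == -1)
            lockToTry = MyProc->pgprocno % NUM_XLOGINSERT_LOCKS;
        MyLockNo = lockToTry;

        /* Acquire the chosen insertion lock, initializing its insertingAt
         * position to 0 since our insert location isn't known yet. */
        immed = LWLockAcquireWithVar(&WALInsertLocks[MyLockNo].l.lock,
                                     &WALInsertLocks[MyLockNo].l.insertingAt,
                                     0);
        if (!immed)
        {
            /* We had to sleep: the slot was contended, so try a different
             * slot on the next acquisition. More locks means fewer
             * backends colliding on any one slot. */
            lockToTry = (lockToTry + 1) % NUM_XLOGINSERT_LOCKS;
        }
    }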
Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services