Re: Fixed xloginsert_locks for 9.4 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Fixed xloginsert_locks for 9.4
Date
Msg-id 20141003122617.GQ7158@awork2.anarazel.de
In response to Fixed xloginsert_locks for 9.4  (Greg Smith <greg.smith@crunchydatasolutions.com>)
Responses Re: Fixed xloginsert_locks for 9.4
List pgsql-hackers
On 2014-10-02 20:08:33 -0400, Greg Smith wrote:
> I did a fair dive into double-checking the decision to just leave
> xloginsert_locks fixed at 8 for 9.4.  My conclusion:  good call, move along.
> Further improvements beyond what the 8-way split gives are certainly possible.
> But my guess from chasing them a little is that additional places will pop
> up as things that must also be tweaked before you'll see those gains turn
> significant.

Thanks for doing this.


> I'd like to see that box re-opened at some point.  But if we do that, I'm
> comfortable it could end with an xloginsert_locks that tunes itself
> reasonably on large servers, similar to wal_buffers.  There's
> nothing about this that makes me feel like it needs a GUC.  I barely needed an
> exposed knob to do this evaluation.
>
> = Baseline =
>
> I rolled back a few commits to just before the GUC was removed and tested
> against that point in git time.  Starting with the 4 client test case Heikki
> provided, the fastest runs on my 24 core server looked like this:
>
> tps = 56.691855 (including connections establishing)
>
> Repeat runs do need to drop the table and rebuild, because eventually
> autovacuum kicks in on things in a big way, and then your test is toast until
> it's done.  Attached is what I settled on for a test harness.  Nothing here
> was so subtle that I felt a more complicated harness was needed.
>
> Standard practice for me is to give pgbench more workers when running any
> kind of scalability test.  That gives a tiny improvement, to where this is
> typical with 4 clients and 4 workers:
>
> tps = 60.942537 (including connections establishing)
>
> Increasing to 24 clients plus 24 workers gives roughly the same numbers,
> suggesting that the bottleneck here is certainly not the client count, and
> that the suggestion of 4 was high enough:
>
> tps = 56.731581 (including connections establishing)
>
> Decreasing xloginsert_locks to 1, so back to the original problem, the rate
> normally looks like this instead:
>
> tps = 25.384708 (including connections establishing)
>
> So you get the big return just fine with the default tuning; great.  I'm
> happy to see it ship like this as good enough for 9.4.
>
> = More locks =
>
> For the next phase, I stuck to 24 clients and 24 workers.  If I then bump up
> xloginsert_locks to something much larger, there is an additional small gain
> to be had.  With 24 locks, so basically every client has its own, instead
> of 57-60 TPS, I managed to get as high as this:
>
> tps = 66.790968 (including connections establishing)
>
> However, the minute I get into this territory, there's an obvious bottleneck
> shift going on in there too.  The rate of creating new checkpoint segments
> becomes troublesome as one example, with messages like this:
>
> LOG:  checkpoints are occurring too frequently (1 second apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
>
> When 9.4 is already giving a more than 100% gain on this targeted test case,
> I can't see that chasing after maybe an extra 10% is worth having yet
> another GUC around.  Especially when it will probably take multiple tuning
> steps before you're done anyway; we don't really know the rest of them yet;
> and when we do, we probably won't need a GUC to cope with them in the end
> anyway.
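
As a side note on the "checkpoints are occurring too frequently" hint quoted
above: in 9.4 that is still governed by checkpoint_segments (it was only
replaced by max_wal_size in 9.5), which is presumably part of why it's set to
400 in the configuration below.  For a throwaway benchmark it can also be
raised on the fly rather than by editing postgresql.conf; a minimal sketch,
value arbitrary:

ALTER SYSTEM SET checkpoint_segments = 400;
SELECT pg_reload_conf();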

I've modified the test slightly by having the different backends insert
into different relations. Even on my measly 5-year-old workstation I
*do* see quite a bit more than 10%.


psql -f /tmp/prepare.sql && pgbench -n -f /tmp/fooinsert.sql -c 64 -j 64 -T 10
on a 2x E5520 server (2 sockets, 4 cores per socket, 2 threads per core)
with the following configuration:
 -c shared_buffers=2GB
 -c wal_level=hot_standby
 -c full_page_writes=off
 -c checkpoint_segments=400
 -c fsync=off (the I/O system here is abysmally bad)
 -c synchronous_commit=off
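
The actual contents of prepare.sql and fooinsert.sql aren't shown here, so
purely as a hedged illustration of the idea: a sketch along the following
lines spreads the inserts across many relations instead of piling them onto
a single table.  Everything in it (the foo%s naming, the two-column layout,
the batch size, and picking a table at random per transaction rather than
pinning each backend to its own relation) is an assumption made for
illustration, not the real scripts.

prepare.sql could be roughly:

-- one candidate target table per client
DO $$
BEGIN
    FOR i IN 0..63 LOOP
        EXECUTE format('DROP TABLE IF EXISTS foo%s', i);
        EXECUTE format('CREATE TABLE foo%s (id int, payload text)', i);
    END LOOP;
END
$$;

and fooinsert.sql, run through pgbench -n -f, could pick a table and insert a
batch per transaction (pgbench substitutes :n textually, so it can form part
of the relation name):

\setrandom n 0 63
INSERT INTO foo:n SELECT g, repeat('x', 100) FROM generate_series(1, 10000) g;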

#define NUM_XLOGINSERT_LOCKS  1
tps = 52.711939 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS  8
tps = 286.496054 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS  16
tps = 346.113313 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS  24
tps = 363.242111 (including connections establishing)

I'd not be surprised at all if you'd see an even bigger influence on a
system with 4 sockets.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

