Re: Proposal for CSN based snapshots - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Proposal for CSN based snapshots
Date
Msg-id CAPpHfdu2zqhnjKngm7Yi2f6sq1_+AwzKN6BruQ80b5e-Wc_BdA@mail.gmail.com
In response to Re: Proposal for CSN based snapshots  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Wed, Aug 24, 2016 at 11:54 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 08/23/2016 06:18 PM, Heikki Linnakangas wrote:
On 08/22/2016 08:38 PM, Andres Freund wrote:
On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote:
I remember seeing ProcArrayLock contention very visible earlier, but I can't
hit that now. I suspect you'd still see contention on bigger hardware,
though; my laptop has only 4 cores. I'll have to find a real server for the
next round of testing.

Yea, I think that's true. I can just about see ProcArrayLock contention
on my more powerful laptop; to see it really badly you need bigger
hardware / higher concurrency.

As soon as I sent my previous post, Vladimir Borodin kindly offered
access to a 32-core server for performance testing. Thanks Vladimir!

I installed Greg Smith's pgbench-tools kit on that server, and ran some
tests. I'm seeing some benefit on "pgbench -N" workload, but only after
modifying the test script to use "-M prepared", and using Unix domain
sockets instead of TCP to connect. Apparently those things add enough
overhead to mask out the little difference.

Attached is a graph with the results. Full results are available at
https://hlinnaka.iki.fi/temp/csn-4-results/. In short, the patch
improved throughput, measured in TPS, with >= 32 or so clients. The
biggest difference was with 44 clients, which saw about 5% improvement.

So, not phenomenal, but it's something. I suspect that with more cores,
the difference would become more clear.

As if on cue, Alexander Korotkov just offered access to a 72-core
system :-). Thanks! I'll run the same tests on that.

And here are the results on the 72-core machine (thanks again, Alexander!). The test setup was the same as on the 32-core machine, except that I ran it with more clients since the system has more CPU cores. In summary, in the best case, the patch increases throughput by about 10%. That peak is with 64 clients. Interestingly, as the number of clients increases further, the gain evaporates, and the CSN version actually performs worse than unpatched master. I don't know why that is. One theory is that by eliminating one bottleneck, we're now hitting another bottleneck which doesn't degrade as gracefully when there's contention.

Did you try to identify this second bottleneck with perf or something?
It would also be nice to run pgbench -S, and to check a mix of something like 10% writes and 90% reads (which I believe is quite a typical real-life workload).
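
For concreteness, here is a rough sketch of how the two suggested runs might be invoked. It only builds pgbench command lines and does not run anything. The socket directory, database name, duration and client count are placeholders of mine, not the settings used in the tests above, and the weighted built-in scripts ("-b name@weight") assume pgbench from 9.6 or later. "simple-update" is the built-in counterpart of -N, and "select-only" of -S.

```python
import shlex


def pgbench_cmd(clients, seconds=60, scripts=None, flags=(),
                socket_dir="/tmp", db="postgres"):
    """Build a pgbench argv list; nothing is executed here.

    socket_dir, db and seconds are illustrative placeholders.
    """
    cmd = [
        "pgbench",
        "-M", "prepared",      # prepared statements, as in the quoted tests
        "-h", socket_dir,      # a directory path selects a Unix domain socket
        "-c", str(clients),    # number of concurrent clients
        "-j", str(clients),    # one worker thread per client
        "-T", str(seconds),    # run length in seconds
    ]
    cmd += list(flags)
    # Weighted built-in scripts: "-b name@weight" (pgbench 9.6+).
    for name, weight in (scripts or []):
        cmd += ["-b", f"{name}@{weight}"]
    cmd.append(db)
    return cmd


# Read-only run, equivalent to "pgbench -S".
read_only = pgbench_cmd(64, flags=["-S"])

# Roughly 90% reads / 10% writes, mixing two built-in scripts by weight.
mixed = pgbench_cmd(64, scripts=[("select-only", 9), ("simple-update", 1)])

if __name__ == "__main__":
    for cmd in (read_only, mixed):
        print(shlex.join(cmd))
```

The 9:1 weights make about nine out of ten transactions read-only. That is only an approximation of "10% writes", since each simple-update transaction itself contains a SELECT alongside its UPDATE and INSERT.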

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
