From: Robert Haas
Subject: some longer, larger pgbench tests with various performance-related patches
Date:
Msg-id: CA+Tgmobvif_ErSj7hWZ5xzLhDX_fGZbiqKt1EvPdLaHrj+p3Xw@mail.gmail.com
List: pgsql-hackers
Early yesterday morning, I was able to use Nate Boley's test machine to
do a single 30-minute pgbench run at scale factor 300 using a variety
of trees built with various patches, and with the -l option added to
track latency on a per-transaction basis.  All tests were done using
32 clients and permanent tables.  The configuration was otherwise
identical to that described here:

http://archives.postgresql.org/message-id/CA+TgmoboYJurJEOB22Wp9RECMSEYGNyHDVFv5yisvERqFw=6dw@mail.gmail.com
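
For concreteness, here's a rough sketch of the sort of invocation
described above (the database name and -j worker-thread count are
assumptions; the scale factor, client count, duration, and -l flag are
as stated):

    # one-time initialization at scale factor 300 ("bench" database name assumed)
    pgbench -i -s 300 bench

    # 30-minute run, 32 clients, per-transaction latency logging via -l;
    # the -j worker-thread count is an assumption, not part of the setup above
    pgbench -c 32 -j 32 -T 1800 -l bench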

By doing this, I hoped to get a better understanding of (1) the
effects of a scale factor too large to fit in shared_buffers, (2) what
happens on a longer test run, and (3) how response time varies
throughout the test.  First, here are the raw tps numbers:

background-clean-slru-v2: tps = 2027.282539 (including connections establishing)
buffreelistlock-reduction-v1: tps = 2625.155348 (including connections establishing)
buffreelistlock-reduction-v1-freelist-ok-v2: tps = 2468.638149 (including connections establishing)
freelist-ok-v2: tps = 2467.065010 (including connections establishing)
group-commit-2012-01-21: tps = 2205.128609 (including connections establishing)
master: tps = 2200.848350 (including connections establishing)
removebufmgrfreelist-v1: tps = 2679.453056 (including connections establishing)
xloginsert-scale-6: tps = 3675.312202 (including connections establishing)

Obviously these numbers are fairly noisy, especially since this is
just one run, so the increases and decreases might not be all that
meaningful.  Time permitting, I'll try to run some more tests to get
my arms around that situation a little better.

Graphs are here:

http://wiki.postgresql.org/wiki/Robert_Haas_9.2CF4_Benchmark_Results

There are two graphs for each branch.  The first is a scatter plot of
latency vs. transaction time.  I found that graph hard to understand,
though; I couldn't really tell what I was looking at.  So I made a
second set of graphs which plot the number of completed transactions in
each second of the test against time.  The results are also included on
that same page, below the latency graphs, and I find them much
more informative.
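
In case anyone wants to regenerate those per-second counts, here's a
minimal sketch of one way to get them out of the pgbench -l transaction
logs (this assumes the log format in which the fifth field is the
Unix-epoch second at which each transaction completed):

    # count transactions completed in each second of the run
    cat pgbench_log.* | awk '{ print $5 }' | sort -n | uniq -c > tps_per_second.txt

The resulting count/timestamp pairs are what the second set of graphs
plots against elapsed time.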

A couple of things stand out to me from these graphs.  First, some of
these transactions had really long latency.  Second, there are a
remarkable number of seconds throughout the test during which no
transactions at all manage to complete, sometimes several seconds in a
row.  I'm not sure why.  Third, all of the tests initially start off
processing transactions very quickly and then get slammed down very
hard, probably because the very high rate of transaction processing
early on causes a checkpoint to occur around 200 s.  I didn't actually
log when the checkpoints were occurring, but it seems like a good
guess.  It's also interesting to wonder whether the drop-off is caused
by the checkpoint I/O itself or by the ensuing full page writes.
Fourth, xloginsert-scale-6 helps quite a bit; in fact, it's the only
patch that actually changes the whole shape of the tps graph.  I'm
speculating here, but that may be because it blunts the impact of full
page writes by allowing backends to copy their full-page images into
the write-ahead log in parallel.
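
One way to firm up the checkpoint guess on a future run would be to
turn on checkpoint logging and snapshot the background writer stats
before and after the test; a minimal sketch, assuming nothing else
about the configuration changes:

    # postgresql.conf: log when each checkpoint starts and finishes
    log_checkpoints = on

    # before and after the run, snapshot the checkpoint counters
    psql -c "SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint FROM pg_stat_bgwriter"

Comparing the checkpoint timestamps in the server log against the
per-second tps graphs would show whether the big drop really lines up
with the first checkpoint.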

One thing I also noticed while running the tests is that the system
was really not using much CPU time.  It was mostly idle.  That could
be because waiting for I/O leads to waiting for locks, or it could be
fundamental lock contention.  I don't know which.
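
A crude way to distinguish those cases next time would be to sample how
many backends are blocked on locks while the test runs; a sketch, with
the caveats that pg_stat_activity's waiting column only covers
heavyweight locks (so it won't show LWLock contention) and that the
database name is again an assumption:

    # sample once a second until interrupted
    while sleep 1; do
        psql -At -c "SELECT now(), count(*) FROM pg_stat_activity WHERE waiting" bench
    done >> waiting_counts.txt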

A couple of obvious further tests suggest themselves: (1) rerun some
of the tests with full_page_writes=off, and (2) repeat this test with
the remaining performance-related patches.  It would be especially
interesting, I think, to see what effect the checkpoint-related
patches have on these graphs.  But I plan to drop
buffreelistlock-reduction-v1 and freelist-ok-v2 from future test runs
based on Simon's comments elsewhere.  I'm including the results here
just because these tests were already running when he made those
comments.
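
For the record, the full_page_writes rerun should only need a one-line
configuration change plus a reload, since that setting doesn't require
a restart (assuming the configuration is otherwise left alone):

    # postgresql.conf: disable full page writes for the follow-up run
    full_page_writes = off

    # pick up the change without restarting; the $PGDATA path is an assumption
    pg_ctl reload -D $PGDATA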

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

