Re: checkpoint patches - Mailing list pgsql-hackers

From Robert Haas
Subject Re: checkpoint patches
Msg-id CA+TgmobuoY6Sc_WQbv=SZYkGC+yWejR7Bdyra3kbwL=sVAbxvA@mail.gmail.com
In response to Re: checkpoint patches  (Stephen Frost <sfrost@snowman.net>)
Responses Re: checkpoint patches
Re: checkpoint patches
List pgsql-hackers
On Thu, Mar 22, 2012 at 8:44 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Thu, Mar 22, 2012 at 3:45 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > Well, those numbers just aren't that exciting. :/
>>
>> Agreed.  There's clearly an effect, but on this test it's not very big.
>
> Ok, perhaps that was because of how you were analyzing it using the 90th
> percentile..?

Well, how do you want to look at it?  Here's the data from the 80th
through the 100th percentile (percentile, patched, unpatched,
difference) for the same two runs I've been comparing:

pct  patched  unpatched  difference
80      1321       1348         -27
81      1333       1360         -27
82      1345       1373         -28
83      1359       1387         -28
84      1373       1401         -28
85      1388       1417         -29
86      1404       1434         -30
87      1422       1452         -30
88      1441       1472         -31
89      1462       1494         -32
90      1487       1519         -32
91      1514       1548         -34
92      1547       1582         -35
93      1586       1625         -39
94      1637       1681         -44
95      1709       1762         -53
96      1825       1905         -80
97      2106       2288        -182
98     12100      24645      -12545
99    186043     201309      -15266
100  9513855    9074161      439694
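A table like the one above compares two latency distributions at fixed percentiles. The following is a minimal Python sketch of how such rows could be computed, assuming per-transaction latencies are already available as lists of integers (how they were collected is not stated in this message) and assuming the nearest-rank percentile definition; the function names are hypothetical and this is not necessarily how the numbers above were produced.

```python
from typing import List, Sequence, Tuple


def nearest_rank(sorted_samples: Sequence[int], pct: int) -> int:
    """Nearest-rank percentile of an already-sorted, non-empty sequence."""
    if not sorted_samples:
        raise ValueError("need at least one sample")
    if not 0 <= pct <= 100:
        raise ValueError("percentile must be between 0 and 100")
    n = len(sorted_samples)
    # rank = ceil(pct * n / 100) in integer arithmetic, clamped to >= 1,
    # so pct=100 always returns the maximum (the slowest transaction).
    rank = max(1, (pct * n + 99) // 100)
    return sorted_samples[rank - 1]


def percentile_table(
    patched: Sequence[int],
    unpatched: Sequence[int],
    pcts: Sequence[int] = tuple(range(80, 101)),
) -> List[Tuple[int, int, int, int]]:
    """Rows of (percentile, patched, unpatched, patched - unpatched)."""
    p = sorted(patched)
    u = sorted(unpatched)
    rows = []
    for pct in pcts:
        a = nearest_rank(p, pct)
        b = nearest_rank(u, pct)
        rows.append((pct, a, b, a - b))
    return rows


if __name__ == "__main__":
    # Toy latencies only, not the benchmark data from this thread.
    patched = list(range(1, 101))        # 1, 2, ..., 100
    unpatched = list(range(2, 201, 2))   # 2, 4, ..., 200
    for row in percentile_table(patched, unpatched, pcts=[80, 90, 100]):
        print(*row)
```

With the toy data this prints `80 80 160 -80`, `90 90 180 -90` and `100 100 200 -100`. A negative difference means the patched run was faster at that percentile. Note that the 100th percentile is a single sample (the worst transaction), which is why that row can flip sign, as it does in the real data above.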

Here are the 95th-100th percentiles for each of the six runs:

ckpt.checkpoint-sync-pause-v1.10: 1709, 1825, 2106, 12100, 186043, 9513855
ckpt.checkpoint-sync-pause-v1.11: 1707, 1824, 2118, 16792, 196107, 8869602
ckpt.checkpoint-sync-pause-v1.12: 1693, 1807, 2091, 15132, 191207, 7246326
ckpt.master.10: 1734, 1875, 2235, 21145, 203214, 6855888
ckpt.master.11: 1762, 1905, 2288, 24645, 201309, 9074161
ckpt.master.12: 1746, 1889, 2272, 20309, 194459, 7833582

By the way, I reran the tests on master with checkpoint_timeout=16min,
and here are the tps results: 2492.966759, 2588.750631, 2575.175993.
So it seems like not all of the tps gain from this patch comes from
the fact that it increases the time between checkpoints.  Comparing
the median of three results between the different sets of runs,
applying the patch and setting a 3s delay between syncs gives you
about a 5.8% increase in throughput, but also adds 30-40 seconds between
checkpoints.  If you don't apply the patch but do increase time
between checkpoints by 1 minute, you get about a 5.0% increase in
throughput.  That certainly means that the patch is doing something -
because 5.8% for 30-40 seconds is better than 5.0% for 60 seconds -
but it's a pretty small effect.
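The "5.8% for 30-40 seconds versus 5.0% for 60 seconds" comparison amounts to normalizing each throughput gain by the extra time it adds between checkpoints. A small Python sketch of that arithmetic: the three 16-minute tps values are the ones quoted above, the 5.8% and 5.0% gains are taken from the text (the baseline tps figures are not in this message), using 35 seconds as the midpoint of "30-40 seconds" is an assumption, and gain_per_extra_second is a hypothetical helper name.

```python
from statistics import median

# tps from the three master runs with checkpoint_timeout=16min (quoted above).
master_16min_tps = [2492.966759, 2588.750631, 2575.175993]


def gain_per_extra_second(gain_pct: float, extra_seconds: float) -> float:
    """Throughput gain (percentage points) per second of added checkpoint interval."""
    return gain_pct / extra_seconds


if __name__ == "__main__":
    print(f"median tps at 16min: {median(master_16min_tps):.6f}")
    # Patch with a 3s sync delay: ~5.8% gain for ~30-40s more (35s assumed).
    patch = gain_per_extra_second(5.8, 35)
    # Unpatched, checkpoint_timeout raised by 1 minute: ~5.0% gain for 60s.
    timeout = gain_per_extra_second(5.0, 60)
    print(f"patch: {patch:.3f} pct/s, timeout bump: {timeout:.3f} pct/s")
```

This gives roughly 0.166 versus 0.083 percentage points per added second. Even at the 40-second end of the range the patch yields 5.8/40 = 0.145, still above 0.083, which is the sense in which the patch is doing something, even though the absolute gain is small.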

And here are the latency results for the 95th-100th percentiles with
checkpoint_timeout=16min:

ckpt.master.13: 1703, 1830, 2166, 17953, 192434, 43946669
ckpt.master.14: 1728, 1858, 2169, 15596, 187943, 9619191
ckpt.master.15: 1700, 1835, 2189, 22181, 206445, 8212125

The picture looks similar here.  Increasing checkpoint_timeout isn't
*quite* as good as spreading out the fsyncs, but it's pretty darn
close.  For example, looking at the median of the three 98th
percentile numbers for each configuration, the patch bought us a 28%
improvement in 98th percentile latency.  But increasing
checkpoint_timeout by a minute bought us a 15% improvement in 98th
percentile latency.  So it's still not clear to me that the patch is
doing anything on this test that you couldn't get just by increasing
checkpoint_timeout by a few more minutes.  Granted, it lets you keep
your inter-checkpoint interval slightly smaller, but that's not that
exciting.  That having been said, I don't have a whole lot of trouble
believing that there are other cases where this is more worthwhile.
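The 28% and 15% figures can be reproduced from the 98th-percentile column of the per-run summaries. A minimal Python sketch, assuming, as the text implies, that ckpt.master.10-12 are the unpatched baseline and ckpt.master.13-15 are the same code with checkpoint_timeout one minute longer, and assuming "improvement" means the reduction relative to the baseline median (the formula is not stated, though it matches both quoted numbers).

```python
from statistics import median

# 98th-percentile latencies copied from the per-run summaries above.
baseline_runs = [21145, 24645, 20309]  # ckpt.master.10-12
candidates = {
    "patched": [12100, 16792, 15132],       # ckpt.checkpoint-sync-pause-v1.10-12
    "master_16min": [17953, 15596, 22181],  # ckpt.master.13-15
}


def improvement(new: float, base: float) -> float:
    """Relative latency reduction versus the baseline, in percent."""
    return (base - new) / base * 100


if __name__ == "__main__":
    base = median(baseline_runs)
    for name, runs in candidates.items():
        m = median(runs)
        print(f"{name}: median {m}, {improvement(m, base):.1f}% lower than {base}")
```

Against a baseline median of 21145, the patched median (15132) is about 28.4% lower and the 16-minute median (17953) is about 15.1% lower.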

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company