Re: checkpoint patches - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: checkpoint patches |
Date | |
Msg-id | 20120323142443.GH3938@tamriel.snowman.net |
In response to | Re: checkpoint patches (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
* Robert Haas (robertmhaas@gmail.com) wrote:
> Well, how do you want to look at it?

I thought the last graph you provided was a useful way to view the results. It was my intent to make that clear in my prior email, my apologies if that didn't come through.

> Here's the data from 80th
> percentile through 100th percentile - percentile, patched, unpatched,
> difference - for the same two runs I've been comparing:
[...]
> 98 12100 24645 -12545
> 99 186043 201309 -15266
> 100 9513855 9074161 439694

Those are the areas that I think we want to be looking at/for: the outliers.

> By the way, I reran the tests on master with checkpoint_timeout=16min,
> and here are the tps results: 2492.966759, 2588.750631, 2575.175993.
> So it seems like not all of the tps gain from this patch comes from
> the fact that it increases the time between checkpoints. Comparing
> the median of three results between the different sets of runs,
> applying the patch and setting a 3s delay between syncs gives you
> about a 5.8% increase throughput, but also adds 30-40 seconds between
> checkpoints. If you don't apply the patch but do increase time
> between checkpoints by 1 minute, you get about a 5.0% increase in
> throughput. That certainly means that the patch is doing something -
> because 5.8% for 30-40 seconds is better than 5.0% for 60 seconds -
> but it's a pretty small effect.

That doesn't surprise me too much. As I mentioned before, and Greg please correct me if I'm wrong, but I thought this patch was intended to reduce the latency spikes that we suffer from under some workloads, which can often be attributed back to i/o related contention. I don't believe it's intended or expected to seriously increase throughput.

> The picture looks similar here. Increasing checkpoint_timeout isn't
> *quite* as good as spreading out the fsyncs, but it's pretty darn
> close. For example, looking at the median of the three 98th
> percentile numbers for each configuration, the patch bought us a 28%
> improvement in 98th percentile latency. But increasing
> checkpoint_timeout by a minute bought us a 15% improvement in 98th
> percentile latency. So it's still not clear to me that the patch is
> doing anything on this test that you couldn't get just by increasing
> checkpoint_timeout by a few more minutes. Granted, it lets you keep
> your inter-checkpoint interval slightly smaller, but that's not that
> exciting. That having been said, I don't have a whole lot of trouble
> believing that there are other cases where this is more worthwhile.

I could certainly see the checkpoint_timeout parameter, along with the others, as being sufficient to address this, in which case we likely don't need the patch. They're both more-or-less intended to do the same thing and it's just a question of if being more granular ends up helping or not.

Thanks,

Stephen
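
For anyone who wants to check the arithmetic in the quoted percentile table, here is a minimal sketch. It uses only the three rows quoted in this message (98th, 99th and 100th percentile) and assumes the "difference" column is patched minus unpatched, which matches the quoted numbers. The function name `compare` and the relative-change column are illustrative additions, not part of the original results; the message does not state the units, so none are assumed here.

```python
# Rows quoted in the message: (percentile, patched, unpatched).
# Units are whatever the original latency data used; the message does not say.
rows = [
    (98, 12100, 24645),
    (99, 186043, 201309),
    (100, 9513855, 9074161),
]


def compare(patched, unpatched):
    """Return (difference, relative change) of patched against unpatched.

    A negative value means the patched run had lower latency.
    """
    diff = patched - unpatched
    return diff, diff / unpatched


# Map each percentile to its (difference, relative change) pair.
results = {p: compare(patched, unpatched) for p, patched, unpatched in rows}

for p, (diff, rel) in results.items():
    print(f"{p:>3}  {diff:>8}  {rel:+.1%}")
```

The differences reproduce the quoted column. The sign flips at the 100th percentile, where the patched run is the slower one in this pair of runs, which is the outlier region the reply above singles out. Note that these single-run figures are not the same thing as the 28% and 15% improvements quoted later, which compare medians of three runs per configuration.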