Re: checkpoint patches - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: checkpoint patches
Msg-id 20120323142443.GH3938@tamriel.snowman.net
In response to Re: checkpoint patches  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
* Robert Haas (robertmhaas@gmail.com) wrote:
> Well, how do you want to look at it?

I thought the last graph you provided was a useful way to view the
results.  I meant to make that clear in my prior email; my apologies
if it didn't come through.

> Here's the data from 80th
> percentile through 100th percentile - percentile, patched, unpatched,
> difference - for the same two runs I've been comparing:
[...]
> 98 12100 24645 -12545
> 99 186043 201309 -15266
> 100 9513855 9074161 439694

Those are the areas that I think we want to be looking at: the
outliers.
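
To make the tail comparison concrete, here is a minimal sketch that
recomputes the difference column for the three rows quoted above and
adds a relative change against the unpatched run.  The numbers are
copied from the quoted rows; the function name and the relative-change
column are my own additions for illustration and not part of the
original results.

```python
# Rows copied from the quoted results: (percentile, patched, unpatched).
# The quoted message does not state the unit, so none is assumed here.
rows = [
    (98, 12100, 24645),
    (99, 186043, 201309),
    (100, 9513855, 9074161),
]


def compare(rows):
    """Return (percentile, difference, relative change in %) per row.

    The difference is patched minus unpatched, matching the quoted
    column, so a negative value means the patched run was faster.
    """
    out = []
    for pct, patched, unpatched in rows:
        diff = patched - unpatched
        rel = 100.0 * diff / unpatched
        out.append((pct, diff, rel))
    return out


results = compare(rows)
for pct, diff, rel in results:
    print(f"{pct:>3}  diff={diff:>9}  relative={rel:+.1f}%")
```

This covers a single pair of runs only, so it says nothing about
run-to-run variance, and the 100th percentile is a single worst-case
sample in each run.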

> By the way, I reran the tests on master with checkpoint_timeout=16min,
> and here are the tps results: 2492.966759, 2588.750631, 2575.175993.
> So it seems like not all of the tps gain from this patch comes from
> the fact that it increases the time between checkpoints.  Comparing
> the median of three results between the different sets of runs,
> applying the patch and setting a 3s delay between syncs gives you
> about a 5.8% increase in throughput, but also adds 30-40 seconds between
> checkpoints.  If you don't apply the patch but do increase time
> between checkpoints by 1 minute, you get about a 5.0% increase in
> throughput.  That certainly means that the patch is doing something -
> because 5.8% for 30-40 seconds is better than 5.0% for 60 seconds -
> but it's a pretty small effect.

That doesn't surprise me too much.  As I mentioned before (and Greg,
please correct me if I'm wrong), I thought this patch was intended to
reduce the latency spikes that we suffer from under some workloads,
which can often be attributed to I/O-related contention.  I don't
believe it's intended or expected to seriously increase throughput.
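
For reference, the arithmetic behind the quoted throughput comparison
can be sketched roughly as below.  It takes the median of the three
quoted tps results and normalizes each quoted gain by the extra time
between checkpoints.  The "gain per added second" metric and the
function name are assumptions of mine, used only to restate the quoted
"5.8% for 30-40 seconds is better than 5.0% for 60 seconds" argument;
it is not a measure anyone in the thread proposed.

```python
from statistics import median

# tps results quoted for master with checkpoint_timeout=16min.
tps_16min = [2492.966759, 2588.750631, 2575.175993]
median_tps = median(tps_16min)


def gain_per_second(gain_pct, added_seconds):
    """Throughput gain (in %) per second added between checkpoints."""
    return gain_pct / added_seconds


# Patch with a 3s delay between syncs: about 5.8% for 30-40 extra seconds.
patched_low = gain_per_second(5.8, 40)
patched_high = gain_per_second(5.8, 30)

# No patch, one extra minute between checkpoints: about 5.0%.
timeout_only = gain_per_second(5.0, 60)

print(f"median tps at 16min: {median_tps:.2f}")
print(f"patched:      {patched_low:.3f} to {patched_high:.3f} % per second")
print(f"timeout only: {timeout_only:.3f} % per second")
```

Even at the pessimistic end of the 30-40 second range the patched
figure comes out ahead, which matches the quoted conclusion, but both
gains are small in absolute terms.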

> The picture looks similar here.  Increasing checkpoint_timeout isn't
> *quite* as good as spreading out the fsyncs, but it's pretty darn
> close.  For example, looking at the median of the three 98th
> percentile numbers for each configuration, the patch bought us a 28%
> improvement in 98th percentile latency.  But increasing
> checkpoint_timeout by a minute bought us a 15% improvement in 98th
> percentile latency.  So it's still not clear to me that the patch is
> doing anything on this test that you couldn't get just by increasing
> checkpoint_timeout by a few more minutes.  Granted, it lets you keep
> your inter-checkpoint interval slightly smaller, but that's not that
> exciting.  That having been said, I don't have a whole lot of trouble
> believing that there are other cases where this is more worthwhile.

I could certainly see the checkpoint_timeout parameter, along with the
others, as being sufficient to address this, in which case we likely
don't need the patch.  They're both more-or-less intended to do the same
thing, and it's just a question of whether being more granular ends up
helping.

Thanks,
    Stephen
