Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Andres Freund
Subject Re: checkpointer continuous flushing
Date
Msg-id 20160322091852.GA3790@awork2.anarazel.de
In response to Re: checkpointer continuous flushing  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
Hi,

On 2016-03-21 18:46:58 +0100, Tomas Vondra wrote:
> I've repeated the tests, but this time logged details for 5% of the
> transactions (instead of aggregating the data for each second). I've also
> made the tests shorter - just 12 hours instead of 24, to reduce the time
> needed to complete the benchmark.
> 
> Overall, this means ~300M transactions in total for the un-throttled case,
> so a sample of ~15M transactions was available when computing the following
> charts.
> 
> I've used the same commits as during the previous testing, i.e. a298a1e0
> (before patches) and 23a27b03 (with patches).
> 
> One interesting difference is that while the "patched" version resulted in
> slightly better performance (8122 vs. 8000 tps), the "unpatched" version got
> considerably slower (6790 vs. 7725 tps) - that's ~13% difference, so not
> negligible. Not sure what the cause is - the configuration was exactly the
> same, there's nothing in the log and the machine was dedicated to the
> testing. The only explanation I have is that the unpatched code is a bit
> more unstable when it comes to this type of stress testing.
> 
> The results (including scripts for generating the charts) are here:
> 
>     https://github.com/tvondra/flushing-benchmark-2
> 
> Attached are three charts - again, those are using CDF to illustrate the
> distributions and compare them easily:
> 
> 1) regular-latency.png
> 
> The two curves intersect at ~4ms, where both CDFs reach ~85%. For the shorter
> transactions, the old code is slightly faster (i.e. apparently there's some
> per-transaction overhead). For higher latencies though, the patched code is
> clearly winning - there are far fewer transactions over 6ms, which makes a
> huge difference. (Notice the x-axis is log-scale, so the tail of the old
> code is actually much longer than it might appear.)
> 
> 2) throttled-latency.png
> 
> In the throttled case (i.e. when the system is not 100% utilized, so it's
> more representative of actual production use), the difference is quite
> clearly in favor of the new code.
> 
> 3) throttled-schedule-lag.png
> 
> Mostly just an alternative view on the previous chart, showing how much
> later the transactions were scheduled. Again, the new code is winning.

Thanks for running these tests!

I think this shows that we're in good shape, and that the commits
succeeded in what they were attempting. Very glad to hear that.
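
For anyone reproducing this kind of comparison, the empirical CDFs in those
charts can be computed along these lines. This is just a minimal Python
sketch with made-up latency samples - Tomas's actual scripts live in the
linked repository:

```python
# Hypothetical sketch: empirical CDF of per-transaction latencies,
# of the kind shown in regular-latency.png. The sample data below is
# invented for illustration, not taken from the benchmark.

def empirical_cdf(samples, threshold):
    """Fraction of samples at or below threshold, i.e. the CDF value there."""
    return sum(1 for s in samples if s <= threshold) / len(samples)

# Synthetic latencies in milliseconds (illustrative only): the "old" set
# has a longer tail, the "patched" set clusters more tightly.
old_code = [2.1, 3.0, 3.8, 4.2, 5.5, 9.0, 25.0, 80.0]
patched  = [2.4, 3.2, 3.9, 4.1, 4.8, 5.2, 6.0, 7.5]

# Where both CDFs reach the same fraction, the curves intersect;
# beyond that point the distribution with the shorter tail wins.
print(empirical_cdf(old_code, 4.0), empirical_cdf(patched, 4.0))
```

Plotting the CDF at many thresholds on a log-scale x-axis gives charts of
the shape discussed above.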


WRT tablespaces: what I'm planning to do, unless somebody has a better
proposal, is basically to rent two big Amazon instances and run pgbench
in parallel over N tablespaces, once with local SSD and once with local
HDD storage.
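
Roughly, the parallel part could look like the sketch below - one pgbench
instance per database, each database placed on its own tablespace. The
database names, client counts, and duration here are assumptions for
illustration, not a committed test plan:

```python
# Hypothetical sketch: launch one pgbench per tablespace-backed database
# and wait for all of them. Assumes databases bench_ts0..bench_tsN-1
# already exist, each on its own tablespace.
import subprocess

def pgbench_cmd(dbname, duration_s=3600, clients=16):
    # -T: run duration in seconds, -c: client connections, -j: worker threads
    return ["pgbench", "-T", str(duration_s),
            "-c", str(clients), "-j", str(clients), dbname]

def run_parallel(n_tablespaces):
    procs = [subprocess.Popen(pgbench_cmd("bench_ts%d" % i))
             for i in range(n_tablespaces)]
    return [p.wait() for p in procs]

# run_parallel(4)  # e.g. four tablespaces, one pgbench each
```

Running the same driver once on SSD-backed and once on HDD-backed
tablespaces would then give the two data points of interest.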

Greetings,

Andres Freund


