Re: some longer, larger pgbench tests with various performance-related patches - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: some longer, larger pgbench tests with various performance-related patches
Msg-id CA+U5nMJkHbYaeGeV3nk0+pWFPAEgLuWj76sAq7eU_SSjMQJkGg@mail.gmail.com
In response to some longer, larger pgbench tests with various performance-related patches  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: some longer, larger pgbench tests with various performance-related patches
List pgsql-hackers
On Tue, Jan 24, 2012 at 8:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> do a single 30-minute pgbench run at scale factor 300 using a variety

Nice

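A minor but necessary point: repeatedly testing the group commit
patch with synchronous_commit off is clearly pointless, since group
commit only changes how committing backends wait for WAL flushes, and
with synchronous_commit off they don't wait at all. Publishing numbers
for that configuration without saying very clearly that's what's
happening doesn't really help.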

> Graphs are here:
>
> http://wiki.postgresql.org/wiki/Robert_Haas_9.2CF4_Benchmark_Results

Nicer

> There are two graphs for each branch.  The first is a scatter plot of
> latency vs. transaction time.  I found that graph hard to understand,
> though; I couldn't really tell what I was looking at.  So I made a
> second set of graphs which graph number of completed transactions in a
> given second of the test against time.  The results are also included
> on the previous page, below the latency graphs, and I find them much
> more informative.
>
> A couple of things stand out at me from these graphs.  First, some of
> these transactions had really long latency.  Second, there are a
> remarkable number of seconds all through the test during which no
> transactions at all manage to complete, sometimes several seconds in a
> row.  I'm not sure why.  Third, all of the tests initially start off
> processing transactions very quickly, and get slammed down very hard,
> probably because the very high rate of transaction processing early on
> causes a checkpoint to occur around 200 s.

Check

I'm happy that this exposes characteristics I've seen and have been
discussing for a while now.
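The per-second graphs Robert describes, and the runs of seconds with no completed transactions, can be reproduced from per-transaction timing data. The sketch below assumes the completion timestamps (epoch seconds as floats) have already been extracted from pgbench's per-transaction log (`pgbench -l`); the log's column layout differs between versions, so parsing is deliberately left out. The function names are hypothetical.

```python
import math
from collections import Counter


def tps_per_second(completion_times, start, end):
    """Count completed transactions in each whole second of [start, end).

    Seconds with no completions are reported as 0 rather than omitted,
    which is what makes stalls visible in a tps-vs-time graph.
    """
    counts = Counter(int(t - start) for t in completion_times if start <= t < end)
    n_seconds = math.ceil(end - start)  # include a trailing partial second
    return [counts.get(s, 0) for s in range(n_seconds)]


def zero_tps_seconds(per_second):
    """Indices of the seconds in which no transaction completed."""
    return [s for s, n in enumerate(per_second) if n == 0]


def longest_stall(per_second):
    """Length of the longest run of consecutive zero-completion seconds."""
    best = run = 0
    for n in per_second:
        run = run + 1 if n == 0 else 0
        best = max(best, run)
    return best
```

For example, completions at 0.1, 0.5, 1.2 and 4.9 seconds into a 5-second window give per-second counts of `[2, 1, 0, 0, 1]`, with seconds 2 and 3 empty and a longest stall of 2 seconds.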

It would be useful to calculate the slow-down contribution of the
longer txns: what percentage of total time is taken up by the slowest
10% of transactions? Looking at that, I think you'll see why we should
care about sorting out what happens in the worst cases.
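One way to compute that figure, assuming "total time" is read as the sum of per-transaction latencies (for example the latency column of a `pgbench -l` log), is sketched below. The function name and the integer `percent` parameter are illustrative choices; integer arithmetic avoids floating-point rounding when sizing the tail.

```python
def slowest_share(latencies, percent=10):
    """Fraction of summed latency contributed by the slowest `percent`% of transactions.

    The tail size is rounded up, so at least one transaction is always counted.
    """
    if not latencies:
        raise ValueError("no latencies given")
    ordered = sorted(latencies, reverse=True)
    k = max(1, (len(ordered) * percent + 99) // 100)  # ceil(n * percent / 100)
    return sum(ordered[:k]) / sum(ordered)
```

With nine transactions of 1 ms and one of 91 ms, the slowest 10% (a single transaction) accounts for 91% of the total, even though the median latency is 1 ms.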

>  I didn't actually log when
> the checkpoints were occurring, but it seems like a good guess.  It's
> also interesting to wonder whether the checkpoint I/O itself causes
> the drop-off, or the ensuing full page writes.  Fourth,
> xloginsert-scale-6 helps quite a bit; in fact, it's the only patch
> that actually changes the whole shape of the tps graph.  I'm
> speculating here, but that may be because it blunts the impact of full
> page writes by allowing backends to copy their full page images into
> the write-ahead log in parallel.
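The idea behind parallel WAL insertion, as described above, is that only the reservation of space needs to be serialized, while the copy of each record (including large full-page images) can proceed concurrently. The toy model below illustrates that structure only; it is not PostgreSQL's implementation, the class name is hypothetical, and it omits the real work of tracking which reserved regions are fully copied before they may be flushed. Python threads also don't copy truly in parallel, so this shows the locking pattern, not the speedup.

```python
import threading


class ToyWalBuffer:
    """Toy WAL buffer where only space reservation happens under a lock."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.insert_pos = 0
        self.lock = threading.Lock()

    def reserve(self, length):
        # Short critical section: just advance the insert position.
        with self.lock:
            start = self.insert_pos
            if start + length > len(self.buf):
                raise BufferError("toy buffer full")
            self.insert_pos = start + length
            return start

    def insert(self, record: bytes):
        start = self.reserve(len(record))
        # The copy happens outside the lock; reserved slots never overlap,
        # so concurrent inserters cannot clobber each other's records.
        self.buf[start:start + len(record)] = record
        return start
```

The design choice is to make the serialized step independent of record size: a backend writing an 8 kB full-page image holds the lock no longer than one writing a tiny record.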

I think we should be working to commit XLogInsert and then Group
Commit, then come back to the discussion.

There's no way we're committing other patches but not those, so
everything needs to be viewed with those two applied, i.e. commit them
and then re-baseline.

So I'd say no more tests just yet, but then lots of testing next week++.

> But I plan to drop
> buffreelistlock-reduction-v1 and freelist-ok-v2 from future test runs
> based on Simon's comments elsewhere.  I'm including the results here
> just because these tests were already running when he made those
> comments.

Yep

One you aren't testing is clog_history, which is designed to work in
conjunction with background_clean_slru. That last one clearly needs a
little tuning though...

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

