Re: checkpoint patches - Mailing list pgsql-hackers

From Robert Haas
Subject Re: checkpoint patches
Date
Msg-id CA+TgmoYzKnqF66tFwRwgXVN-UUQwu5O6X6rMywX7Ocx1vRRRnA@mail.gmail.com
In response to Re: checkpoint patches  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: checkpoint patches  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Thu, Mar 22, 2012 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> However, looking at this a bit more, I think the
> checkpoint-sync-pause-v1 patch contains an obvious bug - the GUC is
> supposedly represented in seconds (though not marked with GUC_UNIT_S,
> oops) but what the sleep implements is actually *tenths of a second*.
> So I think I'd better rerun these tests with checkpoint_sync_pause=30
> so that I get a three-second delay rather than a
> three-tenths-of-a-second delay between each fsync.
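A minimal sketch of the unit mismatch described in the quoted text, written in Python rather than the patch's C. The patch source isn't quoted here, so the multipliers below are assumptions inferred from the description. They're expressed in microseconds because PostgreSQL's pg_usleep() takes microseconds. The function names are hypothetical.

```python
# Hypothetical illustration of the checkpoint_sync_pause unit bug; this is
# not the patch's actual code, which isn't quoted in this message.

USEC_PER_SEC = 1_000_000

def intended_pause_usec(checkpoint_sync_pause):
    # The GUC is supposedly expressed in whole seconds.
    return checkpoint_sync_pause * USEC_PER_SEC

def v1_pause_usec(checkpoint_sync_pause):
    # What the v1 sleep reportedly does: each unit is a tenth of a second.
    return checkpoint_sync_pause * USEC_PER_SEC // 10

for setting in (3, 30):
    print(f"checkpoint_sync_pause={setting}: intended "
          f"{intended_pause_usec(setting) / USEC_PER_SEC:g} s, "
          f"v1 sleeps {v1_pause_usec(setting) / USEC_PER_SEC:g} s")
```

Under the v1 semantics, checkpoint_sync_pause=30 is what produces the intended three-second pause between fsyncs.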

OK, I did that, rerunning the test with just checkpoint-sync-pause-v1
and master, still with scale factor 1000 and 32 clients.  Tests were
run on the two branches in alternation, so checkpoint-sync-pause-v1,
then master, then checkpoint-sync-pause-v1, then master, etc., with a
fresh initdb and data load before each run.  TPS numbers:

checkpoint-sync-pause-v1: 2594.448538, 2600.231666, 2580.041376
master: 2466.399991, 2450.752843, 2291.613305

90th percentile latency:

checkpoint-sync-pause-v1: 1487, 1488, 1481
master: 1493, 1519, 1507

Comparing the median runs, that's about a 6% increase in throughput and
about a 1.3% reduction in 90th-percentile latency.  On the other hand,
the two timed checkpoints
on the master branch, on each test run, are exactly 15 minutes apart,
whereas with the patch, they're 15 minutes and 30-40 seconds apart,
which may account for some of the difference.  I'm going to do a bit
more testing to try to isolate that.
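The percentages above can be reproduced from the posted numbers. The sketch below assumes they were computed from the median of the three runs per branch (which matches 6% and 1.3%), and also shows the mean-based comparison, which comes out somewhat higher because master's third run (2291 tps) is an outlier.

```python
from statistics import mean, median

# Numbers copied from the message above.
patch_tps = [2594.448538, 2600.231666, 2580.041376]
master_tps = [2466.399991, 2450.752843, 2291.613305]
patch_p90 = [1487, 1488, 1481]
master_p90 = [1493, 1519, 1507]

def pct_change(new, old):
    """Percent change from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0

tps_gain = pct_change(median(patch_tps), median(master_tps))
latency_change = pct_change(median(patch_p90), median(master_p90))
mean_tps_gain = pct_change(mean(patch_tps), mean(master_tps))
mean_latency_change = pct_change(mean(patch_p90), mean(master_p90))

print(f"median: tps {tps_gain:+.2f}%, p90 latency {latency_change:+.2f}%")
print(f"mean:   tps {mean_tps_gain:+.2f}%, p90 latency {mean_latency_change:+.2f}%")
```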

I'm attaching a possibly-interesting graph comparing the first
checkpoint-sync-pause-v1 run against the second master run; I chose
that particular combination because those are the runs with the median
tps results.  It's interesting how eerily similar the two runs are to
each other; they have spikes and dips in almost the same places, and
what looks like random variation is apparently not so random after
all.  The graph attached here is based on tps averaged over ten second
intervals.
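For anyone wanting to produce a similar graph, here is a minimal sketch of averaging TPS over ten-second intervals. It assumes you have already extracted one completion timestamp (in seconds) per transaction, for example from pgbench's per-transaction log; the parsing step and the function name are hypothetical and not taken from this thread. Note that the final bucket may cover only part of an interval.

```python
from collections import Counter

def tps_per_interval(completion_times, interval=10.0):
    """Return average TPS for each fixed-width interval since the first
    transaction; intervals with no transactions report 0.0."""
    if not completion_times:
        return []
    start = min(completion_times)
    # Map each transaction to the index of the interval it completed in.
    counts = Counter(int((t - start) // interval) for t in completion_times)
    n_buckets = max(counts) + 1
    return [counts.get(i, 0) / interval for i in range(n_buckets)]

# Three transactions in the first 10 s, two in the next, one in the third.
print(tps_per_interval([0.5, 1.0, 2.0, 11.0, 12.0, 25.0]))
```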

Thoughts?  Comments?  Ideas?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment
