Re: Spread checkpoint sync - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Spread checkpoint sync
Msg-id AANLkTi=is1XMzPNVHnee-97wANFtp2WW1QC5Q0rJ8qjx@mail.gmail.com
In response to Re: Spread checkpoint sync  (Greg Smith <greg@2ndquadrant.com>)
Responses Re: Spread checkpoint sync  (Greg Smith <greg@2ndquadrant.com>)
Re: Spread checkpoint sync  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-hackers
On Thu, Jan 27, 2011 at 12:18 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Greg Smith wrote:
>>
>> I think a helpful next step here would be to put Robert's fsync compaction
>> patch into here and see if that helps.  There are enough backend syncs
>> showing up in the difficult workloads (scale>=1000, clients >=32) that its
>> impact should be obvious.
>
> Initial tests show everything expected from this change and more.  This took
> me a while to isolate because of issues where the filesystem involved
> degraded over time, giving a heavy bias toward a faster first test run,
> before anything was fragmented.  I just had to do a whole new mkfs on the
> database/xlog disks when switching between test sets in order to eliminate
> that.
>
> At a scale of 500, I see the following average behavior:
>
> Clients    TPS  backend-fsync
>      16    557            155
>      32    587            572
>      64    628            843
>     128    621           1442
>     256    632           2504
>
> On one run through with the fsync compaction patch applied this turned into:
>
> Clients    TPS  backend-fsync
>      16    637              0
>      32    621              0
>      64    721              0
>     128    716              0
>     256    841              0
>
> So not only are all the backend fsyncs gone, there is a very clear TPS
> improvement too.  The change in results at >=64 clients is well above the
> usual noise threshold in these tests.
>
> The problem where individual fsync calls during checkpoints can take a long
> time is not appreciably better.  But I think this will greatly reduce the
> odds of running into the truly dysfunctional breakdown, where checkpoint and
> backend fsync calls compete with one another, that caused the worst-case
> situation kicking off this whole line of research here.
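
To put the two quoted tables side by side, here is a small sketch that computes the relative TPS change at each client count. The figures are copied from the tables above; the patched numbers come from a single run, so the percentages are indicative only. The names used here are illustrative, not part of any tool.

```python
# TPS averages copied from the two tables quoted above.
# The patched figures come from a single run, so treat the result as indicative.
BASELINE_TPS = {16: 557, 32: 587, 64: 628, 128: 621, 256: 632}
PATCHED_TPS = {16: 637, 32: 621, 64: 721, 128: 716, 256: 841}


def pct_change(before: dict[int, int], after: dict[int, int]) -> dict[int, float]:
    """Relative TPS change in percent, keyed by client count."""
    return {c: (after[c] - before[c]) / before[c] * 100.0 for c in before}


if __name__ == "__main__":
    for clients, pct in pct_change(BASELINE_TPS, PATCHED_TPS).items():
        print(f"{clients:>4} clients: {pct:+.1f}%")
```

This works out to roughly +15% at 64 and 128 clients and about +33% at 256 clients, where the unpatched run had the most backend fsyncs.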

Dude!  That's pretty cool.  Thanks for doing that measurement work -
that's really awesome.

Barring objections, I'll go ahead and commit my patch.

Based on what I saw looking at this, I'm thinking that the backend
fsyncs probably happen in clusters - IOW, it's not 2504 backend fsyncs
spread uniformly throughout the test, but clusters of 100 or more that
happen in very quick succession, followed by relief when the
background writer gets around to emptying the queue.  During each
cluster, the system probably slows way down, and then recovers when
the queue is emptied.  So the TPS improvement isn't at all a uniform
speedup, but simply relief from the stall that would otherwise result
from a full queue.
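
The clustering hypothesis above can be illustrated with a toy model. This is a simplified sketch, not PostgreSQL's actual implementation: it assumes a bounded queue of fsync requests that backends fill and the background writer periodically empties, and that a backend must do its own fsync when the queue is full. With compaction enabled, duplicate requests for the same file are collapsed before giving up, on the assumption that one fsync per file at checkpoint time covers them all. The class, function names, and parameters (capacity, hot_files, absorb_every) are hypothetical. Note that the model produces clusters by construction, so it illustrates the mechanism rather than providing evidence for it.

```python
class FsyncRequestQueue:
    """Toy bounded queue of pending fsync requests (hypothetical model)."""

    def __init__(self, capacity: int, compact: bool) -> None:
        self.capacity = capacity
        self.compact = compact
        self.requests: list[tuple[str, int]] = []
        self.backend_fsyncs = 0

    def _compact(self) -> None:
        # Keep the first occurrence of each (relation, segment) pair; a single
        # fsync of that file at checkpoint time satisfies every duplicate.
        seen: set[tuple[str, int]] = set()
        kept = []
        for req in self.requests:
            if req not in seen:
                seen.add(req)
                kept.append(req)
        self.requests = kept

    def forward(self, req: tuple[str, int]) -> bool:
        """Called by a backend after writing a dirty buffer.

        Returns False if the queue stayed full and the backend had to fsync itself.
        """
        if len(self.requests) >= self.capacity and self.compact:
            self._compact()
        if len(self.requests) >= self.capacity:
            self.backend_fsyncs += 1  # no room: backend does its own fsync
            return False
        self.requests.append(req)
        return True

    def absorb(self) -> int:
        """Background writer takes every queued request, emptying the queue."""
        n = len(self.requests)
        self.requests.clear()
        return n


def simulate(compact: bool, ticks: int = 100, writes_per_tick: int = 4,
             absorb_every: int = 20, capacity: int = 64,
             hot_files: int = 8) -> list[int]:
    """Return the number of backend fsyncs in each tick."""
    q = FsyncRequestQueue(capacity, compact)
    per_tick = []
    n = 0
    for tick in range(ticks):
        before = q.backend_fsyncs
        for _ in range(writes_per_tick):
            # Writes cycle over a small set of hot segments, so most
            # requests in the queue are duplicates.
            q.forward(("rel", n % hot_files))
            n += 1
        per_tick.append(q.backend_fsyncs - before)
        if (tick + 1) % absorb_every == 0:
            q.absorb()
    return per_tick


if __name__ == "__main__":
    for compact in (False, True):
        counts = simulate(compact)
        busy = [t for t, c in enumerate(counts) if c]
        print(f"compact={compact}: {sum(counts)} backend fsyncs in ticks {busy}")
```

Without compaction, backend fsyncs appear only in the last few ticks before each absorb, then stop once the queue is emptied. With compaction they disappear, because this model's workload has only eight distinct files. A real workload touches many more files, so how much compaction helps depends on how many queued requests are duplicates.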

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

