From Andres Freund
Subject Re: checkpoint writeback via sync_file_range
Date
Msg-id 201201111346.30167.andres@anarazel.de
In response to checkpoint writeback via sync_file_range  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: checkpoint writeback via sync_file_range  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
On Wednesday, January 11, 2012 03:14:31 AM Robert Haas wrote:
> Greg Smith muttered a while ago about wanting to do something with
> sync_file_range to improve checkpoint behavior on Linux.  I thought he
> was talking about trying to sync only the range of blocks known to be
> dirty, which didn't seem like a very exciting idea, but after looking
> at the man page for sync_file_range, I think I understand what he was
> really going for: sync_file_range allows you to hint the Linux kernel
> that you'd like it to clean a certain set of pages.  I further recall
> from Greg's previous comments that in the scenarios he's seen,
> checkpoint I/O spikes are caused not so much by the data written out
> by the checkpoint itself but from the other dirty data in the kernel
> buffer cache.  Based on that, I whipped up the attached patch, which,
> if sync_file_range is available, simply iterates through everything
> that will eventually be fsync'd before beginning the write phase and
> tells the Linux kernel to put them all under write-out.
I played around with this before, and my problem was that sync_file_range is not
really a hint. It actually starts writeback *directly* and only returns once
the I/O has been placed in the queue (at least that's the way it was back then),
which very quickly leads to it blocking all the time...

Andres
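
[Editorial illustration, not part of the original message or of Robert's patch.]
One way to observe the blocking described above is to time the call itself. The
sketch below is Linux-only and assumes glibc's sync_file_range() wrapper; the
helper names timed_start_writeback and demo_writeback, the temp-file path, and
the 1 MiB size are all made up for the example. On a small file the call will
normally return almost immediately; the stall only shows up once more dirty
data is submitted than the device request queue can hold.

```c
#define _GNU_SOURCE             /* needed for sync_file_range() in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to start write-out of every dirty page
 * of fd and measure how long the call takes to return.  Returns the result of
 * sync_file_range() (0 on success, -1 with errno set on failure).
 */
static int
timed_start_writeback(int fd, double *elapsed_ms)
{
    struct timespec t0, t1;
    int         rc;

    assert(elapsed_ms != NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* offset 0 with nbytes 0 means "from the start through end of file" */
    rc = sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *elapsed_ms = (t1.tv_sec - t0.tv_sec) * 1000.0 +
                  (t1.tv_nsec - t0.tv_nsec) / 1e6;
    return rc;
}

/*
 * Hypothetical demo: dirty 1 MiB of a scratch file, then time the call.
 * Returns 0 if sync_file_range() succeeded, -1 otherwise.
 */
static int
demo_writeback(void)
{
    char        path[] = "/tmp/sfr_demo_XXXXXX";
    char        buf[8192];
    double      ms = 0.0;
    int         fd = mkstemp(path);
    int         rc;

    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears when fd is closed */

    memset(buf, 'x', sizeof(buf));
    for (int i = 0; i < 128; i++)
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        {
            close(fd);
            return -1;
        }
    }

    rc = timed_start_writeback(fd, &ms);
    if (rc == 0)
        printf("sync_file_range returned after %.3f ms\n", ms);
    else
        fprintf(stderr, "sync_file_range: %s\n", strerror(errno));

    close(fd);
    return rc;
}
```

Calling demo_writeback() from a main() is enough to try it. To see the effect
discussed in this thread, one would presumably have to run it against a much
larger amount of dirty data on a real block device and compare the elapsed
time with and without competing write load.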

