On Wednesday, January 11, 2012 10:28:11 AM Simon Riggs wrote:
> On Wed, Jan 11, 2012 at 4:38 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> > On 1/10/12 9:14 PM, Robert Haas wrote:
> >> Based on that, I whipped up the attached patch, which,
> >> if sync_file_range is available, simply iterates through everything
> >> that will eventually be fsync'd before beginning the write phase and
> >> tells the Linux kernel to put them all under write-out.
> >
> > I hadn't really thought of using it that way. The kernel expects that
> > when this is called the normal way, you're going to track exactly which
> > segments you want it to sync. And that data isn't really passed through
> > the fsync absorption code yet; the list of things to fsync has already
> > lost that level of detail.
> >
> > What you're doing here doesn't care though, and I hadn't considered that
> > SYNC_FILE_RANGE_WRITE could be used that way on my last pass through its
> > docs. Used this way, it's basically fsync without the wait or guarantee;
> > it just tries to push what's already dirty further ahead of the write
> > queue than those writes would otherwise be.
>
> I don't think this will help at all, I think it will just make things
> worse.
>
> The problem comes from hammering the fsyncs one after the other. What
> this patch does is initiate all of the fsyncs at the same time, so it
> will max out the disks even more because this will hit all disks all
> at once.

The advantage of using sync_file_range that way is that it starts write-out but
doesn't cause queue drains/barriers/whatever to be issued, which can be quite a
significant speed gain. In theory.
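Below is a minimal sketch of the two-phase pattern discussed in this thread. It assumes Linux, because sync_file_range() is Linux-only and needs _GNU_SOURCE. This is not the patch Robert attached, and the helper name hint_then_fsync() is hypothetical. Phase one calls sync_file_range() with SYNC_FILE_RANGE_WRITE on every file. That call starts write-out of pages that are already dirty, but it does not wait for completion and does not guarantee durability. Phase two then calls fsync() on each file in turn, and that is where the actual guarantee comes from.

```c
#define _GNU_SOURCE /* exposes sync_file_range() in <fcntl.h> (Linux-only) */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: flush every file in `paths`.
 * Returns 0 if every file was opened and fsync()ed successfully, -1 otherwise.
 */
static int hint_then_fsync(const char *const *paths, size_t n)
{
    assert(paths != NULL || n == 0);
    if (n == 0)
        return 0;

    int *fds = malloc(n * sizeof *fds);
    if (fds == NULL)
        return -1;

    int rc = 0;
    size_t opened = 0;

    for (; opened < n; opened++) {
        fds[opened] = open(paths[opened], O_RDWR);
        if (fds[opened] < 0) {
            rc = -1;
            goto done;
        }
    }

    /*
     * Phase 1: ask the kernel to start write-out of each file's dirty pages.
     * offset 0 with nbytes 0 covers the whole file. This does not wait for
     * the write-out to finish and does not flush metadata or the disk cache,
     * so errors are ignored here: it is only a hint, and phase 2 still has
     * to succeed for the function to report success.
     */
    for (size_t i = 0; i < n; i++)
        (void) sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE);

    /* Phase 2: the real durability point, one file at a time. */
    for (size_t i = 0; i < n; i++) {
        if (fsync(fds[i]) != 0) {
            rc = -1;
            break;
        }
    }

done:
    for (size_t i = 0; i < opened; i++)
        close(fds[i]);
    free(fds);
    return rc;
}
```

Treating a phase-one failure as non-fatal is a design choice in this sketch. Whether the hint shortens the later fsync() stalls, or just hits all disks at once as Simon worries, would have to be measured.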
Andres