Re: fsync or fdatasync - Mailing list pgsql-admin

From Ragnar Kjørstad
Subject Re: fsync or fdatasync
Msg-id 20020910224830.A30625@vestdata.no
In response to Re: fsync or fdatasync  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: fsync or fdatasync  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-admin
On Tue, Sep 10, 2002 at 03:17:00PM -0400, Tom Lane wrote:
> Ragnar Kjørstad <postgres@ragnark.vestdata.no> writes:
> > On Tue, Sep 10, 2002 at 11:40:24AM -0400, Bruce Momjian wrote:
> >> We use fdatasync where available, and fsync when it is not.
>
> > Makes sense.
>
> >> We also use O_SYNC on open if it is available.
>
> s/also/instead/ ...

Yes, I understood that...

> open_datasync is the first choice if available.

I assume open_datasync means opening the file with the O_SYNC flag.

> > Why? That will slow things down...
>
> On what evidence do you assert that?
>
> In theory open_datasync can be the fastest alternative for WAL writing,
> because it should cause the kernel to force each WAL write() request
> down to disk immediately.  fdatasync will result in the same amount of
> I/O, but it will also require the kernel to scan its disk cache to see
> if there are any other dirty blocks that need to be written.  On many
> kernels this check is not very efficient and can chew substantial
> amounts of CPU time.

Yes, I see your argument.
However, I've just checked the Linux implementation of fsync() and I
can't really see how it could chew substantial amounts of CPU time. The
way it works, every inode has a list of dirty data buffers; all fsync()
does is traverse that list and issue a write for each buffer.

Anyway - I'm sure this is not enough to convince you, so I'll have to
set up a test instead. But not tonight.
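
For reference, a minimal sketch of the two write paths being compared. It
assumes a Unix-like system and that open_datasync corresponds to opening
the file with O_DSYNC; the function names, the 8192-byte block size, and
the fallbacks are illustrative assumptions, not PostgreSQL's code.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed block size, for illustration only

# Assumed fallbacks for platforms that lack O_DSYNC or fdatasync().
O_DSYNC = getattr(os, "O_DSYNC", os.O_SYNC)
fdatasync = getattr(os, "fdatasync", os.fsync)


def write_open_datasync(path, blocks):
    """Open with O_DSYNC: each write() is itself a synchronous data write."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | O_DSYNC, 0o600)
    try:
        for block in blocks:
            os.write(fd, block)  # no separate sync call is issued
    finally:
        os.close(fd)


def write_fdatasync(path, blocks):
    """Plain buffered writes, followed by a single fdatasync() at the end."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for block in blocks:
            os.write(fd, block)  # lands in the kernel's cache first
        fdatasync(fd)            # flush this file's dirty data in one call
    finally:
        os.close(fd)
```

Both functions leave the same bytes in the file; what differs is when the
kernel is asked to push them to disk.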


> The tradeoff is that open_datasync syncs each WAL
> block individually, which is unnecessary if you are committing
> multiple blocks worth of WAL entries at once --- but there's no hard
> evidence that that slows things down, especially not when the WAL logs
> are on their own disk spindle.

Well, in theory fsync() will allow the disk to reorder the writes, and
that should give significantly better performance, because it will
reduce the required number of seeks. If the WAL is on a separate spindle
there will be very few seeks in the first place, so there is less to gain,
but for the case with the WAL on the same disk as something else there
is probably some gain. But it makes sense to optimize for the
WAL-on-separate-disk case...

Another advantage is that fsync() would allow the elevator to merge
multiple I/O requests. Still the same number of bytes to write, but fewer,
larger requests are typically faster.

But again, numbers speak. I'll get back to you once I find the time to
test it.
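
A rough sketch of how such a timing test could be structured, under stated
assumptions: it approximates "sync each block individually" by calling
fdatasync() after every write rather than by opening with O_DSYNC, and
the names time_commit and compare, the block count, and the block size are
all hypothetical. The numbers only mean something when the file sits on
the filesystem and disk that would actually hold the WAL.

```python
import os
import tempfile
import time

# Assumed fallback for platforms without fdatasync().
fdatasync = getattr(os, "fdatasync", os.fsync)


def time_commit(path, blocks, sync_every_block):
    """Write all blocks and return the elapsed seconds, syncing included."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for block in blocks:
            os.write(fd, block)
            if sync_every_block:
                fdatasync(fd)  # one sync per block
        if not sync_every_block:
            fdatasync(fd)      # one sync for the whole batch
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(directory, n_blocks=64, block_size=8192):
    """Time a multi-block commit both ways on a scratch file in `directory`."""
    blocks = [os.urandom(block_size) for _ in range(n_blocks)]
    path = os.path.join(directory, "wal_test")
    per_block = time_commit(path, blocks, sync_every_block=True)
    batched = time_commit(path, blocks, sync_every_block=False)
    os.remove(path)
    return {"per_block": per_block, "batched": batched}
```

Calling compare() with a directory on the disk of interest returns both
timings in seconds; repeating the run several times and on an otherwise
idle machine would be needed before drawing any conclusion.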


> Check the pghackers archives (a year or two back) for lots and lots of
> discussion, but I recall we demonstrated that the current default
> choices are reasonable for at least some set of Unixen.  If you've got
> more information showing that the present default is wrong on some
> kernel, let's have it ... but don't waste our time with blanket
> assertions that "X is the right (or wrong) choice", because we know
> that's not so across all the platforms we support.  We'd not have
> bothered with four sync methods if there weren't good evidence that each
> is the best available choice on some platforms.

No argument there; I'm sure there are applications for all of them.
My point is that I think fdatasync() would be the fastest choice for the
Linux kernel.



--
Ragnar Kjørstad
