> >> One point that I no longer recall the reasoning behind is that xlog.c
> >> doesn't think O_SYNC is a preferable default over fsync.
> >
> >For larger (>8k) transactions O_SYNC|O_DIRECT is only good with the recent
> >pending patch to group WAL writes together. The fsync method gives the OS a
> >chance to do the grouping. (Of course, it does not matter if your
> >transactions write less than 8k of WAL.)
>
> This would be true for fdatasync() but not for fsync(), I think.
No, it is even worse with fsync, since fsync also has to write the file's
metadata, which adds a mandatory seek.
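To make the two strategies concrete, here is a minimal Python sketch. It assumes a POSIX system (`os.O_SYNC` is not available on win32, so it does not model the Windows case), and the function names and the 8k `BLOCK` constant are illustrative, not taken from xlog.c. `write_o_sync` waits for stable storage on every 8k write. `write_then_fsync` issues plain writes and a single `fsync()` at commit, which leaves the OS free to merge the blocks. The returned "wait count" is a modeling device, not a measurement.

```python
import os

BLOCK = 8192  # assumed WAL block size ("8k" in this thread)


def write_o_sync(path, nblocks):
    """Open with O_SYNC: every write() returns only once its block is durable.

    Returns the number of times the caller waited for stable storage.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        for _ in range(nblocks):
            os.write(fd, bytes(BLOCK))  # one durable write per 8k block
    finally:
        os.close(fd)
    return nblocks


def write_then_fsync(path, nblocks):
    """Plain buffered writes followed by a single fsync() at commit.

    The OS sees all blocks before the flush and may merge them.
    Returns the number of times the caller waited for stable storage.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(nblocks):
            os.write(fd, bytes(BLOCK))  # lands in the OS cache
        os.fsync(fd)  # one flush covers all blocks (plus file metadata)
    finally:
        os.close(fd)
    return 1
```

For a 4-block (32k) transaction, the first variant waits four times and the second waits once. Whether the single `fsync()` is cheaper in practice depends on the extra metadata write it implies.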
> On win32 (which started this discussion), fsync will sync the directory
> entry as well, which will lead to *at least* two seeks on the disk.
> Writing two blocks after each other to an O_SYNC opened file should give
> exactly two seeks.
I think you are making the following untenable assumptions:
1. there is no other outstanding IO on that drive that the OS happily inserts between your two 8k writes
2. the rotational delay is negligible
3. the per-call overhead is negligible
At the very least you will wait until the heads reach the write position
again, since you will not be able to supply the next 8k in time for the drive
to continue writing (in the single-backend, large-transaction case I was
referring to).
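A back-of-the-envelope model of the rotational-delay point: if each synchronous 8k write has to wait about one full revolution, throughput is capped at block size times revolutions per second. The 7200 rpm figure is a hypothetical example, not a measurement, and the model ignores transfer time, seeks and per-call overhead, so it gives an upper bound rather than a prediction.

```python
def revolution_ms(rpm):
    """Time for one platter revolution, in milliseconds."""
    return 60_000 / rpm


def missed_rotation_throughput(block_bytes, rpm):
    """Upper bound in bytes/s if every synchronous write waits one revolution.

    Rough model: transfer time, seeks and per-call overhead are ignored.
    """
    return block_bytes * rpm / 60
```

At 7200 rpm one revolution takes about 8.3 ms, so `missed_rotation_throughput(8192, 7200)` is 983040.0 bytes/s, under 1 MB/s. A 32 times larger block amortizes the same wait over 32 times as much data.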
If you doubt this, run dd block-size tests on a raw device.
The results show that, on a drive that has no power-fail-safe cache and
does not lie about write success, performance keeps increasing with block
size up to about 256kB.
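A hypothetical helper for the same kind of test, in case dd is not at hand. It writes a fixed amount of data with O_SYNC at each block size and reports bytes per second. With GNU dd, the rough equivalent is varying `bs=` with `oflag=sync`. The sketch assumes a POSIX system and targets a regular file, so unlike the raw-device test suggested above, its results include filesystem overhead.

```python
import os
import time


def sync_write_throughput(path, block_sizes, total_bytes):
    """Write about total_bytes with O_SYNC at each block size.

    Returns {block_size: bytes_per_second}. The file is truncated per run.
    """
    results = {}
    for bs in block_sizes:
        buf = bytes(bs)
        count = max(1, total_bytes // bs)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
        try:
            start = time.perf_counter()
            for _ in range(count):
                os.write(fd, buf)  # each call returns after the block is durable
            elapsed = time.perf_counter() - start
        finally:
            os.close(fd)
        results[bs] = count * bs / max(elapsed, 1e-9)
    return results

# e.g. sync_write_throughput("/tmp/synctest.dat",
#                            [8192 * 2**i for i in range(6)],  # 8k .. 256k
#                            16 * 2**20)
```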
Andreas