> > On win32 (which started this discussion), fsync will sync the directory
> > entry as well, which will lead to *at least* two seeks on the disk.
> > Writing two blocks one after the other to a file opened with O_SYNC
> > should give exactly two seeks.
>
> I think you are making the following untenable assumptions:
> 1. there is no other outstanding I/O on that drive that the OS will
>    happily insert between your two 8k writes
> 2. the rotational delay is negligible
> 3. the per-call overhead is negligible
>
> At the very least you will wait for the heads to reach the write position
> again, since you will not be able to supply the next 8k in time for the
> drive to continue writing (in the single-backend, large-transaction case
> I was referring to).
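
As a rough, illustrative bound (the spindle speeds below are assumed examples, not figures from this thread): if every synchronous 8k write has to wait one full revolution for the heads to come round again, a single backend can complete at most one such write per revolution. The sketch below only does that arithmetic.

```python
# Back-of-envelope ceiling: one synchronous write per platter revolution.
# The RPM values and the 8 kB block size are illustrative assumptions.


def rotation_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def max_sync_writes_per_sec(rpm: float) -> float:
    """Upper bound on writes/s if every write waits one full revolution."""
    return rpm / 60.0


def max_throughput_kb_per_sec(rpm: float, block_kb: float = 8.0) -> float:
    """Throughput ceiling in kB/s when each block costs one revolution."""
    return max_sync_writes_per_sec(rpm) * block_kb


if __name__ == "__main__":
    for rpm in (7200, 10_000, 15_000):
        print(f"{rpm:>6} rpm: {rotation_ms(rpm):5.2f} ms/rev, "
              f"<= {max_sync_writes_per_sec(rpm):4.0f} writes/s, "
              f"<= {max_throughput_kb_per_sec(rpm):5.0f} kB/s at 8 kB/write")
```

At an assumed 7200 rpm that is about 8.33 ms per revolution, i.e. at most 120 such writes (roughly 960 kB/s) per second, which is why larger writes per call pay off.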
>
> If you doubt what I am saying, run dd block-size tests on a raw device.
> The results show that, up to a block size of about 256 kB, larger blocks
> increase throughput on a drive that has no power-fail-safe cache and
> does not lie about write success.
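
A dd-style block-size test can also be sketched in a few lines. This is a minimal sketch, assuming a POSIX system and Python's os module; unlike the raw-device test suggested above, it writes to an ordinary file, so filesystem overhead is included, and the file names, sizes and total amount are arbitrary choices for illustration.

```python
import os
import tempfile
import time


def time_sync_writes(path: str, total_bytes: int, block_size: int) -> float:
    """Write total_bytes of zeros to path in block_size chunks through an
    O_SYNC descriptor and return the elapsed wall-clock time in seconds."""
    # With O_SYNC each write() returns only once the data is reported as on
    # stable storage. POSIX only: os.O_SYNC is not defined on Windows.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    block = b"\0" * block_size
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        written = 0
        while written < total_bytes:
            # Never write past total_bytes; os.write may also write less
            # than requested, so advance by the count it returns.
            written += os.write(fd, block[: total_bytes - written])
        return time.perf_counter() - start
    finally:
        os.close(fd)


def sweep(dir_path: str, total_bytes: int = 1 << 20,
          sizes=(8 << 10, 64 << 10, 256 << 10, 1 << 20)) -> dict:
    """Time the same amount of data written with each block size."""
    results = {}
    for bs in sizes:
        path = os.path.join(dir_path, f"bs_{bs}.dat")
        results[bs] = time_sync_writes(path, total_bytes, bs)
    return results


if __name__ == "__main__":
    total = 1 << 20  # 1 MiB per block size; raise it for steadier numbers
    with tempfile.TemporaryDirectory() as d:
        for bs, secs in sweep(d, total).items():
            rate = total / max(secs, 1e-9) / 1024
            print(f"{bs // 1024:>5} kB blocks: {secs:.3f} s ({rate:.0f} kB/s)")
```

The temporary directory may live on a RAM-backed filesystem where O_SYNC costs nothing, so point it at the disk under test; a drive that acknowledges writes from a volatile cache will also report inflated numbers.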

On win32 with standard hardware, opening the WAL with O_SYNC gives roughly
2-3x the performance of fsync() according to pgbench. This is partly
because fsync() on win32 is the 'nuclear option': it syncs metadata as
well, which slows things down considerably. I'm not sure about Unix, but
the win32 equivalent of O_DIRECT (FILE_FLAG_NO_BUFFERING) disables the
read cache and also gives slightly faster write performance (presumably
by removing the overhead of the cache manager).
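
On a POSIX system the two strategies being compared can be timed side by side. This is a minimal sketch, assuming Python's os module on Unix; os.O_SYNC is not defined on Windows, so it cannot reproduce the win32 numbers above, and it only loosely mirrors the fsync and open_sync settings of PostgreSQL's wal_sync_method. File names and the block count are arbitrary.

```python
import os
import tempfile
import time

BLOCK = b"x" * 8192  # 8 kB, matching the 8k writes discussed in this thread


def write_all(fd: int, data: bytes) -> None:
    """Loop until all of data is written (os.write may write less)."""
    view = memoryview(data)
    while view:
        n = os.write(fd, view)
        view = view[n:]


def write_fsync_each(path: str, nblocks: int) -> float:
    """Ordinary write() followed by fsync() after every block."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(nblocks):
            write_all(fd, BLOCK)
            os.fsync(fd)  # flushes file data and metadata to stable storage
        return time.perf_counter() - start
    finally:
        os.close(fd)


def write_osync(path: str, nblocks: int) -> float:
    """Each write() on an O_SYNC descriptor is synchronous; no fsync() calls."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(nblocks):
            write_all(fd, BLOCK)
        return time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        n = 64
        t_fsync = write_fsync_each(os.path.join(d, "fsync.dat"), n)
        t_osync = write_osync(os.path.join(d, "osync.dat"), n)
        print(f"write + fsync per block: {t_fsync:.3f} s")
        print(f"O_SYNC descriptor:       {t_osync:.3f} s")
```

Both functions leave identical files behind; only the point at which the kernel is asked to reach stable storage differs, which is the distinction the thread is about.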

The other issue is high-performance RAID controllers. With its own
dedicated memory and processor, a good RAID controller with a BBU
(battery backup unit) might perform significantly better if every write
is sent straight to the controller, all the time. On win32, fsync()
bypasses the RAID write cache, killing the performance gain from moving
to a caching RAID controller.

Merlin