Greg Smith <greg@2ndquadrant.com> writes:
> So my guess is that some small percentage of Windows users might notice
> a change here, and some testing on FreeBSD would be useful too. That's
> about it for platforms that I think anybody needs to worry about.

To my mind, O_DIRECT is not really the key issue here; it's whether to
prefer O_DSYNC or fdatasync. I looked back in the archives, and I think
that the main reason we prefer O_DSYNC when available is the results
I got here:
http://archives.postgresql.org/pgsql-hackers/2001-03/msg00381.php
which demonstrated a performance benefit on HPUX 10.20, though with a
test tool much more primitive than test_fsync. I still have that
machine, although the disk that was in it at the time died a while back.
What's in there now is a Seagate ST336607LW spinning at 10000 RPM (166
rev/sec) and today I get numbers like this from test_fsync:

Simple write:
        8k write                      28331.020/second

Compare file sync methods using one write:
        open_datasync 8k write          161.190/second
        open_sync 8k write              156.478/second
        8k write, fdatasync              54.302/second
        8k write, fsync                  51.810/second

Compare file sync methods using two writes:
        2 open_datasync 8k writes        81.702/second
        2 open_sync 8k writes            80.172/second
        8k write, 8k write, fdatasync    40.829/second
        8k write, 8k write, fsync        39.836/second

Compare open_sync with different sizes:
        open_sync 16k write              80.192/second
        2 open_sync 8k writes            78.018/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        8k write, fsync, close           52.527/second
        8k write, close, fsync           54.092/second

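As a rough cross-check, the rates above can be set against the drive's rotation rate. The sketch below is not part of test_fsync; the helper name revs_per_op is made up for illustration, and the only inputs are the 10000 RPM figure and the rates copied from the output above.

```python
# Rotation rate of the 10000 RPM drive, in revolutions per second.
REV_PER_SEC = 10000 / 60

# Rates copied from the test_fsync output above (operations per second).
rates = {
    "open_datasync 8k write": 161.190,
    "open_sync 8k write": 156.478,
    "8k write, fdatasync": 54.302,
    "8k write, fsync": 51.810,
    "2 open_datasync 8k writes": 81.702,
    "8k write, 8k write, fdatasync": 40.829,
}


def revs_per_op(ops_per_sec):
    """Disk revolutions that elapse per completed operation (hypothetical helper)."""
    return REV_PER_SEC / ops_per_sec


for label, rate in rates.items():
    print(f"{label:32s} {revs_per_op(rate):5.2f} rev/op")
```

By this arithmetic a single open_datasync write costs about one revolution, while a write followed by fdatasync costs about three.
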
So *on that rather ancient platform* there's a measurable performance
benefit to O_DSYNC, but this seems to be largely because fdatasync is
stubbed to fsync in userspace rather than because fdatasync wouldn't
be a better idea in the abstract. Also, a lot of the argument against
fsync at the time was that it forced the kernel to iterate through all
the buffers for the WAL file to see if any were dirty. I would imagine
that modern kernels are a tad smarter about that; and even if they
aren't, the CPU speed versus disk speed tradeoff has changed enough
since 2001 that iterating through 16MB of buffers isn't as interesting
as it was then.
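
For anyone not familiar with the two approaches being compared, here is a minimal sketch of the difference. It is written in Python purely for illustration and is not how the backend does it; the function names are made up, the 8k block size is taken from the test_fsync runs, and the fallbacks to O_SYNC and fsync are assumptions added so the sketch runs on platforms lacking O_DSYNC or fdatasync.

```python
import os
import tempfile

BLOCK = b"\0" * 8192  # one 8k block, matching the test_fsync write size


def write_open_datasync(path):
    """Write one block through a descriptor opened with O_DSYNC.

    The write call itself does not return until the data is durable.
    Falls back to O_SYNC (or no flag) where O_DSYNC is not defined;
    that fallback is an assumption of this sketch.
    """
    sync_flag = getattr(os, "O_DSYNC", getattr(os, "O_SYNC", 0))
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | sync_flag, 0o600)
    try:
        return os.write(fd, BLOCK)
    finally:
        os.close(fd)


def write_then_fdatasync(path):
    """Write one block normally, then flush it with a separate call."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        written = os.write(fd, BLOCK)
        # fdatasync is missing on some platforms; use fsync there.
        flush = getattr(os, "fdatasync", os.fsync)
        flush(fd)
        return written
    finally:
        os.close(fd)


# Exercise both paths once in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    print(write_open_datasync(os.path.join(scratch, "dsync.dat")))
    print(write_then_fdatasync(os.path.join(scratch, "fdatasync.dat")))
```

For scale, if the 16MB WAL file is held in 8k buffers, the scan that fsync was accused of doing covers 2048 buffers.
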

So to my mind, switching to the preference order fdatasync,
fsync_writethrough, fsync seems like the thing to do. Since we assume
fsync is always available, that means that O_DSYNC/O_SYNC will not be
the defaults on any platform.
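
The selection rule this implies can be sketched as follows. This is only an illustration of the proposed ordering, not the actual code; default_sync_method and the `available` set are hypothetical names, and the method names are the usual wal_sync_method values.

```python
# Preference order proposed above, most preferred first.
PREFERENCE = ["fdatasync", "fsync_writethrough", "fsync"]


def default_sync_method(available):
    """Return the first preferred method the platform offers.

    `available` is a hypothetical set of method names detected for the
    platform. fsync is assumed to exist everywhere, so it is the
    final fallback even if it is not listed.
    """
    for method in PREFERENCE:
        if method in available:
            return method
    return "fsync"


# A platform offering O_DSYNC and fdatasync now defaults to fdatasync.
print(default_sync_method({"open_datasync", "fdatasync", "fsync"}))
# A platform with only the open_* methods and fsync falls through to fsync.
print(default_sync_method({"open_datasync", "open_sync", "fsync"}))
```

Because open_datasync and open_sync do not appear in the list at all, they can never be chosen as the default, though they would remain selectable by hand.
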

regards, tom lane