Direct I/O issues - Mailing list pgsql-performance
From | Greg Smith |
---|---|
Subject | Direct I/O issues |
Date | |
Msg-id | Pine.GSO.4.64.0611230013550.26031@westnet.com |
Responses | Re: Direct I/O issues |
List | pgsql-performance |
I've been trying to optimize a Linux system where benchmarking suggests large performance differences between the various wal_sync_method options (with o_sync being the big winner). I started that by using src/tools/fsync/test_fsync to get an idea what I was dealing with (and to spot which drives had write caching turned on). Since those results didn't match what I was seeing in the benchmarks, I've been browsing the backend source to figure out why. I noticed test_fsync appears to be, ahem, out of sync with what the engine is doing.

It looks like V8.1 introduced O_DIRECT writes to the WAL, determined at compile time by a series of preprocessor tests in src/backend/access/transam/xlog.c. When O_DIRECT is available, O_SYNC/O_FSYNC/O_DSYNC writes use it. test_fsync doesn't do that.

I moved the new code (in 8.2 beta 3, lines 61-92 in xlog.c) into test_fsync; all the flags had the same name so it dropped right in. You can get the version I made at http://www.westnet.com/~gsmith/test_fsync.c (fixed a compiler warning, too).

The results I get now look fishy. I'm not sure if I screwed up a step, or if I'm seeing a real problem. The system here is running RedHat Linux, RHEL ES 4.0 kernel 2.6.9, and the disk I'm writing to is a standard 7200RPM IDE drive. I turned off write caching with hdparm -W 0.

Here's an excerpt from the stock test_fsync:

    Compare one o_sync write to two:
            one 16k o_sync write     8.717944
            two 8k o_sync writes    17.501980

    Compare file sync methods with 2 8k writes:
            (o_dsync unavailable)
            open o_sync, write      17.018495
            write, fdatasync         8.842473
            write, fsync,            8.809117

And here's the version I tried to modify to include O_DIRECT support:

    Compare one o_sync write to two:
            one 16k o_sync write     0.004995
            two 8k o_sync writes     0.003027

    Compare file sync methods with 2 8k writes:
            (o_dsync unavailable)
            open o_sync, write       0.004978
            write, fdatasync         8.845498
            write, fsync,            8.834037

Obviously the o_sync writes aren't waiting for the disk. Is this a problem with O_DIRECT under Linux?
Or is my code just not correctly testing this behavior?

Just as a sanity check, I did try this on another system, running SuSE with drives connected to a cciss SCSI device, and I got exactly the same results. I'm concerned that Linux users who use O_SYNC because they notice it's faster will be losing their WAL integrity without being aware of the problem, especially as the whole O_DIRECT business isn't even mentioned in the WAL documentation--it really deserves to be brought up in the wal_sync_method notes at http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html

And while I'm mentioning improvements to that particular documentation page...the wal_buffers notes there are so sparse they misled me initially. They suggest only bumping it up for situations with very large transactions; since I was testing with small ones I left it woefully undersized initially. I would suggest copying the text from http://developer.postgresql.org/pgdocs/postgres/wal-configuration.html to here: "When full_page_writes is set and the system is very busy, setting this value higher will help smooth response times during the period immediately following each checkpoint." That seems to match what I found in testing.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
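One way to tell "O_DIRECT problem" apart from "test problem" is to have the test check every write() return value and use an aligned buffer. The sketch below is neither the modified test_fsync.c at the URL above nor the xlog.c code. The SKETCH_* names, the 4096-byte alignment, and the helper functions are assumptions made for illustration. The Linux open(2) man page notes that O_DIRECT can impose alignment restrictions on the buffer, length, and file offset, and a write() that fails returns almost immediately, so an unchecked failure would look like an implausibly fast sync. Whether that explains the numbers above is not established here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical names: this mirrors the idea described for xlog.c, not its text. */
#ifdef O_DIRECT
#define SKETCH_O_DIRECT O_DIRECT
#else
#define SKETCH_O_DIRECT 0       /* platform has no O_DIRECT: plain sync writes */
#endif

#define SKETCH_ALIGN 4096       /* assumed; the real requirement depends on device and filesystem */
#define SKETCH_BLOCK 8192       /* one 8k block, as in the test_fsync output */

/* Combine a sync flag (for example O_SYNC) with O_DIRECT when it is available. */
int sketch_open_flags(int sync_flag)
{
    return O_RDWR | sync_flag | SKETCH_O_DIRECT;
}

/* Allocate a zero-filled buffer aligned to SKETCH_ALIGN; returns NULL on failure. */
void *sketch_aligned_buffer(size_t len)
{
    void *buf = NULL;

    if (posix_memalign(&buf, SKETCH_ALIGN, len) != 0)
        return NULL;
    memset(buf, 0, len);
    return buf;
}

/* write() wrapper that reports errors and short writes; 0 on success, -1 otherwise. */
int sketch_checked_write(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0) {
        fprintf(stderr, "write failed: %s\n", strerror(errno));
        return -1;
    }
    if ((size_t) n != len) {
        fprintf(stderr, "short write: %zd of %zu bytes\n", n, len);
        return -1;
    }
    return 0;
}

/*
 * Time `loops` rounds of two 8k writes starting at offset 0 of an existing file.
 * Returns elapsed seconds, or -1.0 if the open or any write failed.
 */
double sketch_time_sync_writes(const char *path, int sync_flag, int loops)
{
    struct timeval start, stop;
    double elapsed = -1.0;
    void *buf;
    int fd, i;

    assert(loops > 0);          /* a non-positive loop count is a caller bug */
    buf = sketch_aligned_buffer(SKETCH_BLOCK);
    if (buf == NULL)
        return -1.0;
    fd = open(path, sketch_open_flags(sync_flag));
    if (fd < 0) {
        perror("open");
        free(buf);
        return -1.0;
    }

    gettimeofday(&start, NULL);
    for (i = 0; i < loops; i++) {
        if (lseek(fd, 0, SEEK_SET) < 0 ||
            sketch_checked_write(fd, buf, SKETCH_BLOCK) != 0 ||
            sketch_checked_write(fd, buf, SKETCH_BLOCK) != 0)
            goto done;          /* leave elapsed at -1.0: timing is meaningless */
    }
    gettimeofday(&stop, NULL);
    elapsed = (stop.tv_sec - start.tv_sec) + (stop.tv_usec - start.tv_usec) / 1e6;

done:
    close(fd);
    free(buf);
    return elapsed;
}
```

Calling sketch_time_sync_writes(path, O_SYNC, 1000) on an existing file on the drive under test (the loop count is an arbitrary example) and comparing the result with the fdatasync and fsync timings would show whether the direct writes really complete. A return of -1.0 means an open or write failed and the timing should not be trusted.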