Re: [PERFORM] Direct I/O issues - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: [PERFORM] Direct I/O issues |
Date | |
Msg-id | 200611231641.kANGfae01113@momjian.us |
Responses |
Re: [PERFORM] Direct I/O issues
|
List | pgsql-patches |
I have applied your test_fsync patch for 8.2.  Thanks.

---------------------------------------------------------------------------

Greg Smith wrote:
> I've been trying to optimize a Linux system where benchmarking suggests
> large performance differences between the various wal_sync_method options
> (with o_sync being the big winner).  I started that by using
> src/tools/fsync/test_fsync to get an idea what I was dealing with (and to
> spot which drives had write caching turned on).  Since those results
> didn't match what I was seeing in the benchmarks, I've been browsing the
> backend source to figure out why.  I noticed test_fsync appears to be,
> ahem, out of sync with what the engine is doing.
>
> It looks like V8.1 introduced O_DIRECT writes to the WAL, determined at
> compile time by a series of preprocessor tests in
> src/backend/access/transam/xlog.c.  When O_DIRECT is available,
> O_SYNC/O_FSYNC/O_DSYNC writes use it.  test_fsync doesn't do that.
>
> I moved the new code (in 8.2 beta 3, lines 61-92 in xlog.c) into
> test_fsync; all the flags had the same name so it dropped right in.  You
> can get the version I made at http://www.westnet.com/~gsmith/test_fsync.c
> (fixed a compiler warning, too)
>
> The results I get now look fishy.  I'm not sure if I screwed up a step, or
> if I'm seeing a real problem.  The system here is running RedHat Linux,
> RHEL ES 4.0 kernel 2.6.9, and the disk I'm writing to is a standard
> 7200RPM IDE drive.  I turned off write caching with hdparm -W 0
>
> Here's an excerpt from the stock test_fsync:
>
> Compare one o_sync write to two:
>         one 16k o_sync write     8.717944
>         two 8k o_sync writes    17.501980
>
> Compare file sync methods with 2 8k writes:
>         (o_dsync unavailable)
>         open o_sync, write      17.018495
>         write, fdatasync         8.842473
>         write, fsync,            8.809117
>
> And here's the version I tried to modify to include O_DIRECT support:
>
> Compare one o_sync write to two:
>         one 16k o_sync write     0.004995
>         two 8k o_sync writes     0.003027
>
> Compare file sync methods with 2 8k writes:
>         (o_dsync unavailable)
>         open o_sync, write       0.004978
>         write, fdatasync         8.845498
>         write, fsync,            8.834037
>
> Obviously the o_sync writes aren't waiting for the disk.  Is this a
> problem with O_DIRECT under Linux?  Or is my code just not correctly
> testing this behavior?
>
> Just as a sanity check, I did try this on another system, running SuSE
> with drives connected to a cciss SCSI device, and I got exactly the same
> results.  I'm concerned that Linux users who use O_SYNC because they
> notice it's faster will be losing their WAL integrity without being aware
> of the problem, especially as the whole O_DIRECT business isn't even
> mentioned in the WAL documentation--it really deserves to be brought up in
> the wal_sync_method notes at
> http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html
>
> And while I'm mentioning improvements to that particular documentation
> page...the wal_buffers notes there are so sparse they misled me initially.
> They suggest only bumping it up for situations with very large
> transactions; since I was testing with small ones I left it woefully
> undersized initially.  I would suggest copying the text from
> http://developer.postgresql.org/pgdocs/postgres/wal-configuration.html to
> here:  "When full_page_writes is set and the system is very busy, setting
> this value higher will help smooth response times during the period
> immediately following each checkpoint."  That seems to match what I found
> in testing.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

*** /pg/tools/fsync/test_fsync.c	Fri Oct 13 10:18:33 2006
--- test_fsync.c	Thu Nov 23 00:24:49 2006
***************
*** 14,19 ****
--- 14,20 ----
  #include <time.h>
  #include <sys/time.h>
  #include <unistd.h>
+ #include <string.h>

  #ifdef WIN32
  #define FSYNC_FILENAME	"./test_fsync.out"
***************
*** 21,40 ****
  #define FSYNC_FILENAME	"/var/tmp/test_fsync.out"
  #endif

! /* O_SYNC and O_FSYNC are the same */
  #if defined(O_SYNC)
! #define OPEN_SYNC_FLAG		O_SYNC
  #elif defined(O_FSYNC)
! #define OPEN_SYNC_FLAG		O_FSYNC
! #elif defined(O_DSYNC)
! #define OPEN_DATASYNC_FLAG	O_DSYNC
  #endif

  #if defined(OPEN_SYNC_FLAG)
! #if defined(O_DSYNC) && (O_DSYNC != OPEN_SYNC_FLAG)
! #define OPEN_DATASYNC_FLAG	O_DSYNC
  #endif
  #endif

  #define WAL_FILE_SIZE	(16 * 1024 * 1024)
--- 22,54 ----
  #define FSYNC_FILENAME	"/var/tmp/test_fsync.out"
  #endif

! /* This logic comes from src/backend/access/transam/xlog.c where it's
!    better documented */
! #ifdef O_DIRECT
! #define PG_O_DIRECT		O_DIRECT
! #else
! #define PG_O_DIRECT		0
! #endif
!
  #if defined(O_SYNC)
! #define BARE_OPEN_SYNC_FLAG	O_SYNC
  #elif defined(O_FSYNC)
! #define BARE_OPEN_SYNC_FLAG	O_FSYNC
! #endif
! #ifdef BARE_OPEN_SYNC_FLAG
! #define OPEN_SYNC_FLAG		(BARE_OPEN_SYNC_FLAG | PG_O_DIRECT)
  #endif

+ #if defined(O_DSYNC)
  #if defined(OPEN_SYNC_FLAG)
! #if O_DSYNC != BARE_OPEN_SYNC_FLAG
! #define OPEN_DATASYNC_FLAG	(O_DSYNC | PG_O_DIRECT)
! #endif
! #else
! #define OPEN_DATASYNC_FLAG	(O_DSYNC | PG_O_DIRECT)
  #endif
  #endif

+
  #define WAL_FILE_SIZE	(16 * 1024 * 1024)
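
When O_DIRECT timings look implausibly fast, as in the results quoted above, one thing worth ruling out is that the writes are failing rather than completing. On Linux, O_DIRECT generally imposes alignment restrictions on the buffer address, file offset, and transfer length, and a write() that violates them typically returns -1 with EINVAL. The following is only a sketch of how a timing loop could check for that. It is not part of the applied patch and makes no claim about what test_fsync itself does. The helper names (alloc_aligned_block, checked_write, open_for_sync_test) and the SKETCH_* macros are hypothetical, and the 4096-byte alignment is an assumption, since the real requirement depends on the device and filesystem.

```c
#define _GNU_SOURCE				/* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Same fallback idea as PG_O_DIRECT in the patch: 0 when unavailable. */
#ifdef O_DIRECT
#define SKETCH_O_DIRECT	O_DIRECT
#else
#define SKETCH_O_DIRECT	0
#endif

#define SKETCH_ALIGN	4096	/* assumed alignment, device/FS dependent */
#define SKETCH_BLOCK	8192	/* 8k, the write size test_fsync times */

/*
 * Hypothetical helper: allocate a zero-filled buffer whose address is a
 * multiple of SKETCH_ALIGN.  Returns NULL on allocation failure.
 */
static char *
alloc_aligned_block(size_t len)
{
	void	   *p = NULL;

	/* posix_memalign requires a power-of-two alignment */
	assert((SKETCH_ALIGN & (SKETCH_ALIGN - 1)) == 0);
	if (posix_memalign(&p, SKETCH_ALIGN, len) != 0)
		return NULL;
	memset(p, 0, len);
	return p;
}

/*
 * Hypothetical helper: write len bytes and report failures and short
 * writes instead of silently ignoring them.  Returns 0 only when the
 * whole buffer was written, -1 otherwise.
 */
static int
checked_write(int fd, const char *buf, size_t len)
{
	ssize_t		n = write(fd, buf, len);

	if (n < 0)
	{
		fprintf(stderr, "write failed: %s\n", strerror(errno));
		return -1;
	}
	if ((size_t) n != len)
	{
		fprintf(stderr, "short write: %zd of %zu bytes\n", n, len);
		return -1;
	}
	return 0;
}

/*
 * Hypothetical helper: open (creating if needed) the scratch file with
 * any extra flags the caller wants to test, e.g. O_SYNC | SKETCH_O_DIRECT.
 * Returns the file descriptor, or -1 with errno set by open().
 */
static int
open_for_sync_test(const char *path, int extra_flags)
{
	return open(path, O_RDWR | O_CREAT | extra_flags, 0600);
}
```

A timing loop would call open_for_sync_test() with the flag combination under test, fill a buffer from alloc_aligned_block(), and treat any -1 from checked_write() as a reason to discard that timing rather than report it.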