Re: O_DIRECT for WAL writes - Mailing list pgsql-patches
From | Mary Edie Meredith |
---|---|
Subject | Re: O_DIRECT for WAL writes |
Date | |
Msg-id | 1117670894.2922.339.camel@localhost |
In response to | Re: O_DIRECT for WAL writes (Neil Conway <neilc@samurai.com>) |
Responses | Re: O_DIRECT for WAL writes |
List | pgsql-patches |
On Mon, 2005-05-30 at 16:29 +1000, Neil Conway wrote:
> On Mon, 2005-05-30 at 10:59 +0900, ITAGAKI Takahiro wrote:
> > Yes, I've tested pgbench and dbt2 and their performances have improved.
> > The two results are as follows:
> >
> > 1. pgbench -s 100 on one Pentium4, 1GB mem, 2 ATA disks, Linux 2.6.8
> >    (attached image)
> >   tps  | wal_sync_method
> > -------+-------------------------------------------------------
> >  147.0 | open_direct + write multipage (previous patch)
> >  147.2 | open_direct (this patch)
> >  109.9 | open_sync
>
> I'm surprised this makes as much of a difference as that benchmark would
> suggest. I wonder if we're benchmarking the right thing, though: is
> opening a file with O_DIRECT sufficient to ensure that a write(2) does
> not return until the data has hit disk? (As would be the case with
> O_SYNC.) O_DIRECT means the OS will attempt to minimize caching, but
> that is not necessarily the same thing: for example, I can imagine an
> implementation in which the kernel would submit the appropriate I/O to
> the disk when it sees a write(2) on a file opened with O_DIRECT, but
> then let the write(2) return before getting confirmation from the disk
> that the I/O has succeeded or failed. From googling, the MySQL
> documentation for innodb_flush_method notes:
>
>     This option is only relevant on Unix systems. If set to
>     fdatasync, InnoDB uses fsync() to flush both the data and log
>     files. If set to O_DSYNC, InnoDB uses O_SYNC to open and flush
>     the log files, but uses fsync() to flush the datafiles. If
>     O_DIRECT is specified (available on some GNU/Linux versions
>     starting from MySQL 4.0.14), InnoDB uses O_DIRECT to open the
>     datafiles, and uses fsync() to flush both the data and log
>     files.
>
> That would suggest O_DIRECT by itself is not sufficient to force a flush
> to disk -- if anyone has some more definitive evidence that would be
> welcome.

I know I'm late to this discussion, and I haven't made it all the way
through this thread to see if your questions on Linux writes were
resolved. If you are still interested, I recommend reading a very good
one-page description of reliable writes buried in the Data Center Linux
Goals and Capabilities document. It is on page 159 of the document; the
item is "R.ReliableWrites" in this _giant_ PDF file (do a wget and open
it locally; don't try to read it directly):

http://www.osdlab.org/lab_activities/data_center_linux/DCL_Goals_Capabilities_1.1.pdf

The information came from my interviewing Daniel McNeil, an OSDL
engineer who wrote and tested much of the Linux async I/O code, after I
was similarly confused about when a write is "guaranteed". Reliable
writes, as you can imagine, are very important to Data Center folks,
which is how the item came to be in this document.

Hope this helps.

> Anyway, if the above is true, we'll need to use O_DIRECT as well as one
> of the existing wal_sync_methods.
>
> BTW, from the patch:
>
> + /* TODO: Aligment depends on OS and filesystem. */
> + #define O_DIRECT_BUFFER_ALIGN 4096
>
> I suppose there's no reasonable way to autodetect this, so we'll need to
> expose it as a GUC variable (or perhaps a configure option), which is a
> bit unfortunate.
>
> -Neil
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

-- 
Mary Edie Meredith
maryedie@osdl.org
503-906-1942
Data Center Linux Initiative Manager
Open Source Development Labs
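
The two concerns raised in the thread, that O_DIRECT may need to be paired with an explicit flush and that the write buffer must be aligned, can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not code from the patch: `write_block_durably` and `DIRECT_ALIGN` are hypothetical names, 4096 simply mirrors the patch's `O_DIRECT_BUFFER_ALIGN`, and the fallback to buffered I/O when O_DIRECT is unavailable is an addition for portability.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* no O_DIRECT here: degrade to buffered I/O */
#endif

/* Same value as the patch's O_DIRECT_BUFFER_ALIGN. The real requirement
 * depends on the OS and filesystem, so 4096 is an assumption. */
#define DIRECT_ALIGN 4096

/*
 * Hypothetical helper: write one aligned, zero-padded block to `path`
 * with O_DIRECT, then call fdatasync() so that durability does not rest
 * on O_DIRECT alone. Returns 0 on success, -1 on failure.
 */
int write_block_durably(const char *path, const char *payload)
{
    void  *buf = NULL;
    size_t len = strlen(payload);
    int    fd;
    int    rc = -1;

    if (len > DIRECT_ALIGN)
        return -1;

    /* O_DIRECT generally wants an aligned buffer and an aligned length. */
    if (posix_memalign(&buf, DIRECT_ALIGN, DIRECT_ALIGN) != 0)
        return -1;
    memset(buf, 0, DIRECT_ALIGN);
    memcpy(buf, payload, len);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)  /* filesystem rejects O_DIRECT */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd >= 0)
    {
        /* Write the whole block, then ask the kernel to flush it. */
        if (write(fd, buf, DIRECT_ALIGN) == DIRECT_ALIGN &&
            fdatasync(fd) == 0)
            rc = 0;
        close(fd);
    }

    free(buf);
    return rc;
}
```

A call such as `write_block_durably("/tmp/wal_test.bin", "record")` should return 0 once the block has been written and `fdatasync()` has reported success. Whether the data has then truly reached the platter still depends on the drive's write cache, which this sketch does not address.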