Re: O_DIRECT for WAL writes - Mailing list pgsql-patches

From Mary Edie Meredith
Subject Re: O_DIRECT for WAL writes
Date
Msg-id 1117670894.2922.339.camel@localhost
In response to Re: O_DIRECT for WAL writes  (Neil Conway <neilc@samurai.com>)
Responses Re: O_DIRECT for WAL writes  (Neil Conway <neilc@samurai.com>)
List pgsql-patches
On Mon, 2005-05-30 at 16:29 +1000, Neil Conway wrote:
> On Mon, 2005-05-30 at 10:59 +0900, ITAGAKI Takahiro wrote:
> > Yes, I've tested pgbench and dbt2 and their performances have improved.
> > The two results are as follows:
> >
> > 1. pgbench -s 100 on one Pentium4, 1GB mem, 2 ATA disks, Linux 2.6.8
> >    (attached image)
> >   tps  | wal_sync_method
> > -------+-------------------------------------------------------
> >  147.0 | open_direct + write multipage (previous patch)
> >  147.2 | open_direct (this patch)
> >  109.9 | open_sync
>
> I'm surprised this makes as much of a difference as that benchmark would
> suggest. I wonder if we're benchmarking the right thing, though: is
> opening a file with O_DIRECT sufficient to ensure that a write(2) does
> not return until the data has hit disk? (As would be the case with
> O_SYNC.) O_DIRECT means the OS will attempt to minimize caching, but
> that is not necessarily the same thing: for example, I can imagine an
> implementation in which the kernel would submit the appropriate I/O to
> the disk when it sees a write(2) on a file opened with O_DIRECT, but
> then let the write(2) return before getting confirmation from the disk
> that the I/O has succeeded or failed. From googling, the MySQL
> documentation for innodb_flush_method notes:
>
>         This option is only relevant on Unix systems. If set to
>         fdatasync, InnoDB uses fsync() to flush both the data and log
>         files. If set to O_DSYNC, InnoDB uses O_SYNC to open and flush
>         the log files, but uses fsync() to flush the datafiles. If
>         O_DIRECT is specified (available on some GNU/Linux versions
>         starting from MySQL 4.0.14), InnoDB uses O_DIRECT to open the
>         datafiles, and uses fsync() to flush both the data and log
>         files.
>
> That would suggest O_DIRECT by itself is not sufficient to force a flush
> to disk -- if anyone has some more definitive evidence that would be
> welcome.

I know I'm late to this discussion, and I haven't made it all the way
through this thread to see if your questions on Linux writes were
resolved. If you are still interested, I recommend reading a very good
one-page description of reliable writes buried in the Data Center Linux
Goals and Capabilities document. It is on page 159 of the document, under
the item "R.ReliableWrites", in this giant PDF file (do a wget and open
it locally; don't try to read it directly):

http://www.osdlab.org/lab_activities/data_center_linux/DCL_Goals_Capabilities_1.1.pdf

The information comes from my interview with Daniel McNeil, an OSDL
engineer who wrote and tested much of the Linux async IO code, after I
was similarly confused about when a write is "guaranteed". Reliable
writes, as you can imagine, are very important to Data Center folks,
which is how the item came to be in this document.

Hope this helps.
>
> Anyway, if the above is true, we'll need to use O_DIRECT as well as one
> of the existing wal_sync_methods.
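
The combination quoted just above (O_DIRECT plus an explicit flush) could
look roughly like the following. This is only a minimal sketch under stated
assumptions, not the code from the patch: write_block_durably and WAL_ALIGN
are hypothetical names, 4096 is assumed to be an acceptable alignment, and
fsync() stands in for whichever wal_sync_method would actually be chosen.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* no O_DIRECT here: degrades to write + fsync */
#endif

/* Assumed alignment, mirroring O_DIRECT_BUFFER_ALIGN in the quoted patch. */
#define WAL_ALIGN 4096

/*
 * Hypothetical helper: write one WAL_ALIGN-sized block filled with 'fill'
 * to 'path', bypassing the page cache where the filesystem allows it, and
 * then explicitly flush.  Returns 0 on success, -1 on failure.
 */
static int
write_block_durably(const char *path, int fill)
{
    void   *buf = NULL;
    int     fd;
    int     rc = -1;

    /* O_DIRECT generally needs an aligned buffer, offset and length. */
    if (posix_memalign(&buf, WAL_ALIGN, WAL_ALIGN) != 0)
        return -1;
    assert(((uintptr_t) buf % WAL_ALIGN) == 0);
    memset(buf, fill, WAL_ALIGN);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
    {
        /* Some filesystems reject O_DIRECT; retry with buffered I/O. */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    }
    if (fd >= 0)
    {
        /* Do not rely on O_DIRECT alone for durability: flush explicitly. */
        if (write(fd, buf, WAL_ALIGN) == WAL_ALIGN && fsync(fd) == 0)
            rc = 0;
        if (close(fd) != 0)
            rc = -1;
    }
    free(buf);
    return rc;
}
```

The point of the sketch is only that the flush is a separate step from the
O_DIRECT open, so the write is not reported as successful until fsync()
has returned.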
>
> BTW, from the patch:
>
> + /* TODO: Aligment depends on OS and filesystem. */
> + #define O_DIRECT_BUFFER_ALIGN    4096
>
> I suppose there's no reasonable way to autodetect this, so we'll need to
> expose it as a GUC variable (or perhaps a configure option), which is a
> bit unfortunate.
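
On the alignment question quoted just above: POSIX does define an advisory
pathconf() variable, _PC_REC_XFER_ALIGN, for the recommended transfer buffer
alignment. Whether its value matches what O_DIRECT really requires on a
given OS and filesystem is an assumption I have not verified, so the sketch
below treats it as a hint only and falls back to a fixed default.
guess_direct_io_alignment and DEFAULT_DIRECT_ALIGN are hypothetical names.

```c
#include <unistd.h>

/* Fallback, mirroring the 4096 hard-coded in the quoted patch. */
#define DEFAULT_DIRECT_ALIGN 4096L

/*
 * Hypothetical helper: ask the system for the recommended transfer
 * alignment of the filesystem holding 'dir'.  Returns the default when the
 * query is unsupported, fails, or yields a value that is not a power of two.
 */
static long
guess_direct_io_alignment(const char *dir)
{
    long    align = -1;

#ifdef _PC_REC_XFER_ALIGN
    align = pathconf(dir, _PC_REC_XFER_ALIGN);
#endif
    if (align <= 0 || (align & (align - 1)) != 0)
        return DEFAULT_DIRECT_ALIGN;
    return align;
}
```

A GUC or configure option could still override whatever this returns.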
>
> -Neil
--
Mary Edie Meredith
maryedie@osdl.org
503-906-1942
Data Center Linux Initiative Manager
Open Source Development Labs