Re: Use of O_DIRECT only for open_* sync options - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Use of O_DIRECT only for open_* sync options
Msg-id 201103111147.p2BBlLN29891@momjian.us
In response to Re: Use of O_DIRECT only for open_* sync options  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-hackers
Greg Smith wrote:
> Bruce Momjian wrote:
> > xlogdefs.h says:
> >
> > /*
> >  *  Because O_DIRECT bypasses the kernel buffers, and because we never
> >  *  read those buffers except during crash recovery, it is a win to use
> >  *  it in all cases where we sync on each write().  We could allow O_DIRECT
> >  *  with fsync(), but because skipping the kernel buffer forces writes out
> >  *  quickly, it seems best just to use it for O_SYNC.  It is hard to imagine
> >  *  how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
> >  *  Also, O_DIRECT is never enough to force data to the drives, it merely
> >  *  tries to bypass the kernel cache, so we still need O_SYNC or fsync().
> >  */
> >
> > This seems wrong because fsync() can win if there are two writes before
> > the sync call.  Can kernels not issue fsync() if the write was O_DIRECT?
> > If that is the cause, we should document it.
> >   
> 
> The comment does look busted, because you did imagine exactly a case 
> where they might be combined.  The only incompatibility that I'm aware 
> of is that O_DIRECT requires reads and writes to be aligned properly, so 
> you can't use it in random application code unless it's aware of that.  
> O_DIRECT and fsync are compatible; for example, MySQL allows combining 
> the two:  http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html
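Bruce's case (two writes before one sync) and Greg's point (O_DIRECT and fsync() are compatible as long as the I/O is aligned) can be sketched in C. This is a minimal sketch under stated assumptions, not PostgreSQL code. The helper name write_blocks_then_fsync is hypothetical. The 8192-byte block size is assumed to match PostgreSQL's default WAL block size, and the 4096-byte alignment is assumed to satisfy the device's O_DIRECT rules. The helper issues several aligned O_DIRECT writes and then flushes all of them with a single fsync(). If the filesystem rejects O_DIRECT with EINVAL, it falls back to buffered I/O.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0 /* platform without O_DIRECT: plain buffered I/O */
#endif

/* Assumptions: 8192 = PostgreSQL's default WAL block size;
 * 4096 is assumed to satisfy the device's O_DIRECT alignment rule. */
#define BLOCK_SIZE 8192
#define IO_ALIGN 4096

/* Hypothetical helper: write nblocks aligned blocks, then issue ONE fsync().
 * Returns 0 on success, -1 on error. */
int write_blocks_then_fsync(const char *path, int nblocks)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
        /* Some filesystems (e.g. older tmpfs) reject O_DIRECT outright. */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    void *buf = NULL;
    /* O_DIRECT needs the buffer address, length and file offset aligned. */
    if (posix_memalign(&buf, IO_ALIGN, BLOCK_SIZE) != 0) {
        close(fd);
        return -1;
    }

    int rc = 0;
    for (int i = 0; i < nblocks && rc == 0; i++) {
        memset(buf, 'A' + (i % 26), BLOCK_SIZE);
        /* Sequential BLOCK_SIZE writes keep every file offset aligned. */
        if (write(fd, buf, BLOCK_SIZE) != BLOCK_SIZE)
            rc = -1;
    }

    /* O_DIRECT alone does not make the data durable (it neither flushes the
     * drive's write cache nor the file's metadata), so a single fsync()
     * covers all of the writes above. */
    if (rc == 0 && fsync(fd) != 0)
        rc = -1;

    free(buf);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

With nblocks = 2, this pays for one flush instead of the two that an O_SYNC/O_DSYNC descriptor would incur. That is the situation in which fsync() could win.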

C comment updated in git head:

/*
 *  Because O_DIRECT bypasses the kernel buffers, and because we never
 *  read those buffers except during crash recovery or if wal_level != minimal,
 *  it is a win to use it in all cases where we sync on each write().  We could
 *  allow O_DIRECT with fsync(), but it is unclear if fsync() could process
 *  writes not buffered in the kernel.  Also, O_DIRECT is never enough to force
 *  data to the drives, it merely tries to bypass the kernel cache, so we still
 *  need O_SYNC/O_DSYNC.
 */
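The updated comment settles on pairing O_DIRECT with O_SYNC/O_DSYNC. In that mode every write() is synchronous on its own and needs no separate fsync(). Here is a minimal C sketch of that mode. It is an illustration under stated assumptions, not PostgreSQL's implementation. The helpers open_direct_dsync and write_synced_block are hypothetical. The 8192-byte block and the 4096-byte alignment are assumed values. O_SYNC is substituted where a platform lacks O_DSYNC.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0 /* no O_DIRECT on this platform: plain synchronous writes */
#endif
#ifndef O_DSYNC
#define O_DSYNC O_SYNC /* fall back to O_SYNC where O_DSYNC is missing */
#endif

#define WAL_BLOCK 8192 /* assumed block size (PostgreSQL's default) */
#define WAL_ALIGN 4096 /* assumed O_DIRECT alignment requirement */

/* Hypothetical helper: every write() on the returned descriptor bypasses
 * the kernel cache (O_DIRECT) and is synchronous (O_DSYNC). */
int open_direct_dsync(const char *path)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC;
    int fd = open(path, flags | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, flags, 0600); /* filesystem rejected O_DIRECT */
    return fd;
}

/* Hypothetical helper: write one aligned block. No fsync() follows, because
 * O_DSYNC makes write() return only after the data, plus the metadata
 * needed to read it back, has been transferred to storage. */
int write_synced_block(int fd, char fill)
{
    void *buf = NULL;
    if (posix_memalign(&buf, WAL_ALIGN, WAL_BLOCK) != 0)
        return -1;
    memset(buf, fill, WAL_BLOCK);
    ssize_t n = write(fd, buf, WAL_BLOCK);
    free(buf);
    return n == WAL_BLOCK ? 0 : -1;
}
```

Each write_synced_block() call costs one synchronous round trip to the device. Batching writes behind one fsync() avoids exactly that cost, which is the trade-off raised earlier in the thread.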

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

