Use of O_DIRECT only for open_* sync options - Mailing list pgsql-hackers

From Bruce Momjian
Subject Use of O_DIRECT only for open_* sync options
Date
Msg-id 201101191853.p0JIrEn15002@momjian.us
Responses Re: Use of O_DIRECT only for open_* sync options  (Robert Haas <robertmhaas@gmail.com>)
Re: Use of O_DIRECT only for open_* sync options  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-hackers
Is there a reason we only use O_DIRECT with open_* sync options?
xlogdefs.h says:

/*
 *  Because O_DIRECT bypasses the kernel buffers, and because we never
 *  read those buffers except during crash recovery, it is a win to use
 *  it in all cases where we sync on each write().  We could allow O_DIRECT
 *  with fsync(), but because skipping the kernel buffer forces writes out
 *  quickly, it seems best just to use it for O_SYNC.  It is hard to imagine
 *  how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
 *  Also, O_DIRECT is never enough to force data to the drives, it merely
 *  tries to bypass the kernel cache, so we still need O_SYNC or fsync().
 */

This seems wrong because fsync() can win if there are two writes before
the sync call.  Can kernels not issue fsync() if the write was O_DIRECT?
If that is the cause, we should document it.
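
To make the two-write case concrete, here is a minimal sketch, not PostgreSQL code. The helper names, file paths, and the 8192-byte block size are all made up for illustration. It only counts the points at which the caller has to wait for stable storage: with O_SYNC every write() is such a point, while with plain writes a single fsync() covers both. O_DIRECT is deliberately left out, since it needs aligned buffers and is not supported on every filesystem.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 8192     /* assumed block size, for illustration only */

/* Write one block of filler; returns 0 on success, -1 on error or short write. */
static int write_block(int fd)
{
    char buf[BLOCK_SIZE];

    memset(buf, 'x', sizeof buf);
    return write(fd, buf, sizeof buf) == (ssize_t) sizeof buf ? 0 : -1;
}

/*
 * open_sync style: O_SYNC makes each write() wait for stable storage.
 * Returns the number of synchronous waits (2), or -1 on error.
 */
static int two_writes_o_sync(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);
    int waits = 0;

    if (fd < 0)
        return -1;
    for (int i = 0; i < 2; i++)
    {
        if (write_block(fd) != 0)
        {
            close(fd);
            return -1;
        }
        waits++;            /* this write() itself was the sync point */
    }
    close(fd);
    return waits;
}

/*
 * fsync style: both writes go to the kernel cache, then one fsync().
 * Returns the number of synchronous waits (1), or -1 on error.
 */
static int two_writes_then_fsync(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    int waits = 0;

    if (fd < 0)
        return -1;
    for (int i = 0; i < 2; i++)
    {
        if (write_block(fd) != 0)
        {
            close(fd);
            return -1;
        }
    }
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    waits++;                /* one sync point covers both writes */
    close(fd);
    return waits;
}
```

Whether the single fsync() is actually faster depends on the kernel, filesystem, and drive; the sketch shows only where the waits fall, not how long they take.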

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

