Re: O_DIRECT in freebsd - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: O_DIRECT in freebsd
Date
Msg-id 200306222350.h5MNomr03736@candle.pha.pa.us
In response to Re: O_DIRECT in freebsd  (Sean Chittenden <sean@chittenden.org>)
Responses Re: O_DIRECT in freebsd
List pgsql-hackers
Basically, when we read a buffer we don't know whether it will be used
read-only or read/write.  In fact, we could read it in, and another
backend could write it for us.

The big issue is that when we do a write, we don't wait for it to get to
disk.

It seems that to use O_DIRECT, we would have to read the buffer in a
special way to mark it as read-only, which seems kind of strange.  I see
no reason we can't use free-behind in the PostgreSQL buffer cache to get
most of the benefits of O_DIRECT, without the read-only buffer restriction.
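
As a rough illustration of what free-behind could look like, here is a toy
sketch in Python.  It is not PostgreSQL's buffer manager; the BufferCache
class and the free_behind flag are names invented for this example.  The
idea shown is only that pages read by a big scan are placed at the eviction
end of an LRU list, so the scan keeps recycling one slot instead of pushing
out the hot pages.

```python
from collections import OrderedDict


class BufferCache:
    """Toy LRU buffer cache with a free-behind hint (hypothetical names)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Ordered from "next eviction victim" (first) to "most recent" (last).
        self.pages = OrderedDict()

    def read(self, page_no, free_behind=False):
        if page_no in self.pages:
            # Cache hit: a normal read promotes the page to most recent;
            # a free-behind read leaves its position alone.
            if not free_behind:
                self.pages.move_to_end(page_no)
            return self.pages[page_no]

        data = f"page-{page_no}"  # stand-in for an actual disk read
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the next victim
        self.pages[page_no] = data
        if free_behind:
            # Free-behind: make this page the first candidate for reuse.
            self.pages.move_to_end(page_no, last=False)
        return data


cache = BufferCache(capacity=4)
for hot in (1, 2, 3):
    cache.read(hot)                      # frequently used pages
for page in range(100, 200):
    cache.read(page, free_behind=True)   # large sequential scan
print(sorted(cache.pages))               # hot pages 1, 2, 3 are still cached
```

With free_behind left off, the same scan would evict all three hot pages,
which is the cache turnover O_DIRECT is meant to avoid.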

---------------------------------------------------------------------------

Sean Chittenden wrote:
> > _That_ is an excellent point.  However, do we know at the time we
> > open the file descriptor if we will be doing this?
> 
> Doesn't matter, it's an option to fcntl().
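
For reference, toggling the flag on an already-open descriptor might look
roughly like the sketch below.  It uses Python's fcntl module rather than C
for brevity; set_direct is a hypothetical helper, and whether the kernel and
filesystem accept the flag is an assumption, so failure is reported instead
of raised.

```python
import fcntl
import os


def set_direct(fd, enable):
    """Try to turn O_DIRECT on or off for an open descriptor.

    Returns True if the flags were changed, False if the platform has no
    O_DIRECT or the kernel/filesystem refused the request.
    """
    o_direct = getattr(os, "O_DIRECT", None)
    if o_direct is None:
        return False  # platform does not define O_DIRECT
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    new_flags = (flags | o_direct) if enable else (flags & ~o_direct)
    try:
        fcntl.fcntl(fd, fcntl.F_SETFL, new_flags)
    except OSError:
        return False  # e.g. the filesystem does not support direct I/O
    return True
```

A caller could switch the flag on just before a large scan and off again
afterwards, without reopening the file.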
> 
> > What about cache coherency problems with other backends not opening
> > with O_DIRECT?
> 
> That's a problem for the kernel VM, if you mean cache coherency in the
> VM.  If you mean inside of the backend, that could be a stickier
> issue, I think.  I don't know enough of the internals yet to know if
> this is a problem or not, but you're right, it's certainly something
> to consider.  Is the cache a write behind cache or is it a read
> through cache?  If it's a read through cache, which I think it is,
> then the backend would have to dirty all cache entries pertaining to
> the relations being opened with O_DIRECT.  The use case for that
> being:
> 
> 1) a transaction begins
> 2) a few rows out of the huge table are read
> 3) a huge query is performed that triggers the use of O_DIRECT
> 4) the rows selected in step 2 are updated (this step should poison or
>    update the cache, actually, and act as a write through cache if the
>    data is in the cache)
> 5) the same few rows are read in again
> 6) transaction is committed
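
The six steps above can be walked through with a toy model.  This is only a
sketch under simplifying assumptions: a dict stands in for the table on
disk, updates go straight to that dict, and CachedTable, read_direct and the
invalidate flag are hypothetical names, not PostgreSQL code.

```python
class CachedTable:
    """Toy read-through cache in front of a dict standing in for the disk."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def read(self, key):
        # Normal read: fill the cache on a miss, then serve from the cache.
        if key not in self.cache:
            self.cache[key] = self.disk[key]
        return self.cache[key]

    def read_direct(self, key):
        # O_DIRECT-style read: bypass the cache entirely.
        return self.disk[key]

    def update(self, key, value, invalidate=True):
        self.disk[key] = value
        if invalidate:
            # Poison the cached copy so the next read goes back to disk.
            self.cache.pop(key, None)


def run_transaction(invalidate):
    table = CachedTable({i: f"old-{i}" for i in range(1000)})
    table.read(1)                                   # step 2: read a few rows
    table.read(2)
    for key in range(1000):                         # step 3: huge direct scan
        table.read_direct(key)
    table.update(1, "new-1", invalidate=invalidate)  # step 4: update a row
    return table.read(1)                            # step 5: read it again


print(run_transaction(invalidate=True))   # new-1
print(run_transaction(invalidate=False))  # old-1 (stale cached copy)
```

The second call shows the failure mode: if step 4 neither poisons nor
updates the cache, step 5 returns the stale row.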
> 
> Provided the cache is poisoned or updated in step 4, I can't see how
> or where this would be a problem.  Please enlighten if there's a
> different case that would need to be taken into account.  I can't
> imagine ever wanting to write out data using O_DIRECT and think that
> it's a read only optimization in an attempt to minimize the turn over
> in the OS's cache.  From fcntl(2):
> 
>      O_DIRECT     Minimize or eliminate the cache effects of reading and
>                   writing.  The system will attempt to avoid caching the
>                   data you read or write.  If it cannot avoid caching the
>                   data, it will minimize the impact the data has on the
>                   cache.  Use of this flag can drastically reduce
>                   performance if not used with care.
> 
> 
> > And finally, how do we deal with the fact that writes to O_DIRECT
> > files will wait until the data hits the disk because there is no
> > kernel buffer cache?
> 
> Well, two things.
> 
> 1) O_DIRECT should never be used on writes... I can't think of a case
>    where you'd want it off, even when COPY'ing data and restoring a
>    DB, it just doesn't make sense to use it.  The write buffer is
>    emptied as soon as the pages hit the disk unless something is
>    reading those bits, but I'd imagine the write buffer would be used
>    to make sure that as much writing is done to the platter in a
>    single write by the OS as possible, circumventing that would be
>    insane (though useful possibly for embedded devices with low RAM,
>    solid state drives, or some super nice EMC fiber channel storage
>    device that basically has its own huge disk cache).
> 
> 2) Last I checked PostgreSQL wasn't a threaded app and doesn't use
>    non-blocking IO.  The backend would block until the call returns,
>    where's the problem?  :)
> 
> If anything O_DIRECT would shake out any bugs in PostgreSQL's caching
> code, if there are any.  -sc
> 
> -- 
> Sean Chittenden
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

