Re: O_DIRECT in freebsd - Mailing list pgsql-hackers

From Sean Chittenden
Subject Re: O_DIRECT in freebsd
Date
Msg-id 20030623040135.GO97131@perrin.int.nxad.com
In response to Re: O_DIRECT in freebsd  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses pgsql-hackers@postgresql.org  ("Jim C. Nasby" <jim@nasby.net>)
List pgsql-hackers
> >> it doesn't seem totally out of the question.  I'd kinda like to
> >> see some experimental evidence that it's worth doing though.
> >> Anyone care to make a quick-hack prototype and do some
> >> measurements?
> 
> > What would you like to measure?  Overall system performance when a
> > query is using O_DIRECT or are you looking for negative/positive
> > impact of read() not using the FS cache?  The latter is much
> > easier to do than the former...  recreating a valid load
> > environment that'd let any O_DIRECT benchmark be useful isn't
> > trivial.
> 
> If this stuff were easy, we'd have done it already ;-).

What do you mean?  Bits don't just hit the tree randomly because of a
possible speed improvement hinted at by a man page reference?  :-]

> The first problem is to figure out what makes sense to measure.

Egh, yeah, and this isn't trivial either.... benchmarking around VFS
caching makes it hard to get good results (been down that primrose
path before with sendfile() happiness).

> Given that the request is for a quick-and-dirty test, I'd be willing
> to cut you some slack on the measurement process.  That is, it's
> okay to pick something easier to measure over something harder to
> measure, as long as you can make a fair argument that what you're
> measuring is of any interest at all...

hrm, well, given the easy part is thumping out the code, how's the
following sound as a test procedure:

1) Write out several files at varying sizes using O_DIRECT (512KB,
   1MB, 5MB, 10MB, 50MB, 100MB, 512MB, 1GB) to avoid having the FS
   cache polluted by the writes.

2) Open two new procs that read the above created files with and
   without O_DIRECT (each test iteration must rewrite the files
   above).

3) Before and after each read() call (does PostgreSQL use fread(3) or
   read(2)?), use gettimeofday(2) to get high resolution timing of
   the time required to perform each system call.

4) Perform each of the tests above 4 times, averaging the last three
   and throwing out the 1st case (though reporting its value may be
   of interest).
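
Steps 2 through 4 could be sketched roughly as follows. This is a minimal illustration, not a finished benchmark: the helper names (`elapsed_usec`, `time_reads`, `average_after_first`), the 8192-byte read size, and the 4096-byte buffer alignment are all assumptions made for the example, and the sketch leaves out rewriting the files between iterations. Pass `0` as `extra_flags` for the cached case and `O_DIRECT` for the direct case.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT on Linux; not needed on FreeBSD */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192        /* assumed read size, chosen for illustration */
#define BUF_ALIGN  4096        /* assumed alignment that O_DIRECT will accept */
#define RUNS       4           /* step 4: four runs, the first one set aside */

/* Microseconds elapsed between two gettimeofday(2) samples. */
static long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    return (long)(end->tv_sec - start->tv_sec) * 1000000L
         + (long)(end->tv_usec - start->tv_usec);
}

/*
 * Read `path` to EOF in BLOCK_SIZE chunks, timing every read(2) call.
 * Returns the total microseconds spent inside read(), or -1 on error.
 */
static long time_reads(const char *path, int extra_flags)
{
    void *buf = NULL;
    struct timeval t0, t1;
    long total = 0;
    ssize_t n;
    int fd;

    if (posix_memalign(&buf, BUF_ALIGN, BLOCK_SIZE) != 0)
        return -1;
    fd = open(path, O_RDONLY | extra_flags);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    do {
        gettimeofday(&t0, NULL);          /* sample before the system call */
        n = read(fd, buf, BLOCK_SIZE);
        gettimeofday(&t1, NULL);          /* and again right after it */
        total += elapsed_usec(&t0, &t1);
    } while (n > 0);
    close(fd);
    free(buf);
    return n < 0 ? -1 : total;
}

/* Run RUNS times; store the first run in *first and average the rest. */
static long average_after_first(const char *path, int extra_flags, long *first)
{
    long sum = 0;

    assert(first != NULL);
    for (int i = 0; i < RUNS; i++) {
        long t = time_reads(path, extra_flags);
        if (t < 0)
            return -1;
        if (i == 0)
            *first = t;
        else
            sum += t;
    }
    return sum / (RUNS - 1);
}
```

Calling `average_after_first(path, 0, &first)` and `average_after_first(path, O_DIRECT, &first)` on the same file would give the two numbers to compare. Some filesystems reject `O_DIRECT` at open(2) time, in which case the direct case simply returns -1 here.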


I'm not that wild about writing anything threaded unless there's
strong enough interest in a write() to an O_DIRECT'ed fd to see what
happens.  I'm not convinced we'll see anything worthwhile unless I
set up an example that's doing a ton of write disk I/O.

As things stand, because O_DIRECT is an execution fast path through
the vfs subsystem, I expect the speed difference to be greater on
faster HDDs with high RPMs than on slower IDE machines at only
5400RPM... thus trivializing any benchmark I'll do on my laptop.  And
actually, if the app can't keep up with the disk, I bet the fs cache
case will be faster.  If the read()'s are able to keep up at the rate
of the HDD, however, this could be a big win in the speed dept, but if
things lag for an instant, the platter will have to make another
rotation before the call comes back to the userland.

Now that I think about it, the optimal case would be to anonymously
mmap() a private buffer that the read() writes into; that way the
HDD could just DMA the data into the mmap()'ed buffer, making it a
zero-copy read operation.... though any interest stirred up by my
mmap() benchmarks from a while back seems to have been lost in the
fray.  :)
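
A minimal sketch of that buffer setup, assuming hypothetical helper names `alloc_anon_buffer` and `read_into_buffer`: an anonymous private mapping is page-aligned, which is the kind of buffer an O_DIRECT read() would normally want. Whether the kernel then DMAs straight into it depends on the OS and filesystem, so the code only shows the allocation and the read, not a guarantee of zero-copy.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* makes MAP_ANONYMOUS visible under strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS /* older spelling on BSD, newer one on Linux */
#endif

/* Map `len` bytes of private, zero-filled anonymous memory; NULL on failure. */
static void *alloc_anon_buffer(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Read up to `len` bytes from `fd` into the mapped buffer, retrying on
 * short reads. Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_into_buffer(int fd, void *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n < 0)
            return -1;
        if (n == 0)             /* EOF before the buffer filled up */
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

The buffer would be released with `munmap(buf, len)` once the reads are done.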

-sc

-- 
Sean Chittenden

