> > Basically, I think we need free-behind rather than O_DIRECT.
>
> There are two separate issues here --- one is what's happening in
> our own cache, and one is what's happening in the kernel disk cache.
> Implementing our own free-behind code would help in our own cache
> but does nothing for the kernel cache.
>
> My thought on this is that for large seqscans we could think about
> doing reads through a file descriptor that's opened with O_DIRECT.
> But writes should never go through O_DIRECT. In some scenarios this
> would mean having two FDs open for the same relation file. This'd
> require moderately extensive changes to the smgr-related APIs, but
> it doesn't seem totally out of the question. I'd kinda like to see
> some experimental evidence that it's worth doing though. Anyone
> care to make a quick-hack prototype and do some measurements?
What would you like to measure? Overall system performance when a
query is using O_DIRECT, or the negative/positive impact of read()
bypassing the FS cache? The latter is much easier to do than the
former... recreating a valid load environment that would make any
O_DIRECT benchmark useful isn't trivial.
-sc
--
Sean Chittenden