Thread: O_DIRECT use
I have added this item to TODO:

* Consider use of open/fcntl(O_DIRECT) to minimize OS caching

A web search suggests it minimizes file system caching, perhaps useful for sequential scans:

http://archives2.us.postgresql.org/pgsql-hackers/2001-09/msg00713.php

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
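For reference, the two routes the TODO item names (O_DIRECT at open() time, or toggled later with fcntl()) might look like the minimal C sketch below. It assumes Linux with glibc (hence `_GNU_SOURCE`) or FreeBSD; `open_maybe_direct`, `set_direct` and `is_direct` are hypothetical helper names, not PostgreSQL code. Because some filesystems refuse direct I/O, the open quietly falls back to an ordinary cached open instead of failing.

```c
#define _GNU_SOURCE   /* glibc only defines O_DIRECT with this */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Open `path` read-only, asking for O_DIRECT. Some filesystems refuse direct
 * I/O (typically with EINVAL), so on any failure retry as an ordinary cached
 * open; a genuine problem such as ENOENT makes the retry fail as well.
 * *direct reports which mode was obtained. Returns an fd, or -1. */
int open_maybe_direct(const char *path, bool *direct)
{
    assert(direct != NULL);
    int fd = open(path, O_RDONLY | O_DIRECT);
    *direct = (fd >= 0);
    if (fd < 0)
        fd = open(path, O_RDONLY);
    return fd;
}

/* The fcntl() route: set or clear O_DIRECT on an already-open descriptor.
 * Returns 0 on success, -1 with errno set (e.g. EINVAL when the filesystem
 * cannot do direct I/O). */
int set_direct(int fd, bool on)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    flags = on ? (flags | O_DIRECT) : (flags & ~O_DIRECT);
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}

/* 1 if O_DIRECT is currently set on fd, 0 if not, -1 on error. */
int is_direct(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : (flags & O_DIRECT) != 0;
}
```

Actual reads through a direct descriptor generally also need the buffer address, file offset and length aligned to the device's block size; the exact requirement differs between OSes and filesystems.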
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I have added this item to TODO:
> > * Consider use of open/fcntl(O_DIRECT) to minimize OS caching
>
> Why exactly would we wish to minimize OS caching?
>
> In my mind, Postgres has always relied heavily on the existence of a
> layer of kernel caching. Disabling that will hurt far more than help.

Not sure. Someone on IRC brought it up. If we are sequential scanning a
large table, caching may be bad because we are pushing out stuff already
in the cache that may be useful.

It is related to this TODO item:

* Add free-behind capability for large sequential scans (Bruce)
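The free-behind idea can be sketched with `posix_fadvise()`. This is a swapped-in, later interface: as noted further down the thread, Linux 2.4 had no fadvise call, and Linux only gained one in later kernels. The function name, the 64 KiB read size and the 1 MiB release window are assumptions for illustration. Unlike O_DIRECT, this approach keeps kernel read-ahead (`POSIX_FADV_SEQUENTIAL` even asks for more of it) and only gives back pages the scan has already consumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define SCAN_CHUNK  (64 * 1024)     /* bytes per read(); arbitrary */
#define FREE_BEHIND (1024 * 1024)   /* release window; arbitrary */

/* Sequentially read fd (assumed to be positioned at offset 0), summing its
 * bytes as a stand-in for per-tuple work. Every FREE_BEHIND bytes, tell the
 * kernel that the already-consumed range will not be needed again.
 * Returns the number of bytes read, or -1 on a read error. */
long long scan_free_behind(int fd, unsigned long *sum_out)
{
    assert(sum_out != NULL);
    unsigned char buf[SCAN_CHUNK];
    off_t pos = 0;     /* bytes consumed so far */
    off_t freed = 0;   /* everything before this offset has been released */

    *sum_out = 0;
    /* Advice only: failures are harmless, so they are ignored. */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                                   /* EOF */
        for (ssize_t i = 0; i < n; i++)
            *sum_out += buf[i];
        pos += n;
        if (pos - freed >= FREE_BEHIND) {
            (void) posix_fadvise(fd, freed, pos - freed, POSIX_FADV_DONTNEED);
            freed = pos;
        }
    }
    if (pos > freed)                                 /* release the tail too */
        (void) posix_fadvise(fd, freed, pos - freed, POSIX_FADV_DONTNEED);
    return (long long) pos;
}
```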
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I have added this item to TODO:
> * Consider use of open/fcntl(O_DIRECT) to minimize OS caching

Why exactly would we wish to minimize OS caching?

In my mind, Postgres has always relied heavily on the existence of a
layer of kernel caching. Disabling that will hurt far more than help.

            regards, tom lane
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Why exactly would we wish to minimize OS caching?
>
> > Not sure. Someone on IRC brought it up. If we are sequential scanning a
> > large table, caching may be bad because we are pushing out stuff already
> > in the cache that may be useful.
>
> Yeah, but people normally try to set things up to avoid doing large
> sequential scans, at least in all the contexts where they need high
> performance. For index searches you definitely want all the caching
> you can get.
>
> For that matter, I would expect that O_DIRECT also defeats readahead,
> so I'd fully expect it to be a loser for seqscans too.

I am told on FreeBSD it does not disable read-ahead, just caching;
something that needs more research.
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> For that matter, I would expect that O_DIRECT also defeats readahead,
>> so I'd fully expect it to be a loser for seqscans too.

> I am told on FreeBSD it does not disable read-ahead, just caching;
> something that needs more research.

Hmm. I always thought of read-ahead as preloading buffer cache entries.

It'd be interesting to get a description of *exactly* what this flag
does, rather than handwavy approximations. Time to start reading the
kernel code, I suppose.

            regards, tom lane
[2002-01-04 16:31] Bruce Momjian said:
| Not sure. Someone on IRC brought it up.

Is there a pg IRC channel? What is the server?

cheers.
  brent

-- 
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing."  -- Duane Allman
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Why exactly would we wish to minimize OS caching?

> Not sure. Someone on IRC brought it up. If we are sequential scanning a
> large table, caching may be bad because we are pushing out stuff already
> in the cache that may be useful.

Yeah, but people normally try to set things up to avoid doing large
sequential scans, at least in all the contexts where they need high
performance. For index searches you definitely want all the caching
you can get.

For that matter, I would expect that O_DIRECT also defeats readahead,
so I'd fully expect it to be a loser for seqscans too.

            regards, tom lane
Brent Verner wrote:
> [2002-01-04 16:31] Bruce Momjian said:
> | Not sure. Someone on IRC brought it up.
>
> Is there a pg IRC channel? What is the server?

The FAQ item text is:

  There is also an IRC channel on EFNet, channel #PostgreSQL. I use the
  unix command:

      irc -c '#PostgreSQL' "$USER" irc.phoenix.net
Brent Verner wrote:
> [2002-01-04 16:31] Bruce Momjian said:
> | Not sure. Someone on IRC brought it up.
>
> Is there a pg IRC channel? What is the server?

See FAQ item 1.6.
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> For that matter, I would expect that O_DIRECT also defeats readahead,
> >> so I'd fully expect it to be a loser for seqscans too.
>
> > I am told on FreeBSD it does not disable read-ahead, just caching;
> > something that needs more research.
>
> Hmm. I always thought of read-ahead as preloading buffer cache entries.
>
> It'd be interesting to get a description of *exactly* what this flag
> does, rather than handwavy approximations. Time to start reading the
> kernel code, I suppose.

I found this before adding the item:

http://www.pairlist.net/pipermail/flow-tools/2001-October/000058.html

And this for FreeBSD 4.4:

  2.1 Kernel Changes

  The O_DIRECT flag has been added to open(2) and fcntl(2). Specifying this
  flag for open files will attempt to minimize the cache effects of reading
  and writing.

I also found:

http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html

These later ones seem to indicate there isn't read-ahead, meaning we
would have to do our own prefetches. Eck. I am unclear if that is
true on all OS's.
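If O_DIRECT does switch off read-ahead, the "own prefetches" mentioned above could start as simply as fetching several blocks per system call into an aligned user-space buffer. The C sketch below is minimal and assumes an 8192-byte block (PostgreSQL's default BLCKSZ), a 4096-byte alignment requirement for O_DIRECT (this varies by OS and filesystem) and a hypothetical 16-block window; `seq_reader` and its functions are illustrative names only. It only batches reads; a real prefetcher would want asynchronous I/O so the next window loads while the current one is being processed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192   /* PostgreSQL's default block size */
#define WINDOW 16     /* blocks fetched per read(); hypothetical tuning */
#define ALIGN  4096   /* assumed O_DIRECT alignment; check your OS/FS */

typedef struct {
    int    fd;
    char  *buf;        /* WINDOW * BLCKSZ bytes, ALIGN-aligned */
    off_t  buf_start;  /* file offset of buf[0] */
    size_t buf_len;    /* valid bytes currently in buf */
} seq_reader;

int seq_reader_init(seq_reader *r, int fd)
{
    void *p;
    if (posix_memalign(&p, ALIGN, (size_t) WINDOW * BLCKSZ) != 0)
        return -1;
    r->fd = fd;
    r->buf = p;
    r->buf_start = 0;
    r->buf_len = 0;
    return 0;
}

/* Copy block `blkno` into out. Returns 1 if a full block was copied, 0 at EOF
 * (a partial trailing block also counts as EOF), -1 on an I/O error. */
int seq_reader_get(seq_reader *r, long blkno, char *out)
{
    assert(blkno >= 0);
    off_t want = (off_t) blkno * BLCKSZ;

    if (want < r->buf_start ||
        want + BLCKSZ > r->buf_start + (off_t) r->buf_len) {
        /* Miss: one large, aligned read stands in for kernel read-ahead. */
        ssize_t n = pread(r->fd, r->buf, (size_t) WINDOW * BLCKSZ, want);
        if (n < 0)
            return -1;
        r->buf_start = want;
        r->buf_len = (size_t) n;
        if (n < BLCKSZ)
            return 0;
    }
    memcpy(out, r->buf + (want - r->buf_start), BLCKSZ);
    return 1;
}

void seq_reader_free(seq_reader *r)
{
    free(r->buf);
}
```

The offsets and lengths passed to pread() are always multiples of BLCKSZ, so they stay aligned whether or not the descriptor was opened with O_DIRECT.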
On Fri, 4 Jan 2002, Bruce Momjian wrote:

> > >> For that matter, I would expect that O_DIRECT also defeats readahead,
> > >> so I'd fully expect it to be a loser for seqscans too.

> And this for FreeBSD 4.4:
> The O_DIRECT flag has been added to open(2) and fcntl(2). Specifying this
> flag for open files will attempt to minimize the cache effects of reading
> and writing.

This seems rather vague. Can any FreeBSD person here say whether the
semantics are any stronger?

> http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html
>
> These later ones seem to indicate there isn't read-ahead, meaning we
> would have to do our own prefetches. Eck. I am unclear if that is
> true on all OS's.

The Linux O_DIRECT semantics are intended to be stronger. In essence,
the kernel _will not cache_ data read from or written to such a file or
device.

The point of this, incidentally, was to be able to run things like
Oracle Parallel Server and other shared-disk setups. Its use as an "I
don't need this cached" mechanism is secondary, and rather sub-optimal,
as seen here; you disable software read-ahead and introduce coherence
issues with non-O_DIRECT openers of the file. (I'm not sure of the
precise Linux semantics of this, but it's probably fair to say that you
may as well consider them undefined.)

Linux 2.4 has "madvise", but unfortunately no matching "fadvise". A
quick Google implied that FreeBSD is in the same boat.

Matthew.
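With only `madvise()` available, one workaround is to scan through a read-only `mmap()` of the file, hinting `MADV_SEQUENTIAL` up front and `MADV_DONTNEED` behind the scan. This is a minimal C sketch; `mmap_scan_sum` and the 256-page chunk size are assumptions for illustration. It also shows why an fadvise call was wanted: on Linux, `MADV_DONTNEED` on a file mapping only removes the pages from this process's mapping, and the kernel remains free to keep them in its page cache.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum the bytes of the file at `path` through a read-only mapping, using
 * madvise() hints. Returns 0 on success, -1 on error. */
int mmap_scan_sum(const char *path, unsigned long *sum_out)
{
    assert(path != NULL && sum_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    *sum_out = 0;
    if (st.st_size == 0) {           /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t) st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping keeps the file referenced */
    if (p == MAP_FAILED)
        return -1;

    (void) madvise(p, len, MADV_SEQUENTIAL);   /* hint: read ahead aggressively */

    /* Chunk size is a multiple of the page size, so p + off stays aligned. */
    size_t step = (size_t) sysconf(_SC_PAGESIZE) * 256;
    for (size_t off = 0; off < len; off += step) {
        size_t n = len - off < step ? len - off : step;
        for (size_t i = 0; i < n; i++)
            *sum_out += p[off + i];
        /* Done with this chunk: drop it from our mapping. On Linux the
         * page cache may still retain the underlying pages. */
        (void) madvise(p + off, n, MADV_DONTNEED);
    }
    munmap(p, len);
    return 0;
}
```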