Re: O_DIRECT in freebsd - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: O_DIRECT in freebsd |
Date | |
Msg-id | 200306222215.h5MMFqR26835@candle.pha.pa.us Whole thread Raw |
In response to | Re: O_DIRECT in freebsd (Sean Chittenden <sean@chittenden.org>) |
Responses |
Re: O_DIRECT in freebsd
Re: O_DIRECT in freebsd |
List | pgsql-hackers |
What you really want is Solaris's free-behind, where it detects if a scan is exceeding a certain percentage of the OS cache and moves the pages to the _front_ of the to-be-reused list. I am not sure what other OS's support this, but we need this on our own buffer manager code as well. Our TODO already has: * Add free-behind capability for large sequential scans (Bruce) Basically, I think we need free-behind rather than O_DIRECT. --------------------------------------------------------------------------- Sean Chittenden wrote: > > > > > The reason I mention it is that Postgres already supports > > > > > O_DIRECT I think on some other platforms (for whatever > > > > > reason). > > > > > > > > [ sounds of grepping... ] No. The only occurrence of O_DIRECT in the > > > > source tree is in TODO: > > > > > > > > * Consider use of open/fcntl(O_DIRECT) to minimize OS caching > > > > > > > > I personally disagree with this TODO item for the same reason > > > > Sean cited: Postgres is designed and tuned to rely on OS-level > > > > disk caching, and bypassing that cache is far more likely to > > > > hurt our performance than help it. > > > > > > DB2 and Oracle, from memory, allow users to pass hints to the > > > planner to use/not use file system caching. This could be useful > > > if you had an application retrieving a large amount of data on an > > > adhoc basis. The large retrieval would empty out the disk cache > > > there by negatively impacting upon other applications operating on > > > data which should be cached. > > > > I've recently been bitten by this. On DB2, I could change what > > bufferpool the large tables were using and set it fairly small, but > > obviously not an option with PGSQL. But, if pgsql could stop caching > > from occuring on user-specified queries, large table or index scans, > > etc., it would be very helpful. > > Actually, now that I think about this, if the planner is going to read > more than X number of bytes as specified in a GUC, it would be useful > to have the fd marked as O_DIRECT to avoid polluting the disk > cache... I have a few tables with about 300M rows (~9GB on disk) that > I have to perform nightly seq scans over for reports and it does wipe > out some of the other fast movers that come through and depend on the > disk cache to be there for their speed. Because they're performed in > the middle of the night, I don't care that much, but my avg query > times during that period of time are slower... whether it's load or > the disk buffer being emptied and having to be refilled, I'm not sure, > but thinking about it, use of a GUC threshold to have an FD marked as > O_DIRECT does make sense (0 == disabled and the default, but tunable > in Kbytes as an admin sees fit) and could be nice for big queries that > have lots of smaller queries running around at the same time. > > -sc > > -- > Sean Chittenden > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
pgsql-hackers by date: