Thread: a question about Direct I/O and double buffering
Hi,

A page may be double buffered: once in PG's buffer pool and once in the OS's buffer cache. Other DBMSs such as DB2 and Oracle provide a Direct I/O option to eliminate double buffering. I noticed there were discussions about this on the list, but I cannot find a similar option in PG. Does PG support direct I/O now?

The PG tuning guides usually recommend a small shared buffer pool (compared to the size of physical memory). I think this is to avoid swapping. If there were swapping, the OS kernel might swap out some pages in PG's buffer pool even though PG wants to keep them in memory, i.e. PG would lose full control over its buffer pool. A large buffer pool is not good because it may
1. cause more pages to be double buffered, and thus decrease the efficiency of both the buffer cache and the buffer pool;
2. cause swapping.
Am I right?

If PG's buffer pool is small compared with physical memory, can I say that the hit ratio of PG's buffer pool is not so meaningful, because most misses can be satisfied by the OS kernel's buffer cache?

Thanks!
Xiaoning
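To put rough numbers on the hit-ratio question, here is a minimal sketch. It assumes the two caches can be treated as independent layers with known hit ratios, which is a simplification: with double buffering the OS cache often holds the same pages as the buffer pool. The function name and the sample ratios are hypothetical, not measurements.

```python
def effective_hit_ratio(pg_hit: float, os_hit_on_pg_miss: float) -> float:
    """Fraction of page requests served from memory by either cache layer.

    pg_hit: hit ratio of PG's buffer pool (0.0 to 1.0).
    os_hit_on_pg_miss: fraction of buffer pool misses that the OS buffer
        cache satisfies without touching the disk (assumed to be known).
    """
    for ratio in (pg_hit, os_hit_on_pg_miss):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("hit ratios must be between 0 and 1")
    # A request misses memory only if it misses both layers.
    return pg_hit + (1.0 - pg_hit) * os_hit_on_pg_miss


if __name__ == "__main__":
    # Hypothetical numbers: a small buffer pool backed by a large OS cache.
    print(effective_hit_ratio(0.60, 0.90))
```

Under these assumptions a modest buffer pool hit ratio can hide a much higher in-memory hit ratio overall, which is why the buffer pool figure alone says little about how often the disk is actually touched.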
On Apr 5, 2007, at 12:09 PM, Xiaoning Ding wrote:
> Does PG support direct I/O now?
To the best of my knowledge, Postgres itself does not have a direct IO option (although it would be a good addition). So, in order to use direct IO with Postgres you'll need to consult your filesystem docs for how to set the forcedirectio mount option. I believe it can be set dynamically, but if you want it to be permanent you'll need to add it to your fstab/vfstab file.
erik jones <erik@myemma.com>
software developer
615-296-0838
emma(r)
Erik Jones wrote:
> To the best of my knowledge, Postgres itself does not have a direct IO
> option (although it would be a good addition). So, in order to use
> direct IO with postgres you'll need to consult your filesystem docs for
> how to set the forcedirectio mount option.

I use Linux. It supports direct I/O on a per-file basis only. To bypass the OS buffer cache, files must be opened with the O_DIRECT flag. I'm afraid that I will have to modify PG.

Xiaoning
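For anyone who wants to see what per-file direct I/O looks like from user space, here is a minimal sketch. It is not PostgreSQL code: the function name is hypothetical, the 8192-byte default is only assumed to match PG's usual page size, and the fallback to a buffered read is my own addition so the example still runs on file systems that reject O_DIRECT.

```python
import errno
import mmap
import os


def read_block_direct(path: str, block_size: int = 8192) -> bytes:
    """Read the first block of `path`, bypassing the OS cache if possible.

    Falls back to an ordinary buffered read when O_DIRECT is unavailable
    (non-Linux platform) or rejected by the file system (EINVAL).
    """
    direct_flag = getattr(os, "O_DIRECT", 0)  # 0 where the flag does not exist
    # An anonymous mmap is page-aligned, which satisfies the buffer
    # alignment that O_DIRECT requires; a plain bytearray may not be.
    buf = mmap.mmap(-1, block_size)
    try:
        for flags in (os.O_RDONLY | direct_flag, os.O_RDONLY):
            try:
                fd = os.open(path, flags)
            except OSError as exc:
                if exc.errno == errno.EINVAL and flags != os.O_RDONLY:
                    continue  # file system refuses O_DIRECT; retry buffered
                raise
            try:
                nread = os.readv(fd, [buf])
                return bytes(buf[:nread])
            except OSError as exc:
                if exc.errno == errno.EINVAL and flags != os.O_RDONLY:
                    continue  # direct read rejected; retry buffered
                raise
            finally:
                os.close(fd)
        raise OSError(errno.EINVAL, "could not read " + path)
    finally:
        buf.close()
```

The point of the sketch is the pairing of the O_DIRECT flag with an aligned buffer and an aligned length; doing the same inside PG would mean touching every place where data files are opened and read.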
Not to hijack this thread, but has anybody here tested the behavior of PG on a file system with OS-level caching disabled via forcedirectio or by using an inherently non-caching file system such as ocfs2?

I've been thinking about trying this setup to avoid double-caching now that the 8.x series scales shared buffers better, but I figured I'd ask first if anybody here had experience with similar configurations.

-- Mark

On Thu, 2007-04-05 at 13:09 -0500, Erik Jones wrote:
> To the best of my knowledge, Postgres itself does not have a direct IO
> option (although it would be a good addition). So, in order to use
> direct IO with postgres you'll need to consult your filesystem docs
> for how to set the forcedirectio mount option.
On Apr 5, 2007, at 1:22 PM, Xiaoning Ding wrote:
> I use Linux. It supports direct I/O on a per-file basis only. To bypass
> the OS buffer cache, files should be opened with the O_DIRECT option.
> I'm afraid that I have to modify PG.
Looks like it. I just did a cursory search of the archives and it seems that others have looked at this before, so you'll probably want to start there if you're up to it.
erik jones <erik@myemma.com>
On 4/5/07, Erik Jones <erik@myemma.com> wrote:
> Looks like it. I just did a cursory search of the archives and it seems
> that others have looked at this before, so you'll probably want to start
> there if you're up to it.

Linux used to have (still does?) a RAW interface which might also be useful. I think the original code was contributed by Oracle so they could support direct IO.

Alex
On Apr 5, 2007, at 1:27 PM, Mark Lewis wrote:
> Not to hijack this thread, but has anybody here tested the behavior of
> PG on a file system with OS-level caching disabled via forcedirectio or
> by using an inherently non-caching file system such as ocfs2?
>
> I've been thinking about trying this setup to avoid double-caching now
> that the 8.x series scales shared buffers better, but I figured I'd ask
> first if anybody here had experience with similar configurations.
>
> -- Mark

Rather than repeat everything that was said just last week, I'll point out that we just had a pretty decent discussion on this last week that I started, so check the archives. In summary though, if you have a high io transaction load with a db where the average size of your "working set" of data doesn't fit in memory with room to spare, then direct io can be a huge plus; otherwise you probably won't see much of a difference. I have yet to hear of anybody actually seeing any degradation in the db performance from it. In addition, while it doesn't bother me, I'd watch the top posting as some people get pretty religious about it (I moved your comments down).
erik jones <erik@myemma.com>
... [snipped for brevity] ...

> > Not to hijack this thread, but has anybody here tested the behavior of
> > PG on a file system with OS-level caching disabled via forcedirectio or
> > by using an inherently non-caching file system such as ocfs2?
> >
> > I've been thinking about trying this setup to avoid double-caching now
> > that the 8.x series scales shared buffers better, but I figured I'd ask
> > first if anybody here had experience with similar configurations.
> >
> > -- Mark
>
> Rather than repeat everything that was said just last week, I'll point
> out that we just had a pretty decent discussion on this last week that
> I started, so check the archives. In summary though, if you have a
> high io transaction load with a db where the average size of your
> "working set" of data doesn't fit in memory with room to spare, then
> direct io can be a huge plus, otherwise you probably won't see much of
> a difference. I have yet to hear of anybody actually seeing any
> degradation in the db performance from it. In addition, while it
> doesn't bother me, I'd watch the top posting as some people get pretty
> religious about it (I moved your comments down).

I saw the thread, but my understanding from reading through it was that you never fully tracked down the cause of the factor of 10 write volume mismatch, so I pretty much wrote it off as a data point for forcedirectio because of the unknowns. Did you ever figure out the cause of that?

-- Mark Lewis
On Apr 5, 2007, at 2:56 PM, Mark Lewis wrote:
> I saw the thread, but my understanding from reading through it was that
> you never fully tracked down the cause of the factor of 10 write volume
> mismatch, so I pretty much wrote it off as a data point for
> forcedirectio because of the unknowns. Did you ever figure out the
> cause of that?
>
> -- Mark Lewis
Nope. What we never tracked down was the factor of 10 drop in database transactions, not disk transactions. The write volume was most definitely due to the direct io setting -- writes are now being done in terms of the system's block size, whereas before they were being done in terms of the filesystem's cache page size (as it's in virtual memory). Basically, we do so many write transactions that the fs cache was constantly paging.
erik jones <erik@myemma.com>
On Thu, 5 Apr 2007, Xiaoning Ding wrote:
> I use Linux. It supports direct I/O on a per-file basis only. To bypass
> the OS buffer cache, files should be opened with the O_DIRECT option.
> I'm afraid that I have to modify PG.

As someone who has been reading the linux-kernel mailing list for 10 years, let me comment on this a bit.

Linux does have a direct i/o option, but it has significant limits on when and how you can use it (buffers must be 512-byte aligned and multiples of 512 bytes, things like that). Also, in many cases testing has shown that there is a fairly significant performance hit for this, not a performance gain.

What I think Postgres really needs is to add support for write barriers (telling the OS to make sure that everything before the barrier is written to disk before anything after the barrier). I believe that these are available on SCSI drives, and on some SATA drives. This sort of support, along with appropriate async I/O support (which is probably going to end up being the 'syslets' or 'threadlets' stuff that's in the early experimental stage, rather than the current aio API), has the potential to be a noticeable improvement.

If you haven't followed the syslets discussion on the kernel list: threadlets are an approach that basically lets you turn any syscall into an async interface (if the call doesn't block on anything you get the answer back immediately; if it does block it gets turned into an async call by the kernel). Syslets are a way to combine multiple syscalls into a single call, avoiding the user->system->user calling overhead for the additional calls. (It's also viewed as a way to do prototyping of possible new calls: if a sequence of syscalls ends up being common enough, the kernel devs will look at making a new, combined syscall. For example, lock, write, unlock could be made into one if it's common enough and there's enough of a performance gain.)

David Lang
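The ordering guarantee that a write barrier would give can be approximated from user space today with fsync(), at the cost of waiting for the flush rather than merely ordering the writes. A minimal sketch, assuming a hypothetical log-then-data update; the function name and file layout are illustrative only and this is not how PostgreSQL's WAL code is written:

```python
import os


def ordered_write(log_path: str, data_path: str, record: bytes) -> None:
    """Append `record` to the log, flush it, then write it to the data file.

    fsync() stands in for a barrier here: it blocks until the kernel reports
    the log write as flushed, which is stronger (and slower) than only
    ordering the two writes. Whether the drive's own write cache is flushed
    too depends on the system configuration.
    """
    log_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(log_fd, record)
        os.fsync(log_fd)  # the log write must complete before we go on
    finally:
        os.close(log_fd)

    data_fd = os.open(data_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(data_fd, record)  # only issued after the log flush returned
        os.fsync(data_fd)
    finally:
        os.close(data_fd)
```

A real barrier would let the second write be queued immediately while still guaranteeing the order on disk, which is where the potential gain over this pattern comes from.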
Alex Deucher wrote:
> Linux used to have (still does?) a RAW interface which might also be
> useful. I think the original code was contributed by Oracle so they
> could support direct IO.

I am more concerned with reads, and with how to do direct I/O under Linux here. Reading raw devices in Linux bypasses the OS buffer cache. But how can you mount a raw device (it is a character device) as a file system?

Xiaoning
On 4/5/07, Xiaoning Ding <dingxn@cse.ohio-state.edu> wrote:
> I am more concerned with reads, and how to do direct I/O under Linux here.
> Reading raw devices in Linux bypasses OS buffer cache. But how can you
> mount a raw device (it is a character device) as a file system?

In this case, I guess you'd probably have to do it within pg itself.

Alex
On Apr 5, 2007, at 3:33 PM, david@lang.hm wrote:
> linux does have a direct i/o option,

Yes, I know applications can request direct i/o with the O_DIRECT flag to open(), but can this be set to be forced for all applications or for individual applications from "outside" the application (not that I've ever heard of something like the second)?

> but it has significant limits on when and how you can use it (buffers must
> be 512-byte aligned and multiples of 512 bytes, things like that).

That's a standard limit imposed by the sector size of hard drives, and is present in all direct i/o implementations, not just Linux.

> Also, in many cases testing has shown that there is a fairly significant
> performance hit for this, not a performance gain.

Those performance hits have been noticed for high i/o transaction databases? The idea here is that these kinds of databases manage their own caches, and having a separate filesystem cache in virtual memory that works with system memory page sizes is an unneeded level of indirection. Yes, you should expect other "normal" utilities will suffer a performance hit, as if you are trying to cp a 500 byte file you'll still have to work with 8K writes and reads, whereas with the filesystem cache you can just write/read part of a page in memory and let the cache decide when it needs to write and read from disk. If there are other caveats to direct i/o on Linux I'd love to hear them.
erik jones <erik@myemma.com>
On Thu, 5 Apr 2007, Xiaoning Ding wrote:
> I am more concerned with reads, and how to do direct I/O under Linux here.
> Reading raw devices in Linux bypasses OS buffer cache.

It also bypasses OS readahead, which is not necessarily a win.

> But how can you mount a raw device (it is a character device) as a file
> system?

You can run mkfs on /dev/hda just like you do on /dev/hda2 and then mount the result as a filesystem.

Postgres wants the OS layer to provide the filesystem. Oracle implements its own filesystem, so you would just point it at the drive/partition and it would do its own 'formatting'.

This is something that may be reasonable for Postgres to consider doing someday. Since Postgres allocates things into 1 GB files and then keeps track of what filename is used for what, it could instead allocate things in 1 GB (or whatever size) chunks on the disk, and just keep track of what addresses are used for what instead of filenames. This would definitely allow you to work around problems like the ext2/3 indirect lookup problems. Now that tablespaces are supported, it would be an interesting experiment to be able to define a tablespace that used a raw device instead of a filesystem, to see if there are any noticeable performance gains.

David Lang
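The bookkeeping for the raw-device idea can be pictured with a toy allocator. This is only a sketch under stated assumptions: the class, its method names and the 1 GiB chunk size (chosen to match the size of PostgreSQL's default segment files) are hypothetical, and a real implementation would also need to persist the map and handle freeing and crash recovery.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 1 << 30  # 1 GiB per chunk (an assumption, see the lead-in)


@dataclass
class RawDeviceMap:
    """Toy allocator: maps (relation, segment number) to a device offset."""

    device_size: int
    next_free: int = 0
    extents: dict = field(default_factory=dict)

    def allocate(self, relation: str, segment: int) -> int:
        """Reserve the next free chunk for a segment and return its offset."""
        key = (relation, segment)
        if key in self.extents:
            return self.extents[key]  # already allocated, reuse the chunk
        if self.next_free + CHUNK_SIZE > self.device_size:
            raise RuntimeError("raw device is full")
        offset = self.next_free
        self.extents[key] = offset
        self.next_free += CHUNK_SIZE
        return offset

    def locate(self, relation: str, segment: int, byte_in_segment: int) -> int:
        """Translate a position inside a segment to an absolute device offset."""
        if not 0 <= byte_in_segment < CHUNK_SIZE:
            raise ValueError("position is outside the segment")
        return self.extents[(relation, segment)] + byte_in_segment
```

The dictionary plays the role that filenames play today: instead of asking the filesystem where a segment lives, the database would look up the address itself.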
On Thu, 5 Apr 2007, Erik Jones wrote:
> On Apr 5, 2007, at 3:33 PM, david@lang.hm wrote:
>> linux does have a direct i/o option,
>
> Yes, I know applications can request direct i/o with the O_DIRECT flag to
> open(), but can this be set to be forced for all applications or for
> individual applications from "outside" the application (not that I've ever
> heard of something like the second)?

No, it can't, due to the fact that direct i/o has additional requirements for what you can use for buffers that don't apply to normal i/o.

>> but it has significant limits on when and how you can use it (buffers must
>> be 512-byte aligned and multiples of 512 bytes, things like that).
>
> That's a standard limit imposed by the sector size of hard drives, and is
> present in all direct i/o implementations, not just Linux.

Right, but you don't have those limits for normal i/o.

>> Also, in many cases testing has shown that there is a fairly significant
>> performance hit for this, not a performance gain.
>
> Those performance hits have been noticed for high i/o transaction databases?
> The idea here is that these kinds of databases manage their own caches and
> having a separate filesystem cache in virtual memory that works with system
> memory page sizes is an unneeded level of indirection.

Ahh, you're proposing a re-think of how Postgres interacts with the O/S, not just an optimization to be applied to the current architecture.

Unlike Oracle, Postgres doesn't try to be an OS itself; it tries very hard to rely on the OS to properly implement things rather than doing its own implementation.

> Yes, you should expect other "normal" utilities will suffer a performance
> hit as if you are trying to cp a 500 byte file you'll still have to work
> with 8K writes and reads whereas with the filesystem cache you can just
> write/read part of a page in memory and let the cache decide when it needs
> to write and read from disk. If there are other caveats to direct i/o on
> Linux I'd love to hear them.

Other than bad interactions with "normal" utilities not compiled for direct i/o, I don't remember them offhand.

David Lang
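The alignment rule discussed above (512-byte multiples, or 8K blocks in the cp example) comes down to rounding every request outward to block boundaries. A minimal sketch of that arithmetic; the helper names are hypothetical and the 512-byte default is an assumption, since some devices use larger sectors:

```python
SECTOR = 512  # assumed logical sector size; some devices use 4096


def align_down(value: int, alignment: int = SECTOR) -> int:
    """Largest multiple of `alignment` that is <= value."""
    return value - (value % alignment)


def align_up(value: int, alignment: int = SECTOR) -> int:
    """Smallest multiple of `alignment` that is >= value."""
    return align_down(value + alignment - 1, alignment)


def aligned_request(offset: int, length: int, alignment: int = SECTOR):
    """Return the (offset, length) that direct I/O must transfer to cover
    the byte range [offset, offset + length)."""
    start = align_down(offset, alignment)
    end = align_up(offset + length, alignment)
    return start, end - start
```

For the 500 byte file mentioned above, `aligned_request(0, 500, 8192)` gives an 8192-byte transfer, which is the overhead a utility pays when it cannot go through the filesystem cache.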
On Thu, Apr 05, 2007 at 03:10:43PM -0500, Erik Jones wrote:
> Nope. What we never tracked down was the factor of 10 drop in
> database transactions, not disk transactions. The write volume was
> most definitely due to the direct io setting -- writes are now being
> done in terms of the system's block size, whereas before they were
> being done in terms of the filesystem's cache page size (as it's
> in virtual memory). Basically, we do so many write transactions that
> the fs cache was constantly paging.

Did you try decreasing the size of the cache pages? I didn't realize that Solaris used a different size for cache pages and filesystem blocks. Perhaps the OS was also being too aggressive with read-aheads?

My concern is that you're essentially leaving a lot of your memory unused this way, since shared_buffers is only set to 1.6G.

BTW, did you ever increase the parameter that controls how much memory Solaris will use for filesystem caching?
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)