Thread: O_DIRECT, or madvise and/or posix_fadvise
I caught this thread about O_DIRECT on kerneltrap.org:
http://kerneltrap.org/node/7563

It sounds like there is much to be gained here in terms of reducing the number of user/kernel space copies in the operating system. I got the impression that posix_fadvise in the Linux kernel isn't as good as it could be. I noticed in xlog.c that the use of posix_fadvise is disabled. Maybe it's time to do some more experimenting and working with the Linux kernel developers. Or perhaps there is another OS that would be better to experiment with?

Not sure where to start, but do people think this is worth taking a stab at?

Regards,
Mark
On Thu, Jan 11, 2007 at 02:35:13PM -0800, markwkm@gmail.com wrote:
> I caught this thread about O_DIRECT on kerneltrap.org:
> http://kerneltrap.org/node/7563
>
> It sounds like there is much to be gained here in terms of reducing
> the number of user/kernel space copies in the operating system. I got
> the impression that posix_fadvise in the Linux kernel isn't as good as
> it could be. I noticed in xlog.c that the use of posix_fadvise is
> disabled. Maybe it's time to do some more experimenting and working
> with the Linux kernel developers. Or perhaps there is another OS that
> would be better to experiment with?

Postgres doesn't use O_DIRECT and probably never will. The system is designed to use the system cache, not bypass it.

What recent discussions have highlighted is the need to control the flow of data to disk more accurately. Apparently current kernels try to hold data back much longer than is useful.

Not that I'm volunteering to deal with this.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
On 1/12/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Thu, Jan 11, 2007 at 02:35:13PM -0800, markwkm@gmail.com wrote:
> > I caught this thread about O_DIRECT on kerneltrap.org:
> > http://kerneltrap.org/node/7563
> >
> > It sounds like there is much to be gained here in terms of reducing
> > the number of user/kernel space copies in the operating system. I got
> > the impression that posix_fadvise in the Linux kernel isn't as good as
> > it could be. I noticed in xlog.c that the use of posix_fadvise is
> > disabled. Maybe it's time to do some more experimenting and working
> > with the Linux kernel developers. Or perhaps there is another OS that
> > would be better to experiment with?
>
> Postgres doesn't use O_DIRECT and probably never will. The system is
> designed to use the system cache, not bypass it.
>
> What recent discussions have highlighted is the need to more accurately
> control the flow of data to disk. Apparently current kernels try to
> hold data back much longer than is useful.

Right, so my understanding is that PostgreSQL needs to tell the OS how it wants the flow of data controlled, using posix_fadvise, and it sounds like the Linux folks believe their implementation of posix_fadvise needs some work.

> Not that I'm volunteering to deal with this.
>
> Have a nice day,

Regards,
Mark
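
For anyone following along who hasn't used posix_fadvise, here is a minimal sketch of the kind of hint being discussed. It is written in Python only for brevity (os.posix_fadvise wraps the same system call) and is not how PostgreSQL does or would do it. The function names write_then_drop_cache and read_sequential are made up for illustration. It assumes a Unix system where posix_fadvise is available, and how much the kernel actually honors each hint depends on the OS and filesystem.

```python
import os
import tempfile


def write_then_drop_cache(path, payload):
    """Write payload to path, force it to disk, then hint that the
    cached pages will not be needed again. Hypothetical helper."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < len(payload):
            written += os.write(fd, payload[written:])
        # Flush first: on Linux the DONTNEED hint is not guaranteed to
        # write out dirty pages, so unflushed data may simply stay cached.
        os.fsync(fd)
        # offset=0, len=0 means "from the start to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return written


def read_sequential(path, chunk_size=8192):
    """Read the whole file after hinting a sequential access pattern.
    Returns the number of bytes read. Hypothetical helper."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Advisory only: the kernel may use it to adjust readahead.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        total = 0
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        demo_path = os.path.join(tmpdir, "segment.dat")
        n = write_then_drop_cache(demo_path, b"\0" * (64 * 1024))
        print("wrote", n, "bytes; read back", read_sequential(demo_path))
```

Note that both calls are hints, not commands: data still passes through the OS cache, which is the opposite of what O_DIRECT does.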