On Tue, Jan 14, 2014 at 02:26:25AM +0100, Andres Freund wrote:
> On 2014-01-13 17:13:51 -0800, James Bottomley wrote:
> > a file into a user provided buffer, thus obtaining a page cache entry
> > and a copy in their userspace buffer, then insert the page of the user
> > buffer back into the page cache as the page cache page ... that's right,
> > isn't it postgress people?
>
> Pretty much, yes. We'd probably hint (*advise(DONTNEED)) that the page
> isn't needed anymore when reading. And we'd normally write if the page
> is dirty.
So why, exactly, do you even need the kernel page cache here?
We don't need it, but it would be nice.
You've got direct access to the copy of data read into userspace, and you want direct control of when and how the data in that buffer is written and reclaimed. Why push that data buffer back into the kernel and then have to add all sorts of kernel interfaces to control the page you already have control of?
Say 25% of the RAM is dedicated to the database's shared buffers, and 75% is left to the kernel's judgement. It sure would be nice if the kernel had the capability of using some of that 75% for database pages, if it thought that that was the best use for it.
Which is what we do get now, at the expense of quite a lot of double buffering (by which I mean, a lot of pages are both in the kernel cache and the database cache--not just transiently during the copy process, but for quite a while). If we had the ability to re-inject clean pages into the kernel's cache, we would get that benefit without the double buffering.
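
For concreteness, here is a rough sketch of the read-then-hint pattern mentioned above (read a page into a userspace buffer, then advise DONTNEED on the kernel's copy). It is only an illustration, not what PostgreSQL actually does: the 8192-byte page size and the names read_page_and_drop_cache and demo are made up for the example. The hint is purely advisory, so the kernel is free to keep the page anyway, and nothing here re-injects a clean page into the kernel's cache, which is the part we don't have today.

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size, for illustration only


def read_page_and_drop_cache(fd: int, page_no: int, page_size: int = PAGE_SIZE) -> bytes:
    """Read one page into a userspace buffer, then hint that the
    kernel's cached copy of that range is no longer needed."""
    offset = page_no * page_size
    data = os.pread(fd, page_size, offset)
    # Advisory only: the kernel may ignore it, and dirty pages are not dropped.
    if hasattr(os, "posix_fadvise"):
        try:
            os.posix_fadvise(fd, offset, page_size, os.POSIX_FADV_DONTNEED)
        except OSError:
            pass  # a failed hint does not affect the data already read
    return data


def demo() -> bytes:
    """Write two pages to a temporary file and read the second one back."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"A" * PAGE_SIZE + b"B" * PAGE_SIZE)
        f.flush()
        os.fsync(f.fileno())  # make the pages clean so the hint can take effect
        return read_page_and_drop_cache(f.fileno(), 1)
```

After the call, the only copy we are sure of is the one in our own buffer. If the database later evicts that buffer, getting the page back means a fresh read from storage, even when the kernel has plenty of idle memory it could have used to hold it.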