On 2014-01-13 12:34:35 -0800, James Bottomley wrote:
> On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote:
> > Well, if we were to collaborate with the kernel community on this then
> > presumably we can do better than that for eviction... even to the
> > extent of "here's some data from this range in this file. It's (clean|
> > dirty). Put it in your cache. Just trust me on this."
>
> This should be the madvise() interface (with MADV_WILLNEED and
> MADV_DONTNEED) is there something in that interface that is
> insufficient?
For one, postgres doesn't use mmap for files (and can't without major
new interfaces). Frequently mmap()/madvise()/munmap()ing 8kb chunks has
horrible consequences for performance/scalability - very quickly you
contend on locks in the kernel.
Also, that will mark that page dirty, which isn't what we want in this
case. One major usecase is transplanting a page comming from postgres'
buffers into the kernel's buffercache because the latter has a much
better chance of properly allocating system resources across independent
applications running.
Oh, and the kernel's page-cache management while far from perfect,
actually scales much better than postgres'.
Greetings,
Andres Freund
-- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services