On 1/13/14, 4:44 PM, Andres Freund wrote:
>>> One major usecase is transplanting a page comming from postgres'
>>> buffers into the kernel's buffercache because the latter has a much
>>> better chance of properly allocating system resources across independent
>>> applications running.
>>
>> If you want to share pages between the application and the page cache,
>> the only known interface is mmap ... perhaps we can discuss how better
>> to improve mmap for you?
> I think purely using mmap() is pretty unlikely to work out - there's
> just too many constraints about when a page is allowed to be written out
> (e.g. it's interlocked with postgres' write ahead log). I also think
> that for many practical purposes using mmap() would result in an absurd
> number of mappings or mapping way too huge areas; e.g. large btree
> indexes are usually accessed in a quite fragmented manner.
Which brings up another interesting area^Wcan-of-worms: the database is implementing journaling on top of a filesystem
that's probably also journaling. And it's going to get worse: a Seagate researcher presented at RICon East last year that
the next generation (or maybe the one after that) of spinning rust will use "shingling", which means that the drive
can't write randomly. So now the drive will ALSO have to journal. And of course SSDs already do this.
So now there are *three* pieces of software all doing the exact same thing, none of which are able to coordinate with
each other.
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net