On 1/14/14, 6:36 PM, Claudio Freire wrote:
> On Tue, Jan 14, 2014 at 9:22 PM, Jim Nasby <jim@nasby.net> wrote:
>> On 1/14/14, 11:30 AM, Jeff Janes wrote:
>>>
>>> I think the "reclaim this page if you need memory but leave it resident if
>>> there is no memory pressure" hint would be more useful for temporary working
>>> files than for what was being discussed above (shared buffers). When I do
>>> work that needs large temporary files, I often see physical write IO spike
>>> but physical read IO does not. I interpret that to mean that the temporary
>>> data is being written to disk to satisfy either dirty_expire_centisecs or
>>> dirty_*bytes, but the data remains in the FS cache and so disk reads are not
>>> needed to satisfy it. So a hint that says "this file will never be fsynced
>>> so please ignore dirty_*bytes and dirty_expire_centisecs. I will need it
>>> again relatively soon (but not after a reboot), but will do so mostly
>>> sequentially, so please don't evict this without need, but if you do need to
>>> then it is a good candidate" would be good.
>>
>>
>> I also frequently see this, and it has an even larger impact if pgsql_tmp is
>> on the same filesystem as WAL. Which *theoretically* shouldn't matter with a
>> BBU controller, except that when the kernel suddenly decides your
>> *temporary* data needs to hit the media you're screwed.
>>
>> Though, it also occurs to me... perhaps it would be better for us to simply
>> map temp objects to memory and let the kernel swap them out if needed...
>
>
> Oum... bad idea.
>
> Swap logic has very poor taste for I/O patterns.

Well, to be honest, so do we. Practically zero in fact...

In fact, the kernel might even be in a better position than we are since you
can presumably count page faults much more cheaply than we can.

BTW, if you guys are looking at ARC you should absolutely read the discussion
about that in our archives (http://lnk.nu/postgresql.org/2zeu/ as a starting
point). We put considerable effort into it, had it in two minor versions, and
then switched to a clock-sweep algorithm that's similar to what FreeBSD used,
at least in the 4.x days. Definitely not claiming what we've got is the best
(in fact, I think we're hurt by not maintaining a real free list), but the ARC
info there is probably valuable.
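
For anyone who hasn't looked at clock-sweep, here is a rough sketch of the
general idea. It is a toy in Python, not our actual buffer manager code: the
class name, the MAX_USAGE cap, and the rule that a freshly loaded page starts
with a usage count of 1 are all assumptions made for illustration. Note that
it has no free list at all; empty slots are found only by the sweep.

```python
class ClockSweep:
    """Toy clock-sweep buffer replacement (illustrative only)."""

    MAX_USAGE = 5  # assumed cap on the per-buffer usage counter

    def __init__(self, nbuffers):
        self.pages = [None] * nbuffers  # page id held by each buffer slot
        self.usage = [0] * nbuffers     # usage counter for each slot
        self.where = {}                 # page id -> slot index
        self.hand = 0                   # current position of the clock hand

    def _victim(self):
        # Advance the hand, decrementing counters, until a slot with a
        # usage count of zero is found; that slot is the eviction victim.
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if self.usage[slot] == 0:
                return slot
            self.usage[slot] -= 1

    def access(self, page):
        """Return True on a cache hit, False on a miss (page gets loaded)."""
        slot = self.where.get(page)
        if slot is not None:
            # Hit: bump the usage counter, saturating at MAX_USAGE.
            self.usage[slot] = min(self.usage[slot] + 1, self.MAX_USAGE)
            return True
        # Miss: pick a victim slot and replace whatever it holds.
        slot = self._victim()
        old = self.pages[slot]
        if old is not None:
            del self.where[old]
        self.pages[slot] = page
        self.usage[slot] = 1
        self.where[page] = slot
        return False
```

With two buffers, touching "a", "b", "a" and then "c" evicts "b" rather than
"a", because the extra hit on "a" lets it survive one more pass of the hand.
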
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net