On Tue, Feb 5, 2013 at 11:23 PM, Josh Krupka <jkrupka@gmail.com> wrote:
I've been looking into something on our system that sounds similar to what you're seeing. I'm still researching it, but I suspect the memory compaction that runs as part of transparent huge pages (THP) when memory is allocated; that is yet to be proven. The tunable you mentioned controls the compaction that runs at allocation time so the kernel can try to allocate large pages. There's a separate one that controls whether compaction is done in khugepaged, and another that controls whether THP is used at all (/sys/kernel/mm/transparent_hugepage/enabled, or perhaps a different path in your distro).
BTW, I sent the output of /defrag yesterday, but /enabled had the same output.
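In case it helps to compare the two files side by side, here is a minimal sketch that reads them and reports which choice is active (the kernel marks the active value with square brackets, e.g. "always [madvise] never"). The directory is the path quoted above and may differ on your distro; the helper names `selected` and `thp_settings` are made up for this example.

```python
import re

# Path taken from the message above; an assumption that may not hold on every distro.
THP_DIR = "/sys/kernel/mm/transparent_hugepage"


def selected(text):
    """Return the bracketed choice, e.g. 'madvise' from 'always [madvise] never'.

    If there are no brackets (some tunables hold a plain value), return the
    stripped text unchanged.
    """
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else text.strip()


def thp_settings(base=THP_DIR, names=("enabled", "defrag")):
    """Map each tunable name to its active value, or None if it cannot be read."""
    result = {}
    for name in names:
        try:
            with open(f"{base}/{name}") as f:
                result[name] = selected(f.read())
        except OSError:
            # File missing or unreadable on this kernel/distro.
            result[name] = None
    return result
```

Calling `thp_settings()` on the affected box should show whether the two files really select the same value, rather than just listing the same set of choices.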
What's the output of this command?

    egrep 'trans|thp|compact_' /proc/vmstat

compact_stall counts the number of times a process was stalled to do a compaction; the other metrics have to do with other parts of THP. If you see compact_stall climbing, from what I can tell those stalls might be causing your spikes. I haven't found a way of telling how long the processes have been stalled.

You could probably get a little more insight into the processes with some tracing, assuming you can catch it quickly enough. Running perf top will also show the compaction happening, but that doesn't necessarily mean it's impacting your running processes.
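To see whether compact_stall is actually climbing around a spike, rather than eyeballing two runs of the command, something like the following could work. It is only a sketch: it applies the same filter as the egrep pattern to /proc/vmstat, samples it repeatedly, and prints the counters that moved in each interval. The function names, the one-second interval, and the sample count are arbitrary choices for illustration.

```python
import re
import time

# Same filter as the egrep pattern in the message above.
PATTERN = re.compile(r"trans|thp|compact_")


def parse_vmstat(text):
    """Return {counter_name: value} for the /proc/vmstat lines matching PATTERN."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and PATTERN.search(parts[0]):
            counters[parts[0]] = int(parts[1])
    return counters


def changed(before, after):
    """Return {name: delta} for counters whose value differs between two samples."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


def watch(path="/proc/vmstat", interval=1.0, samples=10):
    """Print a timestamp and the counters that moved during each interval."""
    with open(path) as f:
        prev = parse_vmstat(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = parse_vmstat(f.read())
        delta = changed(prev, cur)
        if delta:
            print(time.strftime("%H:%M:%S"), delta)
        prev = cur
```

Running `watch()` while the spikes occur and lining the timestamps up against them would show whether the stalls coincide. It still won't say how long each stall lasted, only how many happened in each interval.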