On Wed, Sep 18, 2013 at 2:02 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Lonni J Friedman <netllama@gmail.com> wrote:
>
>> top shows over 90% of the load is in sys space. vmstat output
>> seems to suggest that it's CPU bound (or bouncing back & forth):
>
> Can you run `perf top` during an episode and see what kernel
> functions are using all that CPU?
I take back what I said earlier. While the master is currently back
to normal performance, the two hot standby slaves are still churning
something awful.
If I run 'perf top' on either slave, after a few seconds these are
consistently the top three entries:

    84.57%  [kernel]   [k] _spin_lock_irqsave
     6.21%  [unknown]  [.] 0x0000000000659f60
     4.69%  [kernel]   [k] compaction_alloc
>
> This looks similar to cases I've seen of THP defrag going wild.
> Did the OS version or configuration change? Did the PostgreSQL
> memory settings (like shared_buffers) change?
I think you're onto something here with respect to THP defrag going
wild. I set /sys/kernel/mm/transparent_hugepage/defrag to 'never' and
the load average on both slaves immediately dropped from over 5.00 to
under 1.00.
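
In case anyone else wants to check their own machines, here is a rough
sketch in Python of reading the active THP settings. It assumes the
usual sysfs format, where the kernel lists all choices and marks the
active one with square brackets (e.g. "always madvise [never]"). The
helper names active_choice and thp_settings are my own invention, and
the directory is the one I used above; it may differ on other kernels.

```python
import re
from pathlib import Path

# Assumed location; this is the path used on my systems.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def active_choice(text):
    """Return the bracketed (active) option from a sysfs line.

    For "always madvise [never]" this returns "never". Returns None
    if no option is bracketed.
    """
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def thp_settings(base=THP_DIR):
    """Read the active 'enabled' and 'defrag' values under base.

    A file that is missing or unreadable is reported as None.
    """
    settings = {}
    for name in ("enabled", "defrag"):
        try:
            settings[name] = active_choice((Path(base) / name).read_text())
        except OSError:
            settings[name] = None
    return settings


if __name__ == "__main__":
    for name, value in thp_settings().items():
        print(f"{name}: {value}")
```

This only reads the values. Changing one still means writing to the
sysfs file as root, and a value written that way does not survive a
reboot.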
This raises the question: is this a kernel bug, or is there some
other solution to the problem?
Also, it seems odd that the problem didn't appear until I switched from
9.2 to 9.3. Is it possible this is somehow related to the change from
using SysV shared memory to using Posix shared memory and mmap for
memory management?