Re: CPU spikes and transactions - Mailing list pgsql-performance

From Jeff Janes
Subject Re: CPU spikes and transactions
Date
Msg-id CAMkU=1xdDOEfGCAOX09DBh3n5r_=j+S9mpqrJnV02x3RZkUdBA@mail.gmail.com
In response to Re: CPU spikes and transactions  (Dave Owens <dave@teamunify.com>)
List pgsql-performance
On Tue, May 13, 2014 at 4:04 PM, Dave Owens <dave@teamunify.com> wrote:
Hi,

Apologies for resurrecting this old thread, but it seems like this is better than starting a new conversation.

We are now running 9.1.13 and have doubled the CPU and memory.  So 2x 16-core Opteron 6276 (32 cores total), and 64GB memory.  shared_buffers set to 20GB, effective_cache_size set to 40GB.

We were able to record perf data during the latest incident of high CPU utilization. perf report is below:

Samples: 31M of event 'cycles', Event count (approx.): 16289978380877
 44.74%       postmaster  [kernel.kallsyms]             [k] _spin_lock_irqsave
 15.03%       postmaster  postgres                      [.] 0x00000000002ea937
  3.14%       postmaster  postgres                      [.] s_lock
  2.30%       postmaster  [kernel.kallsyms]             [k] compaction_alloc
  2.21%       postmaster  postgres                      [.] HeapTupleSatisfiesMVCC


compaction_alloc points to the "transparent huge pages" kernel problem, while HeapTupleSatisfiesMVCC points to the problem of each backend taking ProcArrayLock for every not-yet-committed tuple it encounters.  I don't know which of those leads to the _spin_lock_irqsave.  Transparent huge pages seem the more likely cause, but perhaps both contribute.
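
As a rough sketch (not part of the original report), the current transparent huge page settings can be read from sysfs before deciding which cause to chase. The two directories below are assumptions about where the kernel exposes them (the mainline path and the RHEL 6 variant); the helper names are made up for illustration.

```python
import re
from pathlib import Path

# Assumed sysfs locations: mainline kernels first, then the RHEL 6 variant.
THP_DIRS = [
    Path("/sys/kernel/mm/transparent_hugepage"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage"),
]


def parse_thp_setting(text):
    """Return the active choice from a line such as 'always madvise [never]'.

    The kernel marks the active value with square brackets.
    Returns None if no bracketed value is present.
    """
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_thp_settings(dirs=THP_DIRS):
    """Return {'enabled': ..., 'defrag': ...} from the first directory found.

    Returns None when none of the candidate directories exists.
    """
    for d in dirs:
        if d.is_dir():
            settings = {}
            for name in ("enabled", "defrag"):
                f = d / name
                settings[name] = parse_thp_setting(f.read_text()) if f.is_file() else None
            return settings
    return None


if __name__ == "__main__":
    print(read_thp_settings())
```

If "enabled" or "defrag" reports "always", the compaction_alloc samples in the profile are at least consistent with the huge page explanation.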

If it is the former, you can find other messages on this list about disabling transparent huge pages.  If it is the latter, your best bet is to commit your bulk inserts as soon as possible (this might be improved for 9.5, if we can figure out how to test the alternatives).  Please let us know what works.
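
One possible way to keep bulk-insert transactions short, sketched under assumptions: load the rows in batches and commit after each one, so fewer not-yet-committed tuples are visible to other backends at any moment. The function name, the batch size, and the use of sqlite3 as a stand-in connection are illustrative only; with PostgreSQL you would pass a connection from your own driver and use its placeholder style.

```python
import sqlite3  # stand-in for the demo only; the thread's database is PostgreSQL


def insert_in_batches(conn, sql, rows, batch_size=1000):
    """Insert rows through a DB-API connection, committing after every batch.

    conn       -- any DB-API 2.0 connection (hypothetical caller-supplied object)
    sql        -- parameterized INSERT statement in the driver's placeholder style
    rows       -- iterable of parameter tuples
    batch_size -- rows per transaction (illustrative default, not a recommendation)

    Returns the number of rows committed.
    """
    cur = conn.cursor()
    batch = []
    committed = 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            cur.executemany(sql, batch)
            conn.commit()  # end the transaction promptly
            committed += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        cur.executemany(sql, batch)
        conn.commit()
        committed += len(batch)
    return committed


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE t (id INTEGER)")
    n = insert_in_batches(demo, "INSERT INTO t VALUES (?)", ((i,) for i in range(2500)))
    print(n)
```

Whether batching is appropriate depends on whether the load can tolerate being partially committed; if it must be atomic, the point is simply to commit right after the last insert rather than leaving the transaction open.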

If lowering shared_buffers works, I wonder if disabling transparent huge page compaction might let you bring shared_buffers back up again.


Cheers,

Jeff
