Re: High SYS CPU - need advise - Mailing list pgsql-general

From Merlin Moncure
Subject Re: High SYS CPU - need advise
Date
Msg-id CAHyXU0wH+nN8g2+K2D0gyZ0tskepMpPjw3kQtXeZzd4yVOj4+A@mail.gmail.com
In response to Re: High SYS CPU - need advise  (Vlad <marchenko@gmail.com>)
List pgsql-general
On Fri, Nov 16, 2012 at 11:19 AM, Vlad <marchenko@gmail.com> wrote:
>
>> We're looking for spikes in 'blk' which represents when lwlocks bump.
>> If you're not seeing any then this is suggesting a buffer pin related
>> issue -- this is also supported by the fact that raising shared
>> buffers didn't help.   If you're not seeing 'blk's, go ahead and
>> disable the stats macro.
>
>
> most blk values come in at 0, some at 1, and a few hit 100. I can't say that
> the ratio of blk 0 to blk non-0 is very different during stall times.

right.  this is feeling more and more like a buffer pin issue.  but
even then we can't be certain -- it could be a symptom, not the cause.
to prove it we need to demonstrate that everyone is spinning and
waiting, which we haven't done.  classic spinlock contention manifests
as high user cpu.  we are bound in the kernel, so I wonder if it's all
the select() calls.  we haven't yet ruled out a kernel regression.
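
To make the user vs. sys distinction concrete, here is a minimal Python sketch that compares two samples of the aggregate "cpu" line from /proc/stat on Linux. The function names are made up for illustration, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq); it is not something from this thread's tooling.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named jiffy counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Assumed field order, per the Linux proc(5) man page.
    names = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")
    return dict(zip(names, map(int, fields[1:8])))


def cpu_shares(before, after):
    """Fraction of elapsed jiffies spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in before}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in delta}
    return {k: v / total for k, v in delta.items()}


def read_proc_stat(path="/proc/stat"):
    """Read and parse the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

Calling read_proc_stat() twice a second or so apart during a stall and passing both samples to cpu_shares() gives the split.  A large "system" share with a small "user" share would fit the kernel-bound picture rather than classic spinlock contention.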

If I were you, I'd be investigating pgbouncer to see if your app is
compliant with transaction mode processing, if for no other reason
than it will mitigate high load issues.

>> *) How many specific query plans are needed to introduce the
>> condition?  Hopefully, it's not too many.  If so, let's start
>> gathering the plans.  If you have a lot of plans to sift through, one
>> thing we can attempt to eliminate noise is to tweak
>> log_min_duration_statement so that during stall times (only) it logs
>> offending queries that are unexpectedly blocking.
>
>
> unfortunately, there are quite a few query plans... also, I don't think
> setting log_min_duration_statement will help us, because when the server
> hits a high load average, it reacts slowly even to a key press. So even
> non-offending queries will take a long time to execute. I see all sorts of
> queries running long during a stall, ranging from simple ones like
> LOG:  duration: 1131.041 ms  statement: SELECT 'DBD::Pg ping test'
> to complex ones joining multiple tables.
> We are still looking into all the logged queries in an attempt to find the
> ones that are causing the problem. I'll report if we find any clues.

right.
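
If it helps to separate "everything is slow" from "these queries are slow", a rough Python sketch like the one below could pull durations out of the log and flag trivial statements (such as the ping test) that still ran long.  It assumes log lines contain the "duration: ... ms  statement: ..." text shown above; the function names, marker strings, and threshold are hypothetical.

```python
import re

# Matches the "duration: <ms> ms  statement: <sql>" portion of a log line.
DURATION_RE = re.compile(
    r"duration: (?P<ms>\d+(?:\.\d+)?) ms\s+statement: (?P<sql>.*)"
)


def parse_durations(lines):
    """Return (duration_ms, statement) pairs for lines that carry a duration."""
    entries = []
    for line in lines:
        m = DURATION_RE.search(line)
        if m:
            entries.append((float(m.group("ms")), m.group("sql").strip()))
    return entries


def slow_trivial_statements(entries, trivial_markers, threshold_ms):
    """Pick statements known to be cheap that still exceeded the threshold."""
    return [
        (ms, sql)
        for ms, sql in entries
        if ms >= threshold_ms and any(marker in sql for marker in trivial_markers)
    ]
```

A cheap statement showing up here points at a system-wide stall at that moment, not at that statement's plan, so it can be dropped from the list of plans to sift through.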

merlin

