Re: High SYS CPU - need advise - Mailing list pgsql-general

From Merlin Moncure
Subject Re: High SYS CPU - need advise
Date
Msg-id CAHyXU0y8-XeBY9h1snUvnfuASYMa37md5kpOwSj+=g1t6XQOMw@mail.gmail.com
Whole thread Raw
In response to Re: High SYS CPU - need advise  (John R Pierce <pierce@hogranch.com>)
Responses Re: High SYS CPU - need advise
List pgsql-general
On Mon, Nov 19, 2012 at 10:50 AM, Vlad <marchenko@gmail.com> wrote:
> I just did a little experiment: extracted top four queries that were
> executed the longest during stall times and launched  pgbench test with 240
> clients. Yet I wasn't able to put the server into a stall with that. Also
> load average was hitting 120+, it was all user cpu, single digit % system.
> The system remained pretty responsive (on a keypress), in contrary to when
> high-sys-cpu stall happens, showing similar LA numbers.
>
> This makes me think that there is probably a different (than ones I tried)
> query or condition that is responsible for creating high-sys-cpu. Also, from
> my limited knowledge of postgresql and kernel internals, this experiment
> puts under question idea that linux scheduler might be the corner stone....
>
> Here is a longer log, first column is PID. High-sys-cpu stall happened
> around 10:42:37
> https://dl.dropbox.com/u/109778/postgresql-2012-11-19_103425-stripped.log
>
> I'll try to put pgbouncer in place and repeat, also though this will take
> longer time to setup.


yeah.  interesting -- contention was much higher this time and that
changes things.  strange how it was missed earlier.

you're getting bounced around a lot in lwlock especially
(unfortunately we don't know which one).  I'm going to hazard another
guess:  maybe the trigger here is when the number of contending
backends exceeds some critical number (probably based on the number of
cores) you see a quick cpu spike (causing more backends to lock and
pile up) as cache line bouncing sets in.  That spike doesn't last
long, because the spinlocks quickly accumulate delay counts then punt
to the scheduler which is unable to cope.  The exact reason why this
is happening to you in exactly this way (I've never seen it) is
unclear.  Also the line between symptom and cause is difficult to
draw.

unfortunately, in your case spinlock re-scheduling isn't helping.  log
entries like this one:
18764 [2012-11-19 10:43:50.124 CST] LOG:  JJ spin delay from file
sinvaladt.c line 512 delay 212, pointer 0x7f514959a394 at character 29

are suggesting major problems.  you're dangerously close to a stuck
spinlock which is lights out for the database.

merlin


pgsql-general by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: High SYS CPU - need advise
Next
From: Vlad
Date:
Subject: Re: High SYS CPU - need advise