Re: Server hitting 100% CPU usage, system comes to a crawl. - Mailing list pgsql-general

From Brian Fehrle
Subject Re: Server hitting 100% CPU usage, system comes to a crawl.
Date
Msg-id 4EA9BBDD.5050109@consistentstate.com
In response to Server hitting 100% CPU usage, system comes to a crawl.  (Brian Fehrle <brianf@consistentstate.com>)
List pgsql-general
Also, I'm not having any issue with the database restarting itself; it is
simply becoming unresponsive or slow to respond, to the point where just
sshing to the box takes about 30 seconds, if not longer. Performing a
pg_ctl restart on the cluster resolves the issue.

I looked through the logs for segmentation faults and found none. In
fact, the only things in my log that seem 'bad' are the following:

Oct 27 08:53:18 <snip> postgres[17517]: [28932839-1] user=<snip>,db=<snip> ERROR:  deadlock detected
Oct 27 11:49:22 <snip> postgres[608]: [19-1] user=<snip>,db=<snip> ERROR:  could not serialize access due to concurrent update

I don't believe these occurred too close to the slowdown.
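
As a rough way to check how close such errors fall to a slowdown, here is a minimal Python sketch that pulls ERROR entries out of syslog-style lines like the two above and filters them by time. The function names, the year argument, and the slowdown time of 12:30 are assumptions made for illustration; none of them come from the server in question.

```python
import re
from datetime import datetime, timedelta

# Matches syslog-style PostgreSQL lines of the shape shown in this message:
#   "Oct 27 08:53:18 <host> postgres[pid]: ... ERROR:  message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+postgres\[(?P<pid>\d+)\]:"
    r".*?ERROR:\s+(?P<msg>.*)$"
)

# The two log lines quoted in this message, each joined onto one line.
SAMPLE_LOG = [
    "Oct 27 08:53:18 <snip> postgres[17517]: [28932839-1] "
    "user=<snip>,db=<snip> ERROR:  deadlock detected",
    "Oct 27 11:49:22 <snip> postgres[608]: [19-1] "
    "user=<snip>,db=<snip> ERROR:  could not serialize access due to concurrent update",
]


def parse_errors(lines, year):
    """Return (timestamp, pid, message) for every ERROR line.

    Syslog timestamps carry no year, so the caller has to supply one.
    """
    errors = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            errors.append((ts, int(m.group("pid")), m.group("msg").strip()))
    return errors


def errors_near(errors, when, window_minutes):
    """Keep only the errors logged within window_minutes of the given time."""
    window = timedelta(minutes=window_minutes)
    return [e for e in errors if abs(e[0] - when) <= window]


if __name__ == "__main__":
    # Hypothetical slowdown time, chosen only to demonstrate the filter.
    slowdown = datetime(2011, 10, 27, 12, 30)
    for ts, pid, msg in errors_near(parse_errors(SAMPLE_LOG, 2011), slowdown, 60):
        print(ts, pid, msg)
```

The regular expression assumes the syslog prefix format seen here; a different log_line_prefix or log destination would need a different pattern.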

- Brian F

On 10/27/2011 02:09 PM, Brian Fehrle wrote:
> On 10/27/2011 01:48 PM, Scott Marlowe wrote:
>> On Thu, Oct 27, 2011 at 12:39 PM, Brian Fehrle
>> <brianf@consistentstate.com>  wrote:
>>> Looking at top, I see no SWAP usage, very little IOWait, and there are
>>> a large number of postmaster processes at 100% cpu usage (makes sense,
>>> at this point there are 150 or so queries currently executing on the
>>> database).
>>>
>>> Tasks: 713 total,  44 running, 668 sleeping,   0 stopped,   1 zombie
>>> Cpu(s):  4.4%us, 92.0%sy,  0.0%ni,  3.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.2%st
>>> Mem:  134217728k total, 131229972k used,  2987756k free,   462444k buffers
>>> Swap:  8388600k total,      296k used,  8388304k free, 119029580k cached
>> OK, a few points.  1: You've got a zombie process.  Find out what's
>> causing that; it could be a trigger of some type for this behaviour.
>> 2: You're at 92% sys.  That's bad.  It means the OS is chewing up 92% of
>> your 32 cores doing something.  What tasks are at the top of the list
>> in top?
>>
> Out of the top 50 processes in top, 48 of them are postmasters, one is
> syslog, and one is psql. Each of the postmasters has a high %CPU, the
> top ones being 80% and higher, the rest being anywhere between 30% and
> 60%. Would postmaster 'queries' that are running contribute to the sys
> CPU usage, or should they be under the 'us' CPU usage?
>
>
>> Try running vmstat 10 for a minute or so then look at cs and int
>> columns.  If cs or int is well over 100k there could be an issue with
>> thrashing, where your app is making some change to the db that
>> requires all backends to be awoken at once and the machine just falls
>> over under the load.
>
> We've restarted the postgresql cluster, so the issue is not happening
> at this moment, but running a vmstat 10 had my 'cs' averaging 3K and
> 'in' averaging around 9.5K.
>
> - Brian F
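
The vmstat check suggested above can be sketched in Python as follows: average the 'in' and 'cs' columns from captured vmstat output and compare them with the rough 100k figure mentioned in the thread. The function names and the sample text are assumptions for illustration; the sample numbers are made up to resemble the averages reported above and were not captured from the server.

```python
# Illustrative vmstat output (made-up numbers, not captured from the server).
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0    296 2987756 462444 119029580    0    0     5    40 8000 2500  5 10 84  1  0
 2  0    296 2987000 462444 119029600    0    0     0    35 9400 2900  4  8 87  1  0
 4  0    296 2986500 462444 119029700    0    0     0    50 9600 3100  5  9 85  1  0
"""


def average_columns(vmstat_text, columns=("in", "cs"), skip_first=True):
    """Average the named vmstat columns over all sample rows.

    The first vmstat report holds averages since boot, so it is skipped
    by default when more than one row is available.
    """
    header = None
    rows = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column-name line is the one that contains "in" and "cs" as tokens.
        if "in" in fields and "cs" in fields:
            header = fields
            continue
        # Data rows are all-numeric and as wide as the header.
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append([int(f) for f in fields])
    if header is None or not rows:
        raise ValueError("no vmstat data found")
    if skip_first and len(rows) > 1:
        rows = rows[1:]
    return {c: sum(r[header.index(c)] for r in rows) / len(rows) for c in columns}


def looks_like_thrashing(averages, threshold=100_000):
    """Apply the rough rule of thumb from the thread: cs or in above ~100k."""
    return any(value > threshold for value in averages.values())


if __name__ == "__main__":
    avgs = average_columns(SAMPLE_VMSTAT)
    print(avgs, "thrashing suspected:", looks_like_thrashing(avgs))
```

In practice the text would come from something like `vmstat 10 7` captured while the slowdown is in progress; numbers taken after a restart, as here, only show the healthy baseline.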
