Re: Server hitting 100% CPU usage, system comes to a crawl. - Mailing list pgsql-general

From Brian Fehrle
Subject Re: Server hitting 100% CPU usage, system comes to a crawl.
Msg-id 4EA9BA8F.4010605@consistentstate.com
In response to Re: Server hitting 100% CPU usage, system comes to a crawl.  (Scott Marlowe <scott.marlowe@gmail.com>)
Responses Re: Server hitting 100% CPU usage, system comes to a crawl.  (Alan Hodgson <ahodgson@simkin.ca>)
Re: Server hitting 100% CPU usage, system comes to a crawl.  (David Kerr <dmk@mr-paradox.net>)
List pgsql-general
On 10/27/2011 01:48 PM, Scott Marlowe wrote:
> On Thu, Oct 27, 2011 at 12:39 PM, Brian Fehrle
> <brianf@consistentstate.com>  wrote:
>> Looking at top, I see no SWAP usage, very little IOWait, and there are a
>> large number of postmaster processes at 100% cpu usage (makes sense, at this
>> point there are 150 or so queries currently executing on the database).
>>
>>   Tasks: 713 total,  44 running, 668 sleeping,   0 stopped,   1 zombie
>> Cpu(s):  4.4%us, 92.0%sy,  0.0%ni,  3.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.2%st
>> Mem:  134217728k total, 131229972k used,  2987756k free,   462444k buffers
>> Swap:  8388600k total,      296k used,  8388304k free, 119029580k cached
> OK, a few points.  1: You've got a zombie process.  Find out what's
> causing that, it could be a trigger of some type for this behaviour.
> 2: You're 92% sys.  That's bad.  It means the OS is chewing up 92% of
> your 32 cores doing something.  what tasks are at the top of the list
> in top?
>
Out of the top 50 processes in top, 48 are postmasters, one is
syslog, and one is psql. Each of the postmasters has a high %CPU: the
top ones are at 80% or higher, and the rest are anywhere between 30% and
60%. Would the queries those postmasters are running contribute to the
'sy' CPU usage, or should they show up under the 'us' CPU usage?
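
One rough way to check this per backend: Linux records user and kernel CPU time separately for each process in /proc/<pid>/stat (fields 14 and 15, utime and stime, in clock ticks). Below is a minimal sketch, assuming Linux and Python 3. The function names are made up for illustration and are not part of any PostgreSQL tooling.

```python
import os


def parse_proc_stat(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' rather than on whitespace alone.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime is field 14 and stime is field 15.
    return int(rest[11]), int(rest[12])


def cpu_split(pid):
    """Return (user_seconds, system_seconds) for a live process on Linux."""
    with open(f"/proc/{pid}/stat") as f:
        utime, stime = parse_proc_stat(f.read())
    ticks = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    return utime / ticks, stime / ticks
```

Calling cpu_split() on a backend's PID twice, a few seconds apart, and comparing the deltas should show whether that backend is accumulating mostly user time or mostly system time.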


> Try running vmstat 10 for a minute or so then look at cs and int
> columns.  If cs or int is well over 100k there could be an issue with
> thrashing, where your app is making some change to the db that
> requires all backends to be awoken at once and the machine just falls
> over under the load.

We've restarted the PostgreSQL cluster, so the issue is not happening at
the moment. Running vmstat 10 now, 'cs' averages about 3K and 'in'
averages around 9.5K.
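
For anyone wanting to repeat the measurement, here is a minimal sketch of averaging the 'in' and 'cs' columns from captured vmstat output. It assumes Python 3 and the usual procps layout with a column-name header line. The function name and the skip_first parameter are hypothetical, and the first sample is dropped by default because vmstat's first line reports averages since boot.

```python
def average_vmstat(text, columns=("in", "cs"), skip_first=True):
    """Average the named columns over the samples in captured vmstat output."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if all(c in fields for c in columns):
            header = fields  # column-name line; vmstat repeats it periodically
        elif header and len(fields) == len(header) and fields[0].isdigit():
            rows.append(fields)  # a numeric sample line
    if skip_first:
        rows = rows[1:]  # first sample is the since-boot average
    if not rows:
        raise ValueError("no vmstat samples found")
    return {
        c: sum(int(r[header.index(c)]) for r in rows) / len(rows)
        for c in columns
    }
```

The output of something like `vmstat 10 7` can be saved to a file and passed in as a string; the result is a dict of per-interval averages to compare against the 100k threshold mentioned above.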

- Brian F
