Re: Server hitting 100% CPU usage, system comes to a crawl. - Mailing list pgsql-general

From Brian Fehrle
Subject Re: Server hitting 100% CPU usage, system comes to a crawl.
Msg-id 4EA9CB99.8090808@consistentstate.com
In response to Server hitting 100% CPU usage, system comes to a crawl.  (Brian Fehrle <brianf@consistentstate.com>)
Responses Re: Server hitting 100% CPU usage, system comes to a crawl.  (Brian Fehrle <brianf@consistentstate.com>)
List pgsql-general
On 10/27/2011 02:50 PM, Tom Lane wrote:
> Brian Fehrle <brianf@consistentstate.com> writes:
>> Hi all, need some help/clues on tracking down a performance issue.
>> PostgreSQL version: 8.3.11
>> I've got a system that has 32 cores and 128 gigs of ram. We have
>> connection pooling set up, with about 100 - 200 persistent connections
>> open to the database. Our applications then use these connections to
>> query the database constantly, but when a connection isn't currently
>> executing a query, it's <IDLE>. On average, at any given time, there are
>> 3 - 6 connections that are actually executing a query, while the rest
>> are <IDLE>.
>> About once a day, queries that normally take just a few seconds slow way
>> down, and start to pile up, to the point where instead of just having
>> 3-6 queries running at any given time, we get 100 - 200. The whole
>> system comes to a crawl, and looking at top, the CPU usage is 99%.
> This is jumping to a conclusion based on insufficient data, but what you
> describe sounds a bit like the sinval queue contention problems that we
> fixed in 8.4.  Some prior reports of that:
> http://archives.postgresql.org/pgsql-performance/2008-01/msg00001.php
> http://archives.postgresql.org/pgsql-performance/2010-06/msg00452.php
>
> If your symptoms match those, the best fix would be to update to 8.4.x
> or later, but a stopgap solution would be to cut down on the number of
> idle backends.
>
>             regards, tom lane
That sounds somewhat close to the issue I am seeing. The main
differences are that my spike lasts much longer than a few minutes
and is only resolved when the cluster is restarted. Also, the top
output in that second link shows most of the CPU time under 'user',
rather than under 'sys' as in mine.

Is there anything more I can look at to get more information on this
'sinval queue contention problem'?
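
Since the suggested stopgap is to cut down on the number of idle backends, one thing worth tracking is how many backends are idle versus actually running a query when the slowdown starts. The sketch below is only an illustration: it assumes the `current_query` values have already been fetched (for example with `SELECT current_query FROM pg_stat_activity` on 8.3), and the function names `classify` and `summarize` are made up for this example.

```python
from collections import Counter


def classify(current_query):
    """Map one pg_stat_activity.current_query value (8.3 style) to a state.

    Assumption: idle backends report the literal strings '<IDLE>' or
    '<IDLE> in transaction'; anything else is treated as an active query.
    """
    text = current_query.strip()
    if text == "<IDLE>":
        return "idle"
    if text == "<IDLE> in transaction":
        return "idle in transaction"
    return "active"


def summarize(queries):
    """Count backends per state for a list of current_query strings."""
    return Counter(classify(q) for q in queries)
```

Logging these counts once a minute would show whether the pile-up of active queries begins while the number of idle connections is at its usual 100 - 200.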

Also, could having my CPU usage high in 'sys' rather than 'us' be
a red flag? Or is that normal?
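
To put numbers on the 'sys' versus 'us' split outside of top, the aggregate `cpu` line of `/proc/stat` on Linux can be sampled twice and the deltas compared. This is a rough sketch, not a replacement for top: it assumes a Linux host, the helper names (`read_cpu_line`, `parse_cpu_line`, `cpu_shares`) are invented for this example, and it reports only the user, system and idle shares.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line from /proc/stat (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return line
    raise ValueError("no aggregate cpu line found")


def parse_cpu_line(line):
    """Parse 'cpu user nice system idle ...' into a list of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu" or len(parts) < 5:
        raise ValueError("not an aggregate cpu line: %r" % line)
    # Keep at most the first 8 counters (user..steal). Guest time is
    # already included in user, so the later fields are skipped.
    return [int(x) for x in parts[1:9]]


def cpu_shares(before, after):
    """Percent of CPU time spent in user, system and idle between samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {
        "us": 100.0 * deltas[0] / total,  # user
        "sy": 100.0 * deltas[2] / total,  # system (kernel)
        "id": 100.0 * deltas[3] / total,  # idle
    }

# Usage (on a Linux host): take one sample with read_cpu_line(), wait a few
# seconds, take a second sample, then call cpu_shares(first, second).
```

A consistently high `sy` share during the slowdown would mean the time is going into the kernel rather than into query execution in user space.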

- Brian F
