
Server hitting 100% CPU usage, system comes to a crawl. - Mailing list pgsql-general

From	Brian Fehrle
Subject	Server hitting 100% CPU usage, system comes to a crawl.
Date	October 27, 2011 16:34:58
Msg-id	4EA9A562.7020808@consistentstate.com
List	pgsql-general


Hi all, need some help/clues on tracking down a performance issue.

PostgreSQL version: 8.3.11

I've got a system that has 32 cores and 128 GB of RAM. We have
connection pooling set up, with about 100-200 persistent connections
open to the database. Our applications use these connections to
query the database constantly, but when a connection isn't currently
executing a query, it shows as <IDLE>. On average, at any given time,
3-6 connections are actually executing a query, while the rest
are <IDLE>.
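
For anyone wanting to track the active-versus-idle split described above over time, here is a minimal sketch in Python. It assumes the values come from the current_query column of pg_stat_activity (which is where 8.3 reports <IDLE> and <IDLE> in transaction); the function name and the sample rows are made up for illustration, and fetching the rows from the database is left out.

```python
from collections import Counter

# Assumed source of the input, run by whatever client you use:
#   SELECT current_query FROM pg_stat_activity;

def summarize_activity(current_queries):
    """Count backends by state, based on the 8.3-style current_query text."""
    counts = Counter()
    for query in current_queries:
        if query == "<IDLE>":
            counts["idle"] += 1
        elif query == "<IDLE> in transaction":
            counts["idle in transaction"] += 1
        else:
            # Anything else is the text of a query that is currently executing.
            counts["active"] += 1
    return counts

# Hypothetical sample rows, not taken from the real system.
sample = ["<IDLE>", "SELECT 1", "<IDLE> in transaction", "<IDLE>"]
print(summarize_activity(sample))
```

Logging these counts every few seconds would show how quickly the active count climbs from 3-6 toward 100-200 when the slowdown starts.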

About once a day, queries that normally take just a few seconds slow way
down and start to pile up, to the point where instead of having
3-6 queries running at any given time, we get 100-200. The whole
system comes to a crawl, and top shows CPU usage at 99%.

Looking at top, I see no swap usage, very little I/O wait, and a
large number of postmaster processes at 100% CPU usage (makes sense, since at
this point there are 150 or so queries executing on the database).

Tasks: 713 total,  44 running, 668 sleeping,   0 stopped,   1 zombie
Cpu(s):  4.4%us, 92.0%sy,  0.0%ni,  3.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.2%st
Mem:  134217728k total, 131229972k used,  2987756k free,   462444k buffers
Swap:  8388600k total,      296k used,  8388304k free, 119029580k cached
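
As a reading aid for the snapshot above, here is a small Python sketch that splits the Cpu(s) line into its fields and picks out the largest one. The function name is hypothetical and the parsing assumes the field layout shown in this particular top output; it is not a general top parser.

```python
import re

def parse_cpu_line(line):
    """Parse a top 'Cpu(s):' summary line into a {label: percent} dict."""
    # Each field looks like '92.0%sy': a number, a percent sign, a short label.
    return {label: float(value)
            for value, label in re.findall(r"([\d.]+)%(\w+)", line)}

cpu = parse_cpu_line(
    "Cpu(s):  4.4%us, 92.0%sy,  0.0%ni,  3.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.2%st"
)
busiest = max(cpu, key=cpu.get)
print(busiest, cpu[busiest])
```

In this snapshot the largest field is sy (system time) at 92.0%, with us (user time) at only 4.4%.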


In the past, we noticed that autovacuum was hitting some large tables at
the same time this happened, so we turned autovacuum off to see if that
was the issue, and it still happened without any vacuums running.

We also ruled out checkpoints being the cause.

I'm currently digging through some statistics I've been gathering to see
if traffic increased at all, or remained the same when the slowdown
occurred. I'm also digging through the logs from the postgresql cluster
(I increased verbosity yesterday), looking for any clues. Any
suggestions on where to look for the cause of a slowdown like this
would be greatly appreciated.

Thanks,
     - Brian F
