Re: BUG #6650: CPU system time utilization rising few times a day - Mailing list pgsql-bugs

From Andrzej Krawiec
Subject Re: BUG #6650: CPU system time utilization rising few times a day
Date
Msg-id CAAy64HjYYVG4_Qu2or278fD2QMQo53B7hjUUxE28FoFr2LyqDA@mail.gmail.com
In response to Re: BUG #6650: CPU system time utilization rising few times a day  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: BUG #6650: CPU system time utilization rising few times a day  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-bugs
We cannot run strace or gdb on a production system under heavy load (about
100 transactions per second).
The time is spent in kernel space, not user space, so we are unable to do
anything at that particular moment (sometimes even the ssh connection seems
to hang for a while).
We suspect neither autovacuum (although it was our primary suspect at first)
nor a regular backend. It is system time. The question is: what is the reason
for that?
We've dug through the system and postgres logs, cleared out most of the
long-running queries and idle-in-transaction sessions, optimized queries,
vacuumed, reindexed and so on.
For a while it seemed that the particular kernel version was causing the
majority of the problems. We downgraded to 2.6.32-71.29.1.el6.x86_64 and
the problems mostly went away. For a few days we had no incidents, but then
it happened again.
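
When strace and gdb are not an option during a stall, one low-overhead alternative is to sample /proc/stat continuously and look at the log afterwards. The following is only a minimal sketch, not something used in this thread: the function names (parse_proc_stat, system_share, log_samples) are hypothetical, and it assumes the usual Linux /proc/stat layout where the aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq and steal ticks, and the "intr" line starts with the total interrupt count since boot.

```python
import time


def parse_proc_stat(text):
    """Return (cpu_fields, total_interrupts) parsed from /proc/stat text."""
    cpu, intr = None, None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "cpu":
            # user nice system idle iowait irq softirq steal; guest time is
            # already counted in user, so stop after the 8th field.
            cpu = [int(v) for v in parts[1:9]]
        elif parts[0] == "intr":
            # First number is the total of all interrupts since boot.
            intr = int(parts[1])
    if cpu is None or intr is None:
        raise ValueError("unexpected /proc/stat contents")
    return cpu, intr


def system_share(before, after):
    """Fraction of CPU ticks spent in kernel mode between two samples."""
    delta = [b - a for a, b in zip(before, after)]
    total = sum(delta)
    return delta[2] / total if total else 0.0


def log_samples(count, interval=1.0, path="/proc/stat"):
    """Print `count` lines with system-time share and interrupts per second."""
    with open(path) as f:
        cpu_prev, intr_prev = parse_proc_stat(f.read())
    for _ in range(count):
        time.sleep(interval)
        with open(path) as f:
            cpu_now, intr_now = parse_proc_stat(f.read())
        share = system_share(cpu_prev, cpu_now)
        rate = (intr_now - intr_prev) / interval
        print(f"{time.strftime('%H:%M:%S')} sys={share:.1%} intr/s={rate:.0f}",
              flush=True)
        cpu_prev, intr_prev = cpu_now, intr_now
```

Calling log_samples(86400) with output redirected to a file would cover roughly one day at one-second resolution. The timestamps could then be matched against the postgres logs to see what preceded each spike, although the sampler itself may be delayed while the machine is stalled.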

Regards
--
Andrzej Krawiec

2012/5/22 Robert Haas <robertmhaas@gmail.com>

> On Fri, May 18, 2012 at 5:09 AM,  <a.krawiec@focustelecom.pl> wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference:      6650
> > Logged by:          Andrzej Krawiec
> > Email address:      a.krawiec@focustelecom.pl
> > PostgreSQL version: 8.4.11
> > Operating system:   CentOS 6.0 - 2.6.32-220.13.1.el6.x86_64
> > Description:
> >
> > Primarily checked on PG 8.4.9 (same OS), problem also occurs. Few times a
> > day I get a situation where PostgreSQL stops running for 1-2 minutes. CPU is
> > running 99% in systime. IO is OK, only interrupts are extremely high (over
> > 100k). System operates on 2 x Xeon 10 Core, 128 GB RAM, raid 10. Does anyone
> > have any idea?
>
> Try using strace to figure out where all that system time is going.
> Sometimes the '-c' option is helpful.
>
> It might also be helpful to connect gdb to the process and get a
> backtrace, then continue, stop it again, get another backtrace.
> Repeat that a few times and send us the backtrace that occurs most
> frequently.
>
> Is it a regular backend that is eating all that CPU time, or an
> autovacuum worker?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
