100% cpu usage on some postmaster processes kill the complete database - Mailing list pgsql-general

From: Paul Dunkler
Subject: 100% cpu usage on some postmaster processes kill the complete database
Msg-id: 3887D5AB-5997-47C6-AB27-367FAEA90BAA@xyrality.com
Responses: Re: 100% cpu usage on some postmaster processes kill the complete database
List: pgsql-general

Hi List,

We are currently running a rather large PostgreSQL installation handling approximately 4k transactions and 50k index scans per second.

Over the last few days, at irregular times (3-4 times a day), some of the postmaster processes run at 100% CPU usage. This leads to a total breakdown of query execution: we see tons of statements that are correctly aborted by our statement_timeout of 15 seconds. I tried searching, but I cannot really tell what the problem could be...

Some things I have checked:
- We are not running any bulk jobs or maintenance scripts at these times
- No system errors in any logs during those slowdowns
- I/O performance seems fine; there is no high I/O wait. However, I/O writes drop off completely during those periods, apparently because no postgres process can perform any update

I just installed a script that prints the top and ps axf output so I can track down the problem. Here is a snippet of the top output:

top - 15:55:02 up 59 days, 37 min,  1 user,  load average: 35.95, 14.04, 7.32
Tasks: 2417 total,  54 running, 2363 sleeping,   0 stopped,   0 zombie
Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 90.2%id,  1.9%wa,  0.0%hi,  0.6%si,  0.0%st
Mem:  264523700k total, 250145228k used, 14378472k free,   207032k buffers
Swap:  2097144k total,   553624k used,  1543520k free, 166905748k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
29852 postgres  20   0  131g  59m  35m R 100.0  0.0   1:27.71 postmaster        
29854 postgres  20   0  131g  70m  45m R 100.0  0.0   1:35.43 postmaster        
17449 postgres  20   0  131g 1.2g 1.2g R 100.0  0.5   1:52.62 postmaster        
29868 postgres  20   0  131g 1.1g 1.0g R 100.0  0.4   1:58.93 postmaster        
30136 postgres  20   0  131g  77m  52m R 100.0  0.0   1:34.33 postmaster        
30294 postgres  20   0  131g  66m  41m R 100.0  0.0   1:33.33 postmaster        
30864 postgres  20   0  131g  66m  41m R 100.0  0.0   1:36.17 postmaster        
30872 postgres  20   0  131g  61m  36m R 100.0  0.0   1:26.81 postmaster        
30876 postgres  20   0  131g  68m  43m R 100.0  0.0   1:33.97 postmaster        
30899 postgres  20   0  131g  68m  44m R 100.0  0.0   1:38.95 postmaster        
30906 postgres  20   0  131g  67m  42m R 100.0  0.0   1:27.82 postmaster        
31173 postgres  20   0  131g  68m  44m R 100.0  0.0   1:28.49 postmaster        
31239 postgres  20   0  131g  71m  46m R 100.0  0.0   1:31.42 postmaster        
31248 postgres  20   0  131g  90m  65m R 100.0  0.0   1:26.20 postmaster        
34934 postgres  20   0  131g 5580 3456 R 100.0  0.0   1:23.96 postmaster        
47945 postgres  20   0  131g 3.0g 3.0g R 100.0  1.2   6:08.41 postmaster        
16116 postgres  20   0  131g  84m  59m R 100.0  0.0   1:30.60 postmaster        
16304 postgres  20   0  131g  85m  60m R 100.0  0.0   1:38.89 postmaster        
17104 postgres  20   0  131g  96m  72m R 100.0  0.0   1:27.54 postmaster        
17111 postgres  20   0  131g  98m  73m R 100.0  0.0   1:38.23 postmaster        
17320 postgres  20   0  131g  98m  74m R 100.0  0.0   1:38.51 postmaster        
31221 postgres  20   0  131g  63m  38m R 100.0  0.0   1:33.63 postmaster        
31272 postgres  20   0  131g 1.0g 1.0g R 100.0  0.4   1:32.71 postmaster        
 3290 postgres  20   0  131g  99m  74m R 100.0  0.0   1:32.76 postmaster        
 3459 postgres  20   0  131g 2.1g 2.0g R 100.0  0.8   1:44.92 postmaster        
16492 postgres  20   0  131g 100m  75m R 100.0  0.0   1:33.36 postmaster        
16562 postgres  20   0  131g 114m  89m R 100.0  0.0   1:35.14 postmaster        
17146 postgres  20   0  131g  91m  66m R 100.0  0.0   1:37.39 postmaster        
17403 postgres  20   0  131g  98m  73m R 100.0  0.0   1:32.13 postmaster        
31100 postgres  20   0  131g  62m  38m R 100.0  0.0   1:29.06 postmaster        
 2019 postgres  20   0  131g 1.2g 1.2g R 98.7  0.5   1:40.91 postmaster         
 2150 postgres  20   0  131g 1.3g 1.3g R 98.7  0.5   2:53.14 postmaster         
16048 postgres  20   0  131g  71m  46m R 98.7  0.0   1:29.75 postmaster         
30190 postgres  20   0  131g 1.4g 1.3g R 98.7  0.5   0:55.98 postmaster         
16112 postgres  20   0  131g 862m 827m R 97.1  0.3   0:48.00 postmaster         
31202 postgres  20   0  131g  74m  49m R 97.1  0.0   1:34.62 postmaster         
35658 postgres  20   0  131g 5948 3788 R 97.1  0.0   0:12.29 postmaster         
16134 postgres  20   0  131g 1.9g 1.9g R 95.4  0.8   1:47.27 postmaster         
31034 postgres  20   0  131g  69m  44m R 95.4  0.0   1:26.35 postmaster         
16120 postgres  20   0  131g 1.2g 1.2g R 93.8  0.5   2:04.02 postmaster         
30891 postgres  20   0  131g  57m  33m R 93.8  0.0   1:23.08 postmaster         
31261 postgres  20   0  131g  81m  56m R 93.8  0.0   1:24.51 postmaster         
29790 postgres  20   0  131g  62m  37m R 92.2  0.0   1:35.34 postmaster         
30426 postgres  20   0  131g  62m  37m R 87.4  0.0   1:34.51 postmaster         
30857 postgres  20   0  131g  50m  26m R 79.3  0.0   1:37.82 postmaster         
  507 root      39  19     0    0    0 R 67.9  0.0  19:19.71 khugepaged         
16095 postgres  20   0  131g  83m  58m R 67.9  0.0   1:27.64 postmaster         
30856 postgres  20   0  131g  69m  44m R 67.9  0.0   1:34.46 postmaster         
17442 postgres  20   0  131g 2.4g 2.4g S 11.3  0.9   1:02.14 postmaster         
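The monitoring script mentioned above is not shown in the post. As a hypothetical stand-in, here is a minimal Python sketch that captures one batch-mode `top` frame and lists the postmaster processes at or above a CPU threshold. It assumes procps `top` and the default column layout seen in the snapshot (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND); the function names and the 90% threshold are illustrative choices, not part of the original setup.

```python
import subprocess
from typing import List, Tuple

# Assumed column positions in a default procps `top` process line:
# PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
PID_COL, CPU_COL, COMMAND_COL = 0, 8, 11


def busy_processes(top_output: str, command: str = "postmaster",
                   threshold: float = 90.0) -> List[Tuple[int, float]]:
    """Return (pid, %cpu) for processes named `command` at or above `threshold`."""
    busy = []
    for line in top_output.splitlines():
        fields = line.split()
        # Process lines have at least 12 columns and start with a numeric PID;
        # this skips the summary lines and the column-title line.
        if len(fields) < 12 or not fields[PID_COL].isdigit():
            continue
        if fields[COMMAND_COL] != command:
            continue
        cpu = float(fields[CPU_COL])
        if cpu >= threshold:
            busy.append((int(fields[PID_COL]), cpu))
    return busy


def snapshot() -> str:
    """Capture one frame of top output (-b: batch mode, -n 1: one iteration)."""
    result = subprocess.run(["top", "-b", "-n", "1"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Run periodically (for example from cron), `len(busy_processes(snapshot()))` gives a single number per sample that is easy to log and compare against the times when statements start hitting statement_timeout.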


PostgreSQL version information:
- PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.5 20110214 (Red Hat 4.4.5-6), 64-bit
- Running hot replication to another node (identical hardware setup)

Server hardware:
- 4x 12-core AMD Magny-Cours
- 256 GB of RAM (36% currently used)
- 1.3 TB SAS RAID (LSI RAID controller), 15k rpm
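In the top snapshot above, the `used` value on the Mem line includes kernel buffers and page cache (this version of top prints `cached` on the Swap line), so it reads close to 250 GB. A small sketch of subtracting them, the same figure older versions of `free` show on their `-/+ buffers/cache` line; the values are copied from that snapshot, in KiB, and the helper name is illustrative.

```python
# Figures copied from the top snapshot above, in KiB.
MEM_TOTAL_KIB = 264523700
MEM_USED_KIB = 250145228
BUFFERS_KIB = 207032
CACHED_KIB = 166905748  # printed on the Swap line by this version of top


def used_without_cache(used: int, buffers: int, cached: int) -> int:
    """Memory in use once kernel buffers and page cache are subtracted (KiB)."""
    return used - buffers - cached


app_kib = used_without_cache(MEM_USED_KIB, BUFFERS_KIB, CACHED_KIB)
print(f"{app_kib / 1024**2:.1f} GiB, {100 * app_kib / MEM_TOTAL_KIB:.1f}% of RAM")
```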

If I forgot to include any important information needed to analyze my problem, please let me know. I did my best to describe the problem as accurately as I could.

--
Best regards

Paul Dunkler
