[GENERAL] BDR, wal sender, high system cpu, mutex_lock_common - Mailing list pgsql-general

From milist ujang
Subject [GENERAL] BDR, wal sender, high system cpu, mutex_lock_common
Date
Msg-id CACG9ogw8Km5LFSVSQhgWOj0xcyHeip4bvp9+FXGWOtkS7mwQHQ@mail.gmail.com
Whole thread Raw
Responses Re: [GENERAL] BDR, wal sender, high system cpu, mutex_lock_common  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-general
Hi all,

I've an environment 9.4 + bdr: 
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit

kernel version:
3.2.0-4-amd64 #1 SMP Debian 3.2.65-1 x86_64 GNU/Linux

This is consolidation databases, in this machine there are around 250+ wal sender processes.

top output revealed high system cpu:
%Cpu(s):  1.4 us, 49.7 sy,  0.0 ni, 48.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

profiling cpu with perf:

perf top -e cpu-clock

Events: 142K cpu-clock
 82.37%  [kernel]            [k] __mutex_lock_common.isra.5
  4.49%  [kernel]            [k] do_raw_spin_lock
  2.23%  [kernel]            [k] mutex_lock
  2.16%  [kernel]            [k] mutex_unlock
  2.12%  [kernel]            [k] arch_local_irq_restore
  1.73%  postgres            [.] ValidXLogRecord
  0.87%  [kernel]            [k] __mutex_unlock_slowpath
  0.78%  [kernel]            [k] arch_local_irq_enable
  0.63%  [kernel]            [k] sys_recvfrom


finally get which processes (wal senders) that are using mutexes:

perf top -e task-clock -p 55382

Events: 697  task-clock
 88.08%  [kernel]  [k] __mutex_lock_common.isra.5
  3.27%  [kernel]  [k] do_raw_spin_lock
  2.34%  [kernel]  [k] arch_local_irq_restore
  2.10%  postgres  [.] ValidXLogRecord
  1.87%  [kernel]  [k] mutex_unlock
  1.87%  [kernel]  [k] mutex_lock
  0.47%  [kernel]  [k] sys_recvfrom

strace output of wal sender process:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.30    1.030072           5    213063    201463 read
  1.69    0.017686           0    201464    201464 recvfrom
  0.01    0.000110           0       806           lseek
  0.00    0.000043           0       474       468 rt_sigreturn
  0.00    0.000000           0         6           open
  0.00    0.000000           0         6           close
------ ----------- ----------- --------- --------- ----------------
100.00    1.047911                415819    403395 total


strace detail, majority read from pipe and recvfrom from socket (but most of them are EAGAIN):

read(15, "~\320\5\0\1\0\0\0\0@\235\1\360\16\0\0\334\26\0\0\0\0\0\0\365\27\0\0\0\0\0\0"..., 8192) = 8192 <0.000025>
read(6, 0x7fffdd837b3f, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000116>
recvfrom(10, 0x7fffdd837b17, 1, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000049>

ls -l /proc/62388/fd/15
lr-x------ 1 postgres postgres 64 Oct  1 08:39 /proc/62388/fd/15 -> /data/pg_xlog/0000000100000EF000000061

ls -l /proc/62388/fd/6
lr-x------ 1 postgres postgres 64 Oct  1 08:39 /proc/62388/fd/6 -> pipe:[1090892506]

ls -l /proc/62388/fd/10
lrwx------ 1 postgres postgres 64 Oct  1 08:39 /proc/62388/fd/10 -> socket:[1096584060]

I wonder, is there kernel version has better handling mutexes? 
Or is it the expected behavior?

Sorry for cross-posting, I have posted the same on pgsql-performance too....


--

pgsql-general by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: [GENERAL] Checkpoint write time - anything unusual?
Next
From: Craig Ringer
Date:
Subject: Re: [GENERAL] BDR, wal sender, high system cpu, mutex_lock_common