100% CPU pg processes that don't die.

From: Scott Marlowe
Subject: 100% CPU pg processes that don't die.
Msg-id: dcc563d10808091216q2ef509fcl92cd2bb49fff5fac@mail.gmail.com
Responses: Re: 100% CPU pg processes that don't die.  (Stephen Frost <sfrost@snowman.net>)
           Re: 100% CPU pg processes that don't die.  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-general
I'm load testing a machine, and I'm seeing "idle in transaction"
processes that are no longer hooked to any outside client, that pull
100% CPU and can't be kill -9ed.  I'm using pgbench -c 1000 -t 1000.
postgresql.conf attached.  This is on an 8-CPU AMD box with hardware
RAID.  I'll likely never see this many parallel connections in
production, but who knows...  I want to reboot the machine to test at
a lower number of threads, but if there's more information to be
gleaned from its current state I'll leave it up.
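
For the record, the full sequence was along these lines (the init
step, scale factor, and database name here are assumptions, not from
the actual run):

    # Initialize the pgbench tables; -s sets the scale factor (assumed value).
    pgbench -i -s 100 postgres

    # The run that produced the stuck backends: 1000 concurrent clients,
    # 1000 transactions each.  postgresql.conf needs max_connections set
    # comfortably above 1000 for this to even start.
    pgbench -c 1000 -t 1000 postgres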

They look like this in top:

 3552 postgres  20   0 8286m  82m  78m R  100  0.3 195:04.22 postgres: postgres postgres [local] idle in transaction
 3561 postgres  20   0 8286m  83m  79m R  100  0.3 195:04.20 postgres: postgres postgres [local] idle in transaction

And this in ps aux|grep postgres:

postgres  3561 95.2  0.2 8485376 85708 ?       Rs   09:45 197:17 postgres: postgres postgres [local] idle in transaction
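
One way to confirm from the database side that these backends really
are orphaned mid-transaction (a sketch; procpid and current_query are
the 8.x-era pg_stat_activity column names, newer releases use
pid/state/query instead):

    # List idle-in-transaction backends and how long their transactions
    # have been open, oldest first.  Needs superuser (or same-user)
    # rights to see current_query.
    psql -U postgres -d postgres -c "
      SELECT procpid, usename, client_addr, xact_start, current_query
      FROM   pg_stat_activity
      WHERE  current_query = '<IDLE> in transaction'
      ORDER  BY xact_start;"

Note that client_addr is null for [local] (Unix-socket) connections,
so this identifies the stuck PIDs but can't tell you whether a client
is still attached.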

The db is still up and accessible.  I'm also getting this in my
/var/log/messages:

Aug  9 13:13:21 engelberg kernel: [71242.734934] CPU 1:
Aug  9 13:13:21 engelberg kernel: [71242.734935] Modules linked in: iptable_filter ip_tables x_tables parport_pc lp parport loop ipv6 evdev i2c_nforce2 pcspkr shpchp button pci_hotplug i2c_core pata_amd ata_generic ext3 jbd mbcache sg sr_mod cdrom sd_mod e1000 floppy arcmsr pata_acpi libata ehci_hcd forcedeth ohci_hcd scsi_mod usbcore thermal processor fan fbcon tileblit font bitblit softcursor fuse
Aug  9 13:13:21 engelberg kernel: [71242.734972] Pid: 294, comm: kswapd0 Not tainted 2.6.24-19-server #1
Aug  9 13:13:21 engelberg kernel: [71242.734974] RIP: 0010:[floppy:_spin_lock_irqsave+0x12/0x30] [floppy:_spin_lock_irqsave+0x12/0x30] _spin_lock_irqsave+0x12/0x30
Aug  9 13:13:21 engelberg kernel: [71242.734980] RSP: 0018:ffff810415423df8  EFLAGS: 00000286
Aug  9 13:13:21 engelberg kernel: [71242.734982] RAX: 0000000000000246 RBX: ffff81000003137d RCX: 0000000000000003
Aug  9 13:13:21 engelberg kernel: [71242.734984] RDX: 0000000000000001 RSI: ffff810415423ea0 RDI: ffff81000003137d
Aug  9 13:13:21 engelberg kernel: [71242.734987] RBP: ffff810415423d60 R08: 0000000000000000 R09: 0000000000000000
Aug  9 13:13:21 engelberg kernel: [71242.734989] R10: 0000000000000000 R11: ffffffff881a46b0 R12: ffff810415423d60
Aug  9 13:13:21 engelberg kernel: [71242.734991] R13: ffffffff8028d11e R14: ffff81041f6b2670 R15: ffff810420168178
Aug  9 13:13:21 engelberg kernel: [71242.734994] FS:  00007f51096fd700(0000) GS:ffff8108171a2300(0000) knlGS:0000000000000000
Aug  9 13:13:21 engelberg kernel: [71242.734997] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Aug  9 13:13:21 engelberg kernel: [71242.734999] CR2: 00007f4f27ebffd0 CR3: 0000000000201000 CR4: 00000000000006e0
Aug  9 13:13:21 engelberg kernel: [71242.735001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug  9 13:13:21 engelberg kernel: [71242.735003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug  9 13:13:21 engelberg kernel: [71242.735006]
Aug  9 13:13:21 engelberg kernel: [71242.735006] Call Trace:
Aug  9 13:13:21 engelberg kernel: [71242.735009]  [usbcore:prepare_to_wait+0x23/0x80] prepare_to_wait+0x23/0x80
Aug  9 13:13:21 engelberg kernel: [71242.735013]  [kswapd+0xfa/0x560] kswapd+0xfa/0x560
Aug  9 13:13:21 engelberg kernel: [71242.735020]  [<ffffffff80254260>] autoremove_wake_function+0x0/0x30
Aug  9 13:13:21 engelberg kernel: [71242.735026]  [kswapd+0x0/0x560] kswapd+0x0/0x560
Aug  9 13:13:21 engelberg kernel: [71242.735030]  [kthread+0x4b/0x80] kthread+0x4b/0x80
Aug  9 13:13:21 engelberg kernel: [71242.735034]  [child_rip+0xa/0x12] child_rip+0xa/0x12
Aug  9 13:13:21 engelberg kernel: [71242.735040]  [kthread+0x0/0x80] kthread+0x0/0x80
Aug  9 13:13:21 engelberg kernel: [71242.735043]  [child_rip+0x0/0x12] child_rip+0x0/0x12
Aug  9 13:13:21 engelberg kernel: [71242.735046]

Does this look like a kernel bug or a pgsql bug to most people?
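
FWIW, before rebooting, one way to get more data on where those
processes are stuck in the kernel (a sketch; magic SysRq has to be
enabled in the running kernel, and wchan output depends on the kernel
build):

    # Show the scheduler state and the kernel function each stuck
    # backend is sitting in (PIDs taken from the top output above).
    ps -o pid,stat,wchan:30,cmd -p 3552,3561

    # Ask the kernel to dump a stack trace for every task into the
    # kernel log (magic SysRq 't'), then read it back.
    echo 1 > /proc/sys/kernel/sysrq
    echo t > /proc/sysrq-trigger
    dmesg | tail -n 200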
