I just saw a server that was spending over 50% of its CPU time in system
(kernel) mode, apparently dealing with postmasters that were in the sblock
state. Looking at the FreeBSD source, this indicates that the process is
waiting for a lock on a socket. During this time the machine was doing
nearly 200k context switches a second.
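
For anyone who wants to check for the same symptom, here is a rough sketch of how the wait channels could be tallied from captured `ps -axo pid,state,wchan,command` output. The helper name `count_wait_channels`, the default match string "postgres", and the sample text are all hypothetical and only there for illustration; they are not output from the affected server.

```python
from collections import Counter

# Hypothetical sample in the shape of `ps -axo pid,state,wchan,command`
# output. This is made-up data, not a capture from the affected machine.
SAMPLE = """\
  PID STAT WCHAN  COMMAND
  101 Ss   sblock postgres: app db 10.0.0.5(5123) SELECT
  102 Ss   sblock postgres: app db 10.0.0.5(5124) SELECT
  103 Rs   -      postgres: app db 10.0.0.5(5125) idle
  104 Ss   select /usr/sbin/sshd
"""


def count_wait_channels(ps_output, name="postgres"):
    """Tally wait channels for processes whose command contains `name`.

    Assumes four whitespace-separated columns (pid, state, wchan, command)
    with a single header row, and that an empty wait channel is shown as "-".
    """
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)  # the command column may contain spaces
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        _pid, _state, wchan, command = fields
        if name in command:
            counts[wchan] += 1
    return counts


if __name__ == "__main__":
    for wchan, n in count_wait_channels(SAMPLE).most_common():
        print(f"{wchan:10s} {n}")
```

Sampling this every second or so during an incident would show whether the sblock count tracks the jump in system time.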
At the same time, the server was also producing 'statistics buffer is
full' errors.
Has anyone seen this before? I suspect that the stats buffer errors are
a symptom and not the cause of the problem, but unfortunately I wasn't
able to get a stack trace to verify that theory.
The machine is a dual Opteron 250 with 8GB of memory, running PostgreSQL 8.1.3.
While this was going on there were between 10 and 250 backends running
at once, based on vmstat.
Any ideas what areas of the code could be locking a socket?
Theoretically it shouldn't be the stats collector, and the site is using
pgpool as a connection pool, so this shouldn't be due to trying to
connect to backends at a furious rate.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461