"Anand Kumar, Karthik" <Karthik.AnandKumar@classmates.com> writes:
> We run postgres 9.1.11, on Centos 6.3, and an ext2 filesystem
> Everything will run along okay, and every few hours, for about a couple of minutes, postgres will slow way down. A
> "select 1" query takes between 10 and 15 seconds to run, and the box in general gets lethargic.
> This causes a pile up of connections at the DB, and we run out of max_connections.
> This is accompanied with a steep spike in system CPU and load avg. No spike in user CPU or in I/O.
System CPU only, huh? There have been some reports of such behavior
apparently caused by inefficiencies in the kernel's support of
"transparent huge pages". See for instance this thread
http://www.postgresql.org/message-id/flat/CABMVzL2y8mRM5C9xxejAyDqe0i1S78RAE3cEATGYNf5Ktz_Zdg@mail.gmail.com
although it looks like in that case the real fix was to reduce the number
of backends.
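
For anyone wanting to check whether transparent huge pages are active on their box, here is a minimal sketch in Python. It is an illustration, not something from the linked thread. It assumes the setting is exposed through one of the two sysfs files listed (the second being the path used by RHEL/CentOS 6 kernels) and that the active value is the one shown in square brackets; the function names are made up for this example.

```python
from pathlib import Path

# Candidate sysfs locations (assumption: one of these exists on the host).
# The second is the variant used by RHEL/CentOS 6 kernels.
THP_PATHS = (
    "/sys/kernel/mm/transparent_hugepage/enabled",
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",
)


def parse_thp_setting(text):
    """Return the active value from a line such as '[always] madvise never'.

    The kernel marks the active choice with square brackets; return None
    if no bracketed token is present.
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def current_thp_setting(paths=THP_PATHS):
    """Read the first sysfs file that exists and return its active value."""
    for candidate in paths:
        path = Path(candidate)
        if path.is_file():
            return parse_thp_setting(path.read_text())
    return None  # no known THP control file on this system


if __name__ == "__main__":
    print("transparent huge pages:", current_thp_setting())
```

A result of "always" would mean the feature is on system-wide; whether it is actually behind the system-CPU spikes would still need to be confirmed by disabling it and watching for a recurrence.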
> We do typically have a lot of idle connections (1500 connections total, over a 1000 idle at any given time). We're in
> the midst of installing pgbouncer to try and mitigate the problem, but that still doesn't address the root cause.
1500 connections? What makes you think that itself isn't the root cause?
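
To put numbers on how many of those backends are really idle, something like the following rough sketch could be used. It assumes a 9.1-era pg_stat_activity, where there is no "state" column and idle sessions show up as '<IDLE>' or '<IDLE> in transaction' in current_query. The helper names are hypothetical, and fetching the rows (with psql or any driver) is left out so the example stays self-contained.

```python
from collections import Counter

# Query to run against a 9.1 server; the results feed summarize() below.
ACTIVITY_SQL = "SELECT current_query FROM pg_stat_activity"


def classify(current_query):
    """Map a 9.1-style current_query value to a coarse backend state."""
    if current_query == "<IDLE>":
        return "idle"
    if current_query == "<IDLE> in transaction":
        return "idle in transaction"
    return "active"


def summarize(queries):
    """Count backends per state given an iterable of current_query values."""
    return Counter(classify(q) for q in queries)


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = ["<IDLE>", "select 1", "<IDLE> in transaction", "<IDLE>"]
    for state, count in sorted(summarize(sample).items()):
        print(state, count)
```

If the "idle" bucket dominates, a pooler such as pgbouncer can serve the same clients with far fewer actual backends.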
regards, tom lane