Hello everyone,
I'm looking for help diagnosing an issue that we have with our DB server:
Under relatively light load (300-400 TPS), every 10-30 seconds the server
drops into high system CPU usage: 90%+ SYSTEM across all CPUs. It is pure
SYS CPU, i.e. not I/O wait, not IRQ, not user. PostgreSQL itself takes
10-15% at the same time. These periods last from a few seconds to minutes,
or until PostgreSQL is restarted. Needless to say, the system is barely
responsive during them, with the load average going over 100. The workload
is mostly SELECT statements (joins across a few tables) that use indexes
and return a small number of rows. If the number of incoming requests per
second drops a bit, the server does not fall into these high-SYS-CPU
periods. It looks as if Postgres runs out of some resource or is fighting
over some locks, and that causes the kernel to go into la-la land trying to
manage it.
So far we've checked:
- disk and NIC delays / errors / utilization
- WAL files (created rarely)
- tables are vacuumed OK; the periods of high SYS are not tied to the
vacuum process
- kernel resource utilization (sufficient FS handles, shared MEM/SEM, VM)
- increased the log level, but nothing suspicious or different (to me) is
reported there during the periods of high SYS CPU
- ran pgbench (could not reproduce the issue, even though it was producing
over 40,000 TPS for a prolonged period of time)
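For anyone who wants to capture the same numbers, here is a minimal sketch of how the SYS share can be sampled and timestamped so it can be lined up with the PostgreSQL log. It assumes Linux and the aggregate "cpu" line of /proc/stat. The function names (parse_cpu_line, cpu_shares, watch) and the 50% threshold are illustrative choices, not taken from any existing tool.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (Linux, see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Return a dict of cumulative jiffies from the aggregate 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_shares(before, after):
    """Percentage of elapsed CPU time per category between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_sample(path="/proc/stat"):
    """Read and parse the first (aggregate) line of /proc/stat."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


def watch(interval=1.0, threshold=50.0):
    """Print a timestamped line whenever the system share reaches threshold.

    The threshold is a hypothetical default; loops until interrupted.
    """
    previous = read_sample()
    while True:
        time.sleep(interval)
        current = read_sample()
        shares = cpu_shares(previous, current)
        if shares["system"] >= threshold:
            print(time.strftime("%H:%M:%S"),
                  " ".join("%s=%.1f" % (n, shares[n]) for n in FIELDS))
        previous = current
```

Calling watch() in a separate terminal and leaving it running under load should give one timestamped line per interval in which SYS dominates. This only records when the spikes happen, not what the kernel is doing during them.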
Basically, our symptoms are exactly as reported here over a year ago
(though that report was for Postgres 8.3; we run 9.1):
http://archives.postgresql.org/pgsql-general/2011-10/msg00998.php
I would be grateful for any ideas that help resolve or diagnose this
problem.
Environment background:
-----