Re: max_connections reached in postgres 9.3.3 - Mailing list pgsql-general

From Merlin Moncure
Subject Re: max_connections reached in postgres 9.3.3
Date
Msg-id CAHyXU0zS=rFcLceWjjWEy7T061ibx4kkeg1SgiuicF+mwhMKqw@mail.gmail.com
Whole thread Raw
In response to Re: max_connections reached in postgres 9.3.3  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: max_connections reached in postgres 9.3.3
List pgsql-general
On Thu, Jun 19, 2014 at 2:35 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> "Vasudevan, Ramya" <ramya.vasudevan@classmates.com> wrote:
>
>> On the waiting queries - When  we reached 1500 connections, we
>> had 759 connections that were in active state (116 COMMIT, 238
>> INSERT, UPDATE 176, 57 AUTHENTICATION, 133 BIND. These active
>> INSERTS and UPDATES also includes the 80-90 waiting sessions (We
>> checked pg_stat_activity for 'waiting' state.  And pg_locks for
>> granted=f). The blocking and the waiting queries were simple one
>> row updates/inserts/deletes. These shouldn’t be blocking each
>> other normally (unless, we think, there was a problem writing to
>> the disk). Correct me if I am wrong.
>
> You may want to consider this:
>
> http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
>
>> [vmstat show up to 173111 context switches per second, with high
>> cs rate corresponding to user CPU% between 64 and 82]
>
> That usually means there is high contention for spinlocks,
> potentially with processes getting suspended while holding
> spinlocks, making things worse.

Completely agree on both diagnosis and proposed solution -- load
profile (low iowait, high user%, high processes, high CS) is
symptomatic of too many processes trying to do things at once.  So
there may be some application caused driver of the problem or you are
hitting a contention point within postgres (a 'perf top' might give
clues to the latter).  Either way, once you are in this state you end
up with too many processes fighting for cpu and cache lines which
exaggerates the problem to the point you can classify it as an outage.

Be advised transaction mode pooling makes certain features of the
database difficult or impossible to use -- advisory locks (except xact
variants), server side prepared statements, asynchronous
notificiation, WITH HOLD cursors and the like -- basically anything
scoped to the session.  For many workloads it is a high win though.
If for whatever reason this solution doesn't work, your other options
are to try to optimize whatever is causing the load event
(particularly if it's in your code -- careful query logging might give
some clues) or to simply upgrade hardware (more/faster cpu especially
for your case) to the point that even when highly loaded you're always
clearing queries at an acceptable rate.  The hardware approach has
some risk though -- if you have a contention problem it's not always a
given that adding cores will scale as well as you think.  Faster
core/bus is almost always a win, but obviously there's a very strict
limit you can go.

merlin


pgsql-general by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: max_connections reached in postgres 9.3.3
Next
From: Tom Lane
Date:
Subject: Re: \COPY from CSV ERROR: unterminated CSV quoted field