Re: max_connections reached in postgres 9.3.3 - Mailing list pgsql-general

From Adrian Klaver
Subject Re: max_connections reached in postgres 9.3.3
Date
Msg-id 5398CD99.4010406@aklaver.com
Whole thread Raw
In response to max_connections reached in postgres 9.3.3  ("Vasudevan, Ramya" <ramya.vasudevan@classmates.com>)
List pgsql-general
On 06/11/2014 11:24 AM, Vasudevan, Ramya wrote:
> Our set up:
>
> ·Db version: postgres 9.3.3
>
> ·OS: CentOS 6.5
>
> ·kernel Version - Linux 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3
> 21:39:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> ·cpu - 24 proc
>
> ·memory - 768 GB
>
> ·The disks are SAN fiber.
>
> ·We have streaming replication set up from this server to the secondary
> server.
>
> ·Some of the DB parameters:
>
> max_connections  - 1500
>
> shared_buffers - 4GB
>
> work_mem = 130MB
>
>   maintenance_work_mem = 1GB
>
> Issue this morning:
>
> At 9:04 - FATAL:  remaining connection slots are reserved for
> non-replication superuser connections
>
> These were the things we noticed and have question on each one of them:
>
> pg_stat_activity showed a total of 850 connections but a 'ps -ef|grep
> postgers' showed 1500 connections and DB wasn’t accepting new
> connections. How is it possible to have such discrepancy in count
> between pg_stat_activity and ps?
>
> 200 were in waiting status, 50 were in authentication status, 748 were
> in IDLE status and the rest were all doing some DML operations. Why will
> 50 connections be in authentication status in the database and stay there?
>
>   Some existing sessions were completing the sqls successfully (though
> not as fast). So, the database was not completely hung or unresponsive.
>
> Trying to kill the session at the DB level (pg_terminate_backend) were
> just hanging. Why would this be the case?
>
> During this time, server had a very high load average of 130 (normally
> we have < 5).
>
> We had to restart the database to clear it all up.
>
>   Note:
>
> We are having this issue of max_connections being reached for a couple
> of months now. At one point, we tried increasing the max_connections to
> 2500 to see if we will still run out of connections, and we did.
>
> We noticed that generic-segmentation-offload was set to ON on this
> server. We changed it to OFF.
>
> Recent Changes in the environment:
>
> ·We turned off THP on the server a couple of months ago to see if that
> will fix the issue.
>
> ·We recently (2 weeks ago) moved from RedHat ext 2 to xfs because we
> were having the same problem.
>
> Please let us know if you need more information on our environment or on
> the issue this morning. Any help will be greatly appreciated.

So unless I am missing something the issue seems to be unclosed
connections. It would seem the thing to do would be to look at the
application(s) hitting the database and determine why the connections
are not being closed. Increasing max_connections is just driving deeper
into the wreck.

>
> Thank you,
>
> *________________*
>
> */Ramya Vasudevan/*
>
> /Database Administrator/
>
> */CLASSMATES/*
>


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: Igor Neyman
Date:
Subject: Re: what does pg_activity mean when the database is stuck?
Next
From: Si Chen
Date:
Subject: Re: what does pg_activity mean when the database is stuck?