Thread:

From
Bruno Hass de Andrade
Date:

Hi. My company have servers that run postgres for storing some logs and serving django web interfaces for management of the server itself. In the last days some servers stopped serving the web interface, and syslog show this error:

Jun 23 04:40:19 django-1 postgres[8790]: [3-1] FATAL:  remaining connection slots are reserved for non-replication superuser connections


So I started looking for the cause:


$ ps auxf
...
postgres  4580  0.7  0.1 305156 19184 ?        S    Jun01 251:46 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
postgres  4746  0.0  1.6 305608 274656 ?       Ss   Jun01   4:09  \_ postgres: checkpointer process                          
postgres  4747  0.0  1.2 305288 208708 ?       Ss   Jun01   0:33  \_ postgres: writer process                                
postgres  4748  0.0  0.0 305288  9300 ?        Ss   Jun01   0:04  \_ postgres: wal writer process                            
postgres  4750  0.0  0.0 305972  2376 ?        Ss   Jun01   0:00  \_ postgres: autovacuum launcher process                   
postgres  4752  0.1  0.0  23852  1396 ?        Ss   Jun01  50:54  \_ postgres: stats collector process                       
postgres 63615  0.0  0.1 307036 22208 ?        Ss   Jun02   0:00  \_ postgres: db db [local] idle                      
postgres 22476  0.0  0.0 306368  2576 ?        Ss   Jun22   0:00  \_ postgres: db db [local] idle                      
postgres  4521  0.0  0.0 306512  6408 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4522  0.0  0.0 306512  6412 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4523  0.0  0.0 306512  6416 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4524  0.0  0.0 306512  6412 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4534  0.0  0.0 306512  6152 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4544  0.0  0.0 306512  6420 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4552  0.0  0.0 306512  6408 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4742  0.0  0.0 306512  6400 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4743  0.0  0.0 306512  6400 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4766  0.0  0.0 306512  6408 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4770  0.0  0.0 306512  6144 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4774  0.0  0.0 306512  6396 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4783  0.0  0.0 306512  6400 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4786  0.0  0.0 306512  6376 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4804  0.0  0.0 306512  6376 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4812  0.0  0.0 306512  6376 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4860  0.0  0.0 306512  6356 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4862  0.0  0.0 306516  6672 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4868  0.0  0.0 306516  6408 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4878  0.0  0.0 306516  6684 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4881  0.0  0.0 306516  6164 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4882  0.0  0.0 306516  6168 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4886  0.0  0.0 306500  6524 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4887  0.0  0.0 306500  6272 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4889  0.0  0.0 306500  6272 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4890  0.0  0.0 306500  6276 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  4907  0.0  0.0 306500  6796 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5131  0.0  0.0 306500  6268 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5138  0.0  0.0 306512  6116 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5142  0.0  0.0 306512  6116 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5143  0.0  0.0 306512  6644 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5151  0.0  0.0 306512  6120 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5154  0.0  0.0 306512  6904 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5155  0.0  0.0 306512  6128 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5156  0.0  0.0 306512  6120 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5157  0.0  0.0 306512  6380 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5162  0.0  0.0 306512  6120 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5165  0.0  0.0 306512  6384 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5172  0.0  0.0 306512  6128 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5174  0.0  0.0 306512  6120 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5188  0.0  0.0 306512  6124 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5189  0.0  0.0 306512  6636 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres  5190  0.0  0.0 306512  6104 ?        Ss   00:13   0:00  \_ postgres: db db [local] idle                      
postgres 26621  0.0  0.0 306512  6116 ?        Ss   12:54   0:00  \_ postgres: db db [local] idle                      
postgres  9600  0.0  0.0 306512  6120 ?        Ss   13:30   0:00  \_ postgres: db db [local] idle 


50 connections in idle status, but looking further:

$ cat /proc/4521/stack
[<ffffffff81b59a8a>] unix_stream_recvmsg+0x2b9/0x633
[<ffffffff81acbf27>] __sock_recvmsg_nosec+0x29/0x2b
[<ffffffff81ace30c>] sock_recvmsg+0x65/0x88
[<ffffffff81ace409>] SYSC_recvfrom+0xda/0x134
[<ffffffff81acf4ef>] SyS_recvfrom+0x9/0xb
[<ffffffff81bd4062>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

All connections have stucked in this stack.


The postmaster process:

$ cat /proc/4580/stack
[<ffffffff810ec9fc>] poll_schedule_timeout+0x3e/0x61
[<ffffffff810ed32f>] do_select+0x5ea/0x629
[<ffffffff810ed4f7>] core_sys_select+0x189/0x245
[<ffffffff810ed632>] SyS_select+0x7f/0xb5
[<ffffffff81bd4062>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff



Some information about the server:


$ df -lh
Filesystem        Size  Used Avail Use% Mounted on
/dev/mapper/root  437G   13G  403G   3% /
/dev/sda1         2.0G  204M  1.6G  12% /boot
tmpfs             7.9G  224K  7.9G   1% /run
devtmpfs          7.8G     0  7.8G   0% /dev
shm               7.9G     0  7.9G   0% /dev/shm


$ free -m
             total       used       free     shared    buffers     cached
Mem:         16030      13221       2808          0         10       3486
-/+ buffers/cache:       9725       6304
Swap:            0          0          0


$ cat /proc/cpuinfo
...
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz
stepping : 9
microcode : 0x15
cpu MHz : 3092.951
cache size : 8192 KB
...


# postgresql.conf #

external_pid_file = '/tmp/postgres.pid'
listen_addresses = ''
port = 5432
max_connections = 50
unix_socket_directories = '/tmp'
unix_socket_permissions = 0777
bonjour = off
authentication_timeout = 10min
ssl = on
ssl_ciphers = 'DEFAULT:!LOW:!EXP:!MD5:@STRENGTH'
ssl_renegotiation_limit = 512MB
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'

shared_buffers = 256MB
temp_buffers = 128MB
work_mem = 256MB
maintenance_work_mem = 256MB
max_stack_depth = 1MB
max_files_per_process = 1000
max_locks_per_transaction = 128
effective_cache_size = 512MB

wal_level = minimal
wal_writer_delay = 500ms
checkpoint_segments = 100
enable_indexscan = on
enable_sort = on

log_destination = 'syslog'
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
client_min_messages = error
log_min_messages = error
log_min_error_statement = fatal
log_min_duration_statement = -1
log_timezone = 'Brazil/East'
track_activities = on
track_counts = on
track_io_timing = off

autovacuum = on
autovacuum_max_workers = 1
autovacuum_naptime = 300min
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50

search_path = 'public'

datestyle = 'iso, mdy'
intervalstyle = 'postgres'
timezone = 'Brazil/East'

lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'

default_text_search_config = 'pg_catalog.english'
restart_after_crash = on

# end conf #







I really don't know what is happening, why postgres hang and didn't close the connections. This django web interface is used only for management and viewing logs, most server have two users only.

I've sent this email just to know if one of yours have seen this, or something like this, before.


Thank you!




--
-----

Bruno Hass
(51) 9280-3627


Re:

From
John R Pierce
Date:
On 6/23/2015 10:58 AM, Bruno Hass de Andrade wrote:
I really don't know what is happening, why postgres hang and didn't close the connections. This django web interface is used only for management and viewing logs, most server have two users only.

I've sent this email just to know if one of yours have seen this, or something like this, before.

use lsof or equivalent to determine what process owns the other side of the socket that one of those processes is listening on.   my guess is, you're not releasing resources in django and they are piling up.



-- 
john r pierce, recycling bits in santa cruz

Re:

From
Albe Laurenz
Date:
Bruno Hass de Andrade wrote:
> Hi. My company have servers that run postgres for storing some logs and serving django web interfaces
> for management of the server itself. In the last days some servers stopped serving the web interface,
> and syslog show this error:
> 
>     Jun 23 04:40:19 django-1 postgres[8790]: [3-1] FATAL:  remaining connection slots are reserved
> for non-replication superuser connections
> 
> So I started looking for the cause:

[...]

> 50 connections in idle status, but looking further:
> 
> 
>     $ cat /proc/4521/stack
>     [<ffffffff81b59a8a>] unix_stream_recvmsg+0x2b9/0x633
>     [<ffffffff81acbf27>] __sock_recvmsg_nosec+0x29/0x2b
>     [<ffffffff81ace30c>] sock_recvmsg+0x65/0x88
>     [<ffffffff81ace409>] SYSC_recvfrom+0xda/0x134
>     [<ffffffff81acf4ef>] SyS_recvfrom+0x9/0xb
>     [<ffffffff81bd4062>] system_call_fastpath+0x16/0x1b
>     [<ffffffffffffffff>] 0xffffffffffffffff
> 
> 
> All connections have stucked in this stack.

That just means that they are idle an waiting to receive a message from the client.

[...]

>     # postgresql.conf #
[...]
>     max_connections = 50

[...]

> I really don't know what is happening, why postgres hang and didn't close the connections. This django
> web interface is used only for management and viewing logs, most server have two users only.

The problem is clear from the PostgreSQL side:
You have reached the connection limit an so receive errors when you start new connections.

You'll have to investigate your application server why it starts so many connections
or why it fails to close idle connections after some time.
Is there a connection pool? How is it configured?

Yours,
Laurenz Albe