OOM Killing on Docker while ANALYZE running - Mailing list pgsql-admin

From Jorge Daniel
Subject OOM Killing on Docker while ANALYZE running
Date
Msg-id CP2P152MB03563AA07394B591FD640158BCE10@CP2P152MB0356.LAMP152.PROD.OUTLOOK.COM
List pgsql-admin

Hi guys, I'm dealing with OOM killing on PostgreSQL 9.4.8 running on Docker.



Docker: Docker version 17.05.0

"resources_template = Resources(cpu = 1,ram = 1 * GB, disk = 1 * GB)"


Using Ceph volumes as storage.
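
For context, a minimal sketch of how that resource template presumably maps onto plain Docker memory flags; the image tag and volume path are placeholders, not the real setup, but --memory/--memory-swap match the 1048576kB / 2097152kB cgroup limits that show up in kern.log further down:

# Sketch only: assumes Resources(ram = 1 * GB) ends up as a 1 GiB cgroup memory
# limit plus a 2 GiB memory+swap limit; image tag and volume path are placeholders.
docker run -d \
  --name pg948-oom-test \
  --memory=1g \
  --memory-swap=2g \
  -v /mnt/ceph/pgdata:/var/lib/postgresql/data \
  postgres:9.4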



------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 5079992
max total shared memory (kbytes) = 5079992
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 32000
max semaphores per array = 32000
max semaphores system wide = 1024000000
max ops per semop call = 500
semaphore max value = 32767

------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384



postgres@d90b66757bb1:~$ ipcs -a

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0052e2c1 0          postgres   600        56         5
0x6b8b4567 32769      postgres   600        2028       5

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x0052e2c1 0          postgres   600        17
0x0052e2c2 32769      postgres   600        17
0x0052e2c3 65538      postgres   600        17
0x0052e2c4 98307      postgres   600        17
0x0052e2c5 131076     postgres   600        17
0x0052e2c6 163845     postgres   600        17
0x0052e2c7 196614     postgres   600        17

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages


The PostgreSQL installation is:
 
PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit


From postgresql.conf:

shared_buffers = 500MB                    # min 128kB
temp_buffers = 80MB                     # min 800kB
work_mem = 80MB                         # min 64kB
maintenance_work_mem = 80MB              # min 1MB
autovacuum_work_mem = 300MB             # min 1MB, or -1 to use maintenance_work_mem
effective_cache_size = 500MB
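
A quick way to double-check what the server (and the autovacuum worker) actually sees for these is a pg_settings query; default_statistics_target is included because it controls how many rows ANALYZE samples (300 × the target). Standard GUC names, run as the postgres OS user inside the container:

# Check the memory-related settings actually in effect, plus the statistics target
psql -d pgbench -Atc "
  SELECT name, setting, unit
  FROM pg_settings
  WHERE name IN ('shared_buffers','work_mem','maintenance_work_mem',
                 'autovacuum_work_mem','default_statistics_target');"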





While running 

 /usr/bin/pgbench --initialize --fillfactor=100 --no-vacuum --scale=1000 pgbench
.........
100000000 of 100000000 tuples (100%) done (elapsed 189.29 s, remaining 0.00 s).
set primary keys...
done.

2018-01-25 10:19:19.652 PST rhost=[local] app=psql:user=postgres:db=postgres:LOG:  disconnection: session time: 0:00:14.824 user=postgres database=postgres host=[local]
2018-01-25 10:19:33.371 PST rhost=[local] app=[unknown]:user=[unknown]:db=[unknown]:LOG:  connection received: host=[local]
2018-01-25 10:19:33.371 PST rhost=[local] app=[unknown]:user=postgres:db=pgbench:LOG:  connection authorized: user=postgres database=pgbench
2018-01-25 10:24:55.640 PST rhost=[local] app=pgbench:user=postgres:db=pgbench:LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp812.0", size 1073741824
2018-01-25 10:24:55.640 PST rhost=[local] app=pgbench:user=postgres:db=pgbench:STATEMENT:  alter table pgbench_accounts add primary key (aid)
2018-01-25 10:24:55.677 PST rhost=[local] app=pgbench:user=postgres:db=pgbench:LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp812.1", size 928219136
2018-01-25 10:24:55.677 PST rhost=[local] app=pgbench:user=postgres:db=pgbench:STATEMENT:  alter table pgbench_accounts add primary key (aid)
2018-01-25 10:24:55.752 PST rhost=[local] app=pgbench:user=postgres:db=pgbench:LOG:  disconnection: session time: 0:05:22.381 user=postgres database=pgbench host=[local]
2018-01-25 10:26:29.937 PST rhost= app=:user=:db=:LOG:  server process (PID 2709) was terminated by signal 9: Killed
2018-01-25 10:26:29.937 PST rhost= app=:user=:db=:DETAIL:  Failed process was running: autovacuum: ANALYZE public.pgbench_accounts
2018-01-25 10:26:29.937 PST rhost= app=:user=:db=:LOG:  terminating any other active server processes
2018-01-25 10:26:29.937 PST rhost= app=:user=:db=:WARNING:  terminating connection because of crash of another server process
2018-01-25 10:26:29.937 PST rhost= app=:user=:db=:DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-01-25 10:26:29.937 PST rhost= app=:user=:db=:HINT:  In a moment you should be able to reconnect to the database and repeat your command.


pgbench=# select count(*) from pgbench_accounts ;
   count
-----------
 100000000
(1 row)

pgbench=# SELECT pg_size_pretty(pg_total_relation_size('pgbench_accounts'));
 pg_size_pretty
----------------
 15 GB
(1 row)


The kern.log shows: 

Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818867] watch invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=0-1, order=0, oom_score_adj=0
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818868] watch cpuset=d90b66757bb18658502690ae68d4ea365f922ed40855c15b5eb5d074be38c223 mems_allowed=0-1
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818873] CPU: 29 PID: 1911120 Comm: watch Tainted: G           OE   4.10.0-42-generic #46~16.04.1-Ubuntu
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818874] Hardware name: Supermicro SYS-F628R3-RTB+-SS30C/X10DRFR, BIOS 2.0c 04/13/2017
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818875] Call Trace:
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818881]  dump_stack+0x63/0x90
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818886]  dump_header+0x7b/0x1fd
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818891]  ? mem_cgroup_scan_tasks+0xc4/0xf0
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818895]  oom_kill_process+0x219/0x3e0
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818897]  out_of_memory+0x120/0x4b0
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818898]  mem_cgroup_out_of_memory+0x4b/0x80
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818900]  mem_cgroup_oom_synchronize+0x325/0x340
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818901]  ? memory_high_write+0xe0/0xe0
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818902]  pagefault_out_of_memory+0x36/0x80
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818907]  mm_fault_error+0x8f/0x190
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818909]  __do_page_fault+0x4b2/0x4e0
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818910]  do_page_fault+0x22/0x30
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818914]  page_fault+0x28/0x30
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818916] RIP: 0033:0x7fedbf439830
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818916] RSP: 002b:00007ffeb543c288 EFLAGS: 00010202
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818917] RAX: 00007fedbf7c1de0 RBX: 00007fedbf7c1de0 RCX: 00007fedbf4bef20
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818918] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00007fedbf57c209
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818918] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818919] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818919] R13: ffffffffffffffff R14: 00000000ffffffff R15: 00000000ffffffff
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818920] Task in /docker/d90b66757bb18658502690ae68d4ea365f922ed40855c15b5eb5d074be38c223 killed as a result of limit of /docker/d90b66757bb18658502690ae68d4ea365f922ed40855c15b5eb5d074be38c223
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818924] memory: usage 1048576kB, limit 1048576kB, failcnt 7864663
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818924] memory+swap: usage 1048576kB, limit 2097152kB, failcnt 0
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818925] kmem: usage 42116kB, limit 9007199254740988kB, failcnt 0
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818925] Memory cgroup stats for /docker/d90b66757bb18658502690ae68d4ea365f922ed40855c15b5eb5d074be38c223: cache:2344KB rss:1004116KB rss_huge:0KB mapped_file:256KB dirty:4KB writeback:0KB swap:0KB inactive_anon:518092KB active_anon:486012KB inactive_file:1436KB active_file:732KB unevictable:0KB
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.818940] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819358] [1877175]     0 1877175     4899       72      13       3        0             0 docker-entrypoi
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819361] [1877277]     0 1877277    12050       93      27       3        0             0 su
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819362] [1877279]    26 1877279   164354      335      51       4        0             0 postgres
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819363] [1877284]    26 1877284   164598      473      51       4        0             0 postgres
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819364] [1877285]    26 1877285   164417      335      49       4        0             0 postgres
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819366] [1877286]    26 1877286   164417      335      49       4        0             0 postgres
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819367] [1877287]    26 1877287   164517      388      50       4        0             0 postgres
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819368] [1877288]    26 1877288    26624      335      49       4        0             0 postgres
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819369] [1877289]    26 1877289    26659      351      50       4        0             0 postgres
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819385] [1878251]     0 1878251     4947      126      13       3        0             0 bash
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819388] [1911120]     0 1911120     3519      459      13       3        0             0 watch
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819404] [1943707]     0 1943707     4946      125      15       3        0             0 bash
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819411] [1946711]     0 1946711    12050       96      29       3        0             0 su
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819412] [1946712]    26 1946712     4545      122      14       3        0             0 bash
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819435] [2015235]    26 2015235   422880   249393     540       5        0             0 postgres
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.819443] Memory cgroup out of memory: Kill process 2015235 (postgres) score 953 or sacrifice child
Jan 25 10:26:29 lon-r10-u17 kernel: [1737264.828834] Killed process 2015235 (postgres) total-vm:1691520kB, anon-rss:997572kB, file-rss:0kB, shmem-rss:0kB
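
For what it's worth, the same limit/usage counters the kernel reports here can also be read from inside the container (cgroup v1 files, same hierarchy as the usage file I poll below); just a cross-check, not part of the original run:

# cgroup v1 memory files inside the container (cross-check of the kern.log figures)
cat /sys/fs/cgroup/memory/memory.limit_in_bytes      # 1073741824 bytes = 1048576kB
cat /sys/fs/cgroup/memory/memory.max_usage_in_bytes  # peak usage since container start
cat /sys/fs/cgroup/memory/memory.failcnt             # allocation failures (cf. failcnt above)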

Again I replay the scenario, but now monitoring the used memory inside the container, and it really shows a bump when I fire the ANALYZE:

pgbench=# analyze verbose pgbench_accounts;
INFO:  analyzing "public.pgbench_accounts"
INFO:  "pgbench_accounts": scanned 1639345 of 1639345 pages, containing 100000000 live rows and 0 dead rows; 3000000 rows in sample, 100000000 estimated total rows
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Meanwhile:
$ while true; do cat /sys/fs/cgroup/memory/memory.usage_in_bytes; sleep 1; done

82620416
82821120
82780160
83017728
82903040
82993152
83054592
82894848
82964480
83017728
675348480  ------> ANALYZE
1073606656
1073700864
1073684480
1073479680
1073704960
1073598464
1073713152
1073590272
1073733632
1073610752
1073467392
1073729536
1073717248
1073582080
1073594368
1073598464
1073741824
1073700864
1073623040
1073700864
1073610752
1073487872
1073496064
1073565696
1073717248
1073238016
1073717248
1073717248
1073647616
1073676288
1073524736
1073692672
1073717248
1073631232
1073696768
1073496064
1073627136
1073627136
1073635328
1073561600
1073545216
710078464 ------> KILL
85487616
85274624
85237760
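
In case it helps anyone reproduce this, a timestamped variant of the same loop (same cgroup v1 usage file) makes it easier to line the spike up with the postgres log entries:

# Same poll as above, with timestamps to correlate against the postgres log
while true; do
  echo "$(date '+%F %T') $(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)"
  sleep 1
done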

It's clear that the whole container memory was in fact filled up by something happening in ANALYZE, but then, if work_mem is set, why is it breaking out of its memory limits? Has anyone seen this before?

The actual case is happening in my prod clusters, with large tables.
I replicated the issue with pgbench to have a simpler scenario (condensed repro below).
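
Condensed repro, run as the postgres OS user inside the 1 GB container; the createdb step is assumed (not shown in the logs above), the rest matches the commands from earlier in this mail:

# Condensed repro inside the 1 GB container (database name "pgbench" as used above)
createdb pgbench
/usr/bin/pgbench --initialize --fillfactor=100 --no-vacuum --scale=1000 pgbench
psql -d pgbench -c 'ANALYZE VERBOSE pgbench_accounts;'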

Any thoughts?


