Thread: OOM Killer kills PostgreSQL

OOM Killer kills PostgreSQL

From
Piotr Włodarczyk
Date:
Hi folks,

We ran into an unexpected PostgreSQL shutdown. After a little investigation we discovered that the problem is the OOM killer, which kills our PostgreSQL. Unfortunately, we can't find the query on the DB that is causing this problem. The log is below:

May 05 09:05:33 HOST kernel: postgres invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=-1000
May 05 09:05:34 HOST kernel: postgres cpuset=/ mems_allowed=0
May 05 09:05:34 HOST kernel: CPU: 0 PID: 28286 Comm: postgres Not tainted 3.10.0-1127.el7.x86_64 #1
May 05 09:05:34 HOST kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
May 05 09:05:34 HOST kernel: Call Trace:
May 05 09:05:34 HOST kernel:  [<ffffffffa097ff85>] dump_stack+0x19/0x1b
May 05 09:05:34 HOST kernel:  [<ffffffffa097a8a3>] dump_header+0x90/0x229
May 05 09:05:34 HOST kernel:  [<ffffffffa050da5b>] ? cred_has_capability+0x6b/0x120
May 05 09:05:34 HOST kernel:  [<ffffffffa03c246e>] oom_kill_process+0x25e/0x3f0
May 05 09:05:35 HOST kernel:  [<ffffffffa0333a41>] ? cpuset_mems_allowed_intersects+0x21/0x30
May 05 09:05:40 HOST kernel:  [<ffffffffa03c1ecd>] ? oom_unkillable_task+0xcd/0x120
May 05 09:05:42 HOST kernel:  [<ffffffffa03c1f76>] ? find_lock_task_mm+0x56/0xc0
May 05 09:05:42 HOST kernel:  [<ffffffffa03c2cc6>] out_of_memory+0x4b6/0x4f0
May 05 09:05:42 HOST kernel:  [<ffffffffa097b3c0>] __alloc_pages_slowpath+0x5db/0x729
May 05 09:05:42 HOST kernel:  [<ffffffffa03c9146>] __alloc_pages_nodemask+0x436/0x450
May 05 09:05:42 HOST kernel:  [<ffffffffa0418e18>] alloc_pages_current+0x98/0x110
May 05 09:05:42 HOST kernel:  [<ffffffffa03be377>] __page_cache_alloc+0x97/0xb0
May 05 09:05:42 HOST kernel:  [<ffffffffa03c0f30>] filemap_fault+0x270/0x420
May 05 09:05:42 HOST kernel:  [<ffffffffc03c07d6>] ext4_filemap_fault+0x36/0x50 [ext4]
May 05 09:05:42 HOST kernel:  [<ffffffffa03edeea>] __do_fault.isra.61+0x8a/0x100
May 05 09:05:42 HOST kernel:  [<ffffffffa03ee49c>] do_read_fault.isra.63+0x4c/0x1b0
May 05 09:05:42 HOST kernel:  [<ffffffffa03f5d00>] handle_mm_fault+0xa20/0xfb0
May 05 09:05:42 HOST kernel:  [<ffffffffa098d653>] __do_page_fault+0x213/0x500
May 05 09:05:42 HOST kernel:  [<ffffffffa098da26>] trace_do_page_fault+0x56/0x150
May 05 09:05:42 HOST kernel:  [<ffffffffa098cfa2>] do_async_page_fault+0x22/0xf0
May 05 09:05:42 HOST kernel:  [<ffffffffa09897a8>] async_page_fault+0x28/0x30
May 05 09:05:42 HOST kernel: Mem-Info:
May 05 09:05:42 HOST kernel: active_anon:5382083 inactive_anon:514069 isolated_anon:0
                                                active_file:653 inactive_file:412 isolated_file:75
                                                unevictable:0 dirty:0 writeback:0 unstable:0
                                                slab_reclaimable:120624 slab_unreclaimable:14538
                                                mapped:814755 shmem:816586 pagetables:60496 bounce:0
                                                free:30218 free_pcp:562 free_cma:0
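For scale: the Mem-Info counters above are page counts, not bytes. The minimal Python sketch below converts them to GiB, assuming the standard 4 KiB x86_64 page size. With these numbers, active_anon comes to roughly 20.5 GiB, free memory is about 118 MiB and the file cache (active_file + inactive_file) is only a few MiB. The helper names are illustrative only.

```python
import re
from typing import Dict

# Counters copied from the Mem-Info section of the kernel log above.
# The kernel reports them in pages, not bytes.
MEM_INFO = """
active_anon:5382083 inactive_anon:514069 isolated_anon:0
active_file:653 inactive_file:412 isolated_file:75
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:120624 slab_unreclaimable:14538
mapped:814755 shmem:816586 pagetables:60496 bounce:0
free:30218 free_pcp:562 free_cma:0
"""

PAGE_SIZE = 4096  # assumption: standard 4 KiB pages on x86_64


def parse_mem_info(text: str) -> Dict[str, int]:
    """Return {counter_name: page_count} for every name:value pair."""
    return {name: int(value)
            for name, value in re.findall(r"\b([a-z_]+):(\d+)\b", text)}


def pages_to_gib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count to GiB."""
    return pages * page_size / 2**30


if __name__ == "__main__":
    counters = parse_mem_info(MEM_INFO)
    for name in ("active_anon", "inactive_anon", "shmem", "mapped",
                 "pagetables", "free"):
        print(f"{name:>14}: {pages_to_gib(counters[name]):7.2f} GiB")
```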

Can you tell me how to find the problematic query? Or how to tune the configuration so the DB stays alive long enough for us to find the problematic query?

-- 

Regards,
Piotr Włodarczyk

Re: OOM Killer kills PostgreSQL

From
Laurenz Albe
Date:
On Wed, 2020-05-20 at 09:30 +0200, Piotr Włodarczyk wrote:
> We ran into an unexpected PostgreSQL shutdown. After a little investigation
> we discovered that the problem is the OOM killer, which kills our PostgreSQL.
> Unfortunately, we can't find the query on the DB that is causing this problem. The log is below:

Is there nothing in the PostgreSQL log?
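If the OOM killer SIGKILLs an individual backend rather than the postmaster, the postmaster usually logs a line such as "server process (PID 1234) was terminated by signal 9: Killed". That line is typically followed by a "DETAIL:  Failed process was running:" line carrying the query text. The minimal Python sketch below scans a log file for those pairs, with some assumptions and limits:

- The exact message wording is assumed; it can differ with lc_messages or between versions.
- Multi-line queries are only captured up to their first line.
- The function name is hypothetical.
- If the postmaster itself was killed, no such line is written at all.

```python
import re
import sys
from typing import Iterable, Iterator, Optional, Tuple

# Assumed wording of the postmaster's messages (English lc_messages):
#   LOG:  server process (PID 1234) was terminated by signal 9: Killed
#   DETAIL:  Failed process was running: <query text>
BACKEND_KILLED_RE = re.compile(r"server process \(PID (\d+)\) was terminated by signal 9")
FAILED_QUERY_RE = re.compile(r"DETAIL:\s+Failed process was running: (.*)")


def find_killed_backends(lines: Iterable[str]) -> Iterator[Tuple[int, Optional[str]]]:
    """Yield (pid, query) for each backend reported as killed by SIGKILL.

    query is None when the kill message is not followed by a
    'Failed process was running' DETAIL line.
    """
    pending_pid: Optional[int] = None
    for line in lines:
        if pending_pid is not None:
            detail = FAILED_QUERY_RE.search(line)
            yield pending_pid, detail.group(1) if detail else None
            pending_pid = None
            if detail:
                continue  # this line was the DETAIL, not a new kill message
        killed = BACKEND_KILLED_RE.search(line)
        if killed:
            pending_pid = int(killed.group(1))
    if pending_pid is not None:
        # The log ended right after a kill message.
        yield pending_pid, None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_killed.py /path/to/postgresql.log  (path is hypothetical)
    with open(sys.argv[1], errors="replace") as log:
        for pid, query in find_killed_backends(log):
            print(pid, query or "<no query logged>")
```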

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: OOM Killer kills PostgreSQL

From
Piotr Włodarczyk
Date:
Nothing special. I'll check it again after the next crash.

On Wed, May 20, 2020 at 10:22 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Is there nothing in the PostgreSQL log?



--

Regards,
Piotr Włodarczyk

Re: OOM Killer kills PostgreSQL

From
Fabio Pardi
Date:
Maybe your memory budget does not fit in the RAM available on the machine?

The problem is not in the query you are looking for, but in the settings you are using for Postgres.
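A very rough way to compare the configured memory budget with RAM is a worst-case sum, sketched below in Python. It counts shared_buffers, plus work_mem for every connection (a single query can use work_mem several times, once per sort or hash node), plus maintenance_work_mem for every autovacuum worker. This is only a rule of thumb under stated assumptions:

- It ignores per-backend overhead, the OS and the page cache.
- The function name and all example values are hypothetical, not taken from this thread.

```python
def worst_case_memory_mb(
    shared_buffers_mb: int,
    max_connections: int,
    work_mem_mb: int,
    maintenance_work_mem_mb: int,
    autovacuum_max_workers: int,
    work_mem_allocations_per_query: int = 2,
) -> int:
    """Very rough upper estimate of PostgreSQL memory use, in MB.

    Assumes every connection runs a query that uses work_mem
    `work_mem_allocations_per_query` times and every autovacuum worker
    uses maintenance_work_mem (i.e. autovacuum_work_mem = -1).
    Per-backend overhead, the OS and the page cache are not included.
    """
    return (
        shared_buffers_mb
        + max_connections * work_mem_mb * work_mem_allocations_per_query
        + autovacuum_max_workers * maintenance_work_mem_mb
    )


if __name__ == "__main__":
    # Hypothetical settings, not taken from the thread.
    budget = worst_case_memory_mb(
        shared_buffers_mb=4096,
        max_connections=100,
        work_mem_mb=64,
        maintenance_work_mem_mb=512,
        autovacuum_max_workers=3,
    )
    ram_mb = 16 * 1024  # hypothetical machine size
    print(f"worst case ~{budget} MB vs {ram_mb} MB RAM")
```

If the estimate is close to or above RAM, lowering max_connections or work_mem (or pooling connections) is the usual lever.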

regards,

fabio pardi



Re: OOM Killer kills PostgreSQL

From
Justin Pryzby
Date:
What Postgres version? What environment (RAM) and configuration?
https://wiki.postgresql.org/wiki/Server_Configuration

I think you can probably find more info in dmesg/syslog, probably a
line saying "OOM killed ..." showing which PID was killed and its vsz.
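On kernels of this era the victim is usually reported on a line such as "Killed process 1234 (postgres) total-vm:<size>kB, anon-rss:<size>kB, ...". Here total-vm is the virtual size that ps shows as VSZ. The minimal Python sketch below pulls the PID, command name and sizes out of saved kernel log text, for example /var/log/messages or the output of dmesg or journalctl -k. The line format is assumed from mainline kernels and may differ slightly on RHEL 7. The names are hypothetical.

```python
import re
import sys
from typing import Iterable, List, NamedTuple, Optional

# Assumed format of the kernel's OOM victim line, e.g.
#   Killed process 1234 (postgres) total-vm:2048kB, anon-rss:1024kB, file-rss:0kB
KERNEL_KILL_RE = re.compile(
    r"Killed process (\d+) \(([^)]*)\) total-vm:(\d+)kB(?:, anon-rss:(\d+)kB)?"
)


class OomKill(NamedTuple):
    pid: int
    comm: str
    total_vm_kb: int            # virtual size, comparable to VSZ in ps
    anon_rss_kb: Optional[int]  # private resident memory, if reported


def find_oom_kills(lines: Iterable[str]) -> List[OomKill]:
    """Return every OOM-killer victim reported in the given log lines."""
    kills = []
    for line in lines:
        match = KERNEL_KILL_RE.search(line)
        if match:
            anon = match.group(4)
            kills.append(OomKill(
                pid=int(match.group(1)),
                comm=match.group(2),
                total_vm_kb=int(match.group(3)),
                anon_rss_kb=int(anon) if anon is not None else None,
            ))
    return kills


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python oom_kills.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for kill in find_oom_kills(log):
            print(kill)
```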

Are you able to see some particular process continuously growing (e.g.
in top or ps)?
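Besides top or ps, the per-process resident size can be sampled straight from /proc on Linux. The minimal Python sketch below assumes that processes whose /proc/<pid>/comm is "postgres" are the server's processes. It reads VmRSS from /proc/<pid>/status and prints the PIDs that grew between two samples. The function names are hypothetical.

A backend's RSS also counts the shared_buffers pages it has touched, so growth up to roughly shared_buffers is normal. Growth far beyond that points at private memory such as work_mem.

```python
import os
import sys
import time
from typing import Dict, Optional


def read_rss_kb(status_text: str) -> Optional[int]:
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads have no VmRSS line


def postgres_rss_kb() -> Dict[int, int]:
    """Map PID -> VmRSS in kB for every process named 'postgres'."""
    sizes = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() != "postgres":
                    continue
            with open(f"/proc/{entry}/status") as f:
                rss = read_rss_kb(f.read())
        except OSError:
            continue  # the process exited or is not readable
        if rss is not None:
            sizes[int(entry)] = rss
    return sizes


def growth_kb(before: Dict[int, int], after: Dict[int, int]) -> Dict[int, int]:
    """RSS increase for PIDs present in both samples, only where it grew."""
    return {pid: after[pid] - before[pid]
            for pid in after
            if pid in before and after[pid] > before[pid]}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python pg_rss.py 10   (sample every 10 seconds until Ctrl-C)
    interval = float(sys.argv[1])
    previous = postgres_rss_kb()
    while True:
        time.sleep(interval)
        current = postgres_rss_kb()
        ranked = sorted(growth_kb(previous, current).items(), key=lambda kv: -kv[1])
        for pid, delta in ranked:
            print(f"PID {pid}: +{delta} kB (now {current[pid]} kB)")
        previous = current
```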

Do you have full query logs enabled to help determine which PID/query
was involved? For example:
log_statement=all
log_min_messages=info
log_checkpoints=on
log_lock_waits=on
log_temp_files=0
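With log_statement=all, each statement is logged by the backend that runs it, so a PID from the kernel's OOM message can be matched against the PostgreSQL log. The minimal Python sketch below relies on two assumptions:

- log_line_prefix contains "[%p]", as the default '%m [%p] ' of PostgreSQL 10 and later does.
- Statements use the English "statement:" / "execute <name>:" wording.

The function name is hypothetical. The last statement returned is the most likely candidate for what the backend was running when it was killed, although PIDs can be reused over time.

```python
import re
import sys
from typing import Iterable, List


def statements_for_pid(lines: Iterable[str], pid: int) -> List[str]:
    """Return the query text of every statement logged by backend `pid`."""
    # Matches e.g. "... [1234] LOG:  statement: SELECT 1"
    #          or  "... [1234] LOG:  execute <unnamed>: SELECT $1"
    pattern = re.compile(
        rf"\[{pid}\].*?LOG:\s+(?:statement|execute [^:]*): (.*)"
    )
    return [m.group(1) for m in map(pattern.search, lines) if m]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python pid_statements.py /path/to/postgresql.log 1234
    with open(sys.argv[1], errors="replace") as log:
        found = statements_for_pid(log, int(sys.argv[2]))
    print(found[-1] if found else "no statements logged for that PID")
```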



Re: OOM Killer kills PostgreSQL

From
Stephen Frost
Date:
Greetings,

* Piotr Włodarczyk (piotrwlodarczyk89@gmail.com) wrote:
> We met unexpected PostgreSQL shutdown. After a little investigation we've
> discovered that problem is in OOM killer which kills our PostgreSQL.

You need to configure your system to not overcommit.

Read up on the overcommit_memory and overcommit_ratio Linux settings.
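With vm.overcommit_memory=2, the kernel refuses allocations once committed address space would exceed CommitLimit. A backend's oversized allocation then typically fails with an "out of memory" error instead of a process being SIGKILLed, although this reduces rather than fully removes the chance of OOM kills. Per the kernel's overcommit-accounting documentation, CommitLimit is (RAM - huge pages) * overcommit_ratio / 100 + swap, ignoring vm.overcommit_kbytes. The minimal Python sketch below computes that limit from /proc so it can be checked before switching modes. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict

MEMINFO = Path("/proc/meminfo")
RATIO = Path("/proc/sys/vm/overcommit_ratio")
MODE = Path("/proc/sys/vm/overcommit_memory")


def read_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value} (kB, or a count)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def commit_limit_kb(mem_total_kb: int, swap_total_kb: int,
                    overcommit_ratio: int, hugetlb_kb: int = 0) -> int:
    """CommitLimit enforced when vm.overcommit_memory=2 (overcommit_kbytes unset)."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


if __name__ == "__main__" and MEMINFO.exists() and RATIO.exists() and MODE.exists():
    info = read_meminfo(MEMINFO.read_text())
    ratio = int(RATIO.read_text())
    mode = int(MODE.read_text())
    # HugePages_Total is a page count; Hugepagesize is in kB.
    hugetlb_kb = info.get("HugePages_Total", 0) * info.get("Hugepagesize", 0)
    limit = commit_limit_kb(info["MemTotal"], info["SwapTotal"], ratio, hugetlb_kb)
    print(f"overcommit_memory={mode} overcommit_ratio={ratio}")
    print(f"CommitLimit with mode 2: {limit} kB; "
          f"currently committed: {info.get('Committed_AS', 0)} kB")
```

If Committed_AS is already above the computed limit, switching to mode 2 without raising overcommit_ratio (or lowering the memory budget) would make allocations start failing.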

Thanks,

Stephen
