BUG #16414: Backend process blocked for 120s until kernel killed it - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #16414: Backend process blocked for 120s until kernel killed it |
Date | |
Msg-id | 16414-1eddf92b8ac2ed0a@postgresql.org Whole thread Raw |
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 16414 Logged by: Montana Low Email address: montanalow@gmail.com PostgreSQL version: 12.2 Operating system: Ubuntu 18.04.3 LTS Description: A postgres server became unresponsive until reboot. This machine (and several others in the cluster) typically handle bulk data ingestion via COPY. Before the crash, there were many entries in the postgres logs similar to: 2020-05-03 14:41:25.341 PDT [74797] ERROR: COPY from stdin failed: COPY terminated by new PQexec 2020-05-03 14:41:25.344 PDT [87084] ERROR: COPY from stdin failed: COPY terminated by new PQexec ... 2020-05-03 14:42:20.577 PDT [89620] ERROR: COPY from stdin failed: COPY terminated by new PQexec There was 1 statement timeout (set to 5 minutes) as the final log entry before I manually initiated a server reboot. 2020-05-03 14:44:38.252 PDT [69974] ERROR: canceling statement due to statement timeout There were several interesting entries in journald, similar to this one. May 03 14:48:06 ip-172-30-86-52 kernel: INFO: task postgres:91109 blocked for more than 120 seconds. May 03 14:48:06 ip-172-30-86-52 kernel: Tainted: P OE 4.15.0-1054-aws #56-Ubuntu May 03 14:48:06 ip-172-30-86-52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 03 14:48:06 ip-172-30-86-52 kernel: postgres D 0 91109 91104 0x00000000 May 03 14:48:06 ip-172-30-86-52 kernel: Call Trace: May 03 14:48:06 ip-172-30-86-52 kernel: __schedule+0x24e/0x880 May 03 14:48:06 ip-172-30-86-52 kernel: schedule+0x2c/0x80 May 03 14:48:06 ip-172-30-86-52 kernel: io_schedule+0x16/0x40 May 03 14:48:06 ip-172-30-86-52 kernel: wbt_wait+0x1e7/0x360 May 03 14:48:06 ip-172-30-86-52 kernel: ? trace_event_raw_event_wbt_step+0x140/0x140 May 03 14:48:06 ip-172-30-86-52 kernel: blk_mq_make_request+0xe0/0x570 May 03 14:48:06 ip-172-30-86-52 kernel: generic_make_request+0x124/0x300 May 03 14:48:06 ip-172-30-86-52 kernel: submit_bio+0x73/0x140 May 03 14:48:06 ip-172-30-86-52 kernel: ? _cond_resched+0x19/0x40 May 03 14:48:06 ip-172-30-86-52 kernel: ? submit_bio+0x73/0x140 May 03 14:48:06 ip-172-30-86-52 kernel: ? kmem_cache_alloc+0x158/0x1b0 May 03 14:48:06 ip-172-30-86-52 kernel: ? ext4_init_io_end+0x1f/0x40 May 03 14:48:06 ip-172-30-86-52 kernel: ext4_io_submit+0x4c/0x60 May 03 14:48:06 ip-172-30-86-52 kernel: ext4_writepages+0x476/0xf50 May 03 14:48:06 ip-172-30-86-52 kernel: ? iov_iter_copy_from_user_atomic+0xc2/0x330 May 03 14:48:06 ip-172-30-86-52 kernel: ? __generic_file_write_iter+0x1aa/0x1d0 May 03 14:48:06 ip-172-30-86-52 kernel: do_writepages+0x4b/0xe0 May 03 14:48:06 ip-172-30-86-52 kernel: ? ext4_mark_inode_dirty+0x1d0/0x1d0 May 03 14:48:06 ip-172-30-86-52 kernel: ? do_writepages+0x4b/0xe0 May 03 14:48:06 ip-172-30-86-52 kernel: ? new_sync_write+0xe7/0x140 May 03 14:48:06 ip-172-30-86-52 kernel: __filemap_fdatawrite_range+0xc1/0x100 May 03 14:48:06 ip-172-30-86-52 kernel: ? __filemap_fdatawrite_range+0xc1/0x100 May 03 14:48:06 ip-172-30-86-52 kernel: SyS_sync_file_range+0xef/0x130 May 03 14:48:06 ip-172-30-86-52 kernel: do_syscall_64+0x73/0x130 May 03 14:48:06 ip-172-30-86-52 kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2 May 03 14:48:06 ip-172-30-86-52 kernel: RIP: 0033:0x7f596fab97a7 May 03 14:48:06 ip-172-30-86-52 kernel: RSP: 002b:00007fff07b842a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000115 May 03 14:48:06 ip-172-30-86-52 kernel: RAX: ffffffffffffffda RBX: 000000000000015e RCX: 00007f596fab97a7 May 03 14:48:06 ip-172-30-86-52 kernel: RDX: 0000000000002000 RSI: 00000000122ba000 RDI: 0000000000000055 May 03 14:48:06 ip-172-30-86-52 kernel: RBP: 00007fff07b842c0 R08: 0000000000000002 R09: 0000000000000200 May 03 14:48:06 ip-172-30-86-52 kernel: R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000002000 May 03 14:48:06 ip-172-30-86-52 kernel: R13: 00000000122ba000 R14: 000000000a00000a R15: 0000000000000001 Any thoughts what went wrong, or where I should look for more data?
pgsql-bugs by date: