BUG #16414: Backend process blocked for 120s until kernel killed it - Mailing list pgsql-bugs

From: PG Bug reporting form
Msg-id: 16414-1eddf92b8ac2ed0a@postgresql.org
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16414
Logged by:          Montana Low
Email address:      montanalow@gmail.com
PostgreSQL version: 12.2
Operating system:   Ubuntu 18.04.3 LTS
Description:

A postgres server became unresponsive until reboot. This machine (and
several others in the cluster) typically handles bulk data ingestion via
COPY. Before the crash, there were many entries in the postgres logs similar
to:

2020-05-03 14:41:25.341 PDT [74797]  ERROR:  COPY from stdin failed: COPY terminated by new PQexec
2020-05-03 14:41:25.344 PDT [87084]  ERROR:  COPY from stdin failed: COPY terminated by new PQexec
...
2020-05-03 14:42:20.577 PDT [89620]  ERROR:  COPY from stdin failed: COPY terminated by new PQexec

There was one statement timeout (statement_timeout is set to 5 minutes) as
the final log entry before I manually initiated a server reboot.

2020-05-03 14:44:38.252 PDT [69974] ERROR:  canceling statement due to statement timeout

There were several interesting entries in journald, similar to this one.

May 03 14:48:06 ip-172-30-86-52 kernel: INFO: task postgres:91109 blocked for more than 120 seconds.
May 03 14:48:06 ip-172-30-86-52 kernel:       Tainted: P           OE    4.15.0-1054-aws #56-Ubuntu
May 03 14:48:06 ip-172-30-86-52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 03 14:48:06 ip-172-30-86-52 kernel: postgres        D    0 91109  91104 0x00000000
May 03 14:48:06 ip-172-30-86-52 kernel: Call Trace:
May 03 14:48:06 ip-172-30-86-52 kernel:  __schedule+0x24e/0x880
May 03 14:48:06 ip-172-30-86-52 kernel:  schedule+0x2c/0x80
May 03 14:48:06 ip-172-30-86-52 kernel:  io_schedule+0x16/0x40
May 03 14:48:06 ip-172-30-86-52 kernel:  wbt_wait+0x1e7/0x360
May 03 14:48:06 ip-172-30-86-52 kernel:  ? trace_event_raw_event_wbt_step+0x140/0x140
May 03 14:48:06 ip-172-30-86-52 kernel:  blk_mq_make_request+0xe0/0x570
May 03 14:48:06 ip-172-30-86-52 kernel:  generic_make_request+0x124/0x300
May 03 14:48:06 ip-172-30-86-52 kernel:  submit_bio+0x73/0x140
May 03 14:48:06 ip-172-30-86-52 kernel:  ? _cond_resched+0x19/0x40
May 03 14:48:06 ip-172-30-86-52 kernel:  ? submit_bio+0x73/0x140
May 03 14:48:06 ip-172-30-86-52 kernel:  ? kmem_cache_alloc+0x158/0x1b0
May 03 14:48:06 ip-172-30-86-52 kernel:  ? ext4_init_io_end+0x1f/0x40
May 03 14:48:06 ip-172-30-86-52 kernel:  ext4_io_submit+0x4c/0x60
May 03 14:48:06 ip-172-30-86-52 kernel:  ext4_writepages+0x476/0xf50
May 03 14:48:06 ip-172-30-86-52 kernel:  ? iov_iter_copy_from_user_atomic+0xc2/0x330
May 03 14:48:06 ip-172-30-86-52 kernel:  ? __generic_file_write_iter+0x1aa/0x1d0
May 03 14:48:06 ip-172-30-86-52 kernel:  do_writepages+0x4b/0xe0
May 03 14:48:06 ip-172-30-86-52 kernel:  ? ext4_mark_inode_dirty+0x1d0/0x1d0
May 03 14:48:06 ip-172-30-86-52 kernel:  ? do_writepages+0x4b/0xe0
May 03 14:48:06 ip-172-30-86-52 kernel:  ? new_sync_write+0xe7/0x140
May 03 14:48:06 ip-172-30-86-52 kernel:  __filemap_fdatawrite_range+0xc1/0x100
May 03 14:48:06 ip-172-30-86-52 kernel:  ? __filemap_fdatawrite_range+0xc1/0x100
May 03 14:48:06 ip-172-30-86-52 kernel:  SyS_sync_file_range+0xef/0x130
May 03 14:48:06 ip-172-30-86-52 kernel:  do_syscall_64+0x73/0x130
May 03 14:48:06 ip-172-30-86-52 kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
May 03 14:48:06 ip-172-30-86-52 kernel: RIP: 0033:0x7f596fab97a7
May 03 14:48:06 ip-172-30-86-52 kernel: RSP: 002b:00007fff07b842a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000115
May 03 14:48:06 ip-172-30-86-52 kernel: RAX: ffffffffffffffda RBX: 000000000000015e RCX: 00007f596fab97a7
May 03 14:48:06 ip-172-30-86-52 kernel: RDX: 0000000000002000 RSI: 00000000122ba000 RDI: 0000000000000055
May 03 14:48:06 ip-172-30-86-52 kernel: RBP: 00007fff07b842c0 R08: 0000000000000002 R09: 0000000000000200
May 03 14:48:06 ip-172-30-86-52 kernel: R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000002000
May 03 14:48:06 ip-172-30-86-52 kernel: R13: 00000000122ba000 R14: 000000000a00000a R15: 0000000000000001

Any thoughts what went wrong, or where I should look for more data?

