Many Backends stuck in wait event IPC/ParallelFinish - Mailing list pgsql-general

From	Steven Winfield
Subject	Many Backends stuck in wait event IPC/ParallelFinish
Date	January 30, 2018 23:01:30
Msg-id	E9FA92C2921F31408041863B74EE4C2001A479E590@CCPMAILDAG03.cantab.local
List	pgsql-general

Hi,

We just had an incident on one of our non-production databases in which 14 unrelated queries all hung in wait event IPC / ParallelFinish. We systematically called pg_cancel_backend() or pg_terminate_backend() on every backend other than these 14 and the autovacuum process mentioned below, to rule out a deadlock on some resource held by another backend.
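For anyone wanting to spot the same symptom, here is a minimal sketch of how the stuck backends could be listed from pg_stat_activity. It assumes a DB-API cursor (for example from psycopg2) connected to the affected server; the function name and the choice of columns are illustrative only and are not taken from the incident.

```python
# Sketch only: list backends currently in wait event IPC / ParallelFinish.
# `cursor` is assumed to be any DB-API cursor connected to the server;
# the helper name and selected columns are hypothetical.

PARALLEL_FINISH_SQL = """
SELECT pid, backend_type, state, query_start, query
FROM pg_stat_activity
WHERE wait_event_type = 'IPC'
  AND wait_event = 'ParallelFinish'
ORDER BY query_start
"""


def find_parallel_finish_backends(cursor):
    """Return (pid, backend_type, state, query_start, query) rows."""
    cursor.execute(PARALLEL_FINISH_SQL)
    return cursor.fetchall()
```

The pids returned this way are the ones to leave alone when cancelling or terminating everything else, and the ones to attach gdb to.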

We attached gdb to a number of the backends, and found their backtraces to look like this:

#0 0x00007f9ea3e77903 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x000000000077cb5e in WaitEventSetWait ()
#2 0x000000000077d149 in WaitLatch ()
#3 0x00000000004f1d75 in WaitForParallelWorkersToFinish ()
#4 0x00000000006294e7 in ExecParallelFinish ()
#5 0x000000000063a57d in ExecShutdownGather ()
…
#6 0x0000000000629978 in ExecShutdownNode () <-- Then zero or more of
#7 0x0000000000676c01 in planstate_tree_walker () <-- this pair
…
#10 0x0000000000629925 in ExecShutdownNode ()
#11 0x000000000062494e in standard_ExecutorRun ()
#12 0x00007f9e99d73f5d in pgss_ExecutorRun () from /remote/install/sw/external/20180117-4-64/lib/pg_stat_statements.so
#13 0x00000000007a5c24 in PortalRunSelect ()
#14 0x00000000007a7316 in PortalRun ()
#15 0x00000000007a2b49 in exec_simple_query ()
#16 0x00000000007a4157 in PostgresMain ()
#17 0x000000000047926f in ServerLoop ()
#18 0x00000000007200cc in PostmasterMain ()
#19 0x000000000047af97 in main ()
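When comparing backtraces from many backends, it can help to reduce each one to its list of function names and check for the frame of interest. The following is a rough sketch that assumes the backtrace text has been captured from gdb in the format shown above; the helper names are made up for illustration.

```python
import re

# Matches gdb frame lines such as
#   "#3 0x00000000004f1d75 in WaitForParallelWorkersToFinish ()"
# The address part is optional because gdb may omit it for some frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(")


def frame_functions(backtrace):
    """Return the function names in a gdb backtrace, innermost frame first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def waits_for_parallel_workers(backtrace):
    """True if some frame is WaitForParallelWorkersToFinish."""
    return "WaitForParallelWorkersToFinish" in frame_functions(backtrace)
```

Lines that are not frames, such as the "…" elisions, are simply skipped, so the check works on abbreviated traces as well.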

We also sent one of the backends a SIGABRT, so we have a core dump to play with. The only other backend running at the time was an autovacuum process, which may also have been hung - it didn’t have a wait event in pg_stat_activity, but I didn’t get a chance to strace it or attach gdb as the database restarted itself after we sent the SIGABRT.

The host is running PostgreSQL 10.1 on RHEL 7.4.

Any ideas what could have caused this, or what we could do to investigate this further?

Thanks,

Steve.
