Thread: Missed condition-variable wakeups on FreeBSD

Missed condition-variable wakeups on FreeBSD

From: Tom Lane
About once a month over the last six months, my buildfarm animal
florican has gotten stuck while running the core regression tests.
The symptoms have looked very much the same each time: there is
a backend with two parallel worker processes that are just sitting
and not consuming any CPU time.  Each time I've attached to these
processes with gdb to check their stack traces, and seen pretty
much the same story every time (traces below).  What is really
interesting is that after I detach from the second worker, the
processes resume running and finish out the test successfully!
I don't know much about how gdb interacts with kernel calls on
FreeBSD, but I speculate that the poll(2) call returns with EINTR
after gdb releases the process, and then things resume fine,
suggesting that we lost an interrupt somewhere.
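(For anyone who hasn't looked at that code path lately: with WAIT_USE_POLL,
a latch wakeup is delivered by writing a byte into a self-pipe that poll(2)
is watching.  Here's a stripped-down sketch of that general pattern -- not
the actual latch.c code -- just to show why a lost wakeup leaves a process
parked in poll() until something like a debugger-induced EINTR lets it
re-check shared state.)

/*
 * Illustration only: the self-pipe wakeup pattern used by the poll()-based
 * latch code; this is not the real latch.c source.
 */
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

static int  selfpipe[2];        /* [0] = read end, [1] = write end */

static void
selfpipe_init(void)
{
    (void) pipe(selfpipe);
    (void) fcntl(selfpipe[0], F_SETFL, O_NONBLOCK);
    (void) fcntl(selfpipe[1], F_SETFL, O_NONBLOCK);
}

/* Called from a signal handler to wake the sleeper (cf. SetLatch). */
static void
wake_up(void)
{
    (void) write(selfpipe[1], "", 1);   /* pipe full => wakeup already pending */
}

/* Sleep until woken (cf. the poll() branch of WaitEventSetWaitBlock). */
static void
wait_for_wakeup(void)
{
    struct pollfd pfd = {.fd = selfpipe[0], .events = POLLIN};

    for (;;)
    {
        if (poll(&pfd, 1, -1) < 0)
        {
            if (errno == EINTR)
                break;          /* e.g. after a debugger detaches; caller re-checks state */
            continue;
        }
        if (pfd.revents & POLLIN)
        {
            char    buf[16];

            while (read(selfpipe[0], buf, sizeof(buf)) > 0)
                ;               /* drain the wakeup byte(s) */
            break;
        }
    }
}

If the byte from wake_up() never arrives, or poll() never reports it,
wait_for_wakeup() just sits there -- which is exactly what these processes
look like from the outside.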

I have observed this three times in the REL_11 branch, once
in REL_12, and a couple of times last summer before it occurred
to me to start keeping notes.  Over that time the machine has
been running various patchlevels of FreeBSD 13.0.

Here's the stack trace from the leader process in the most
recent event (REL_11 as of yesterday).  It's not always the
same query that gets stuck, but it's always a parallel hash join:

(gdb) p debug_query_string
$1 = 0x21873090 "select count(*) from simple r join simple s using (id);"
(gdb) bt
#0  _poll () at _poll.S:4
#1  0x21701361 in __thr_poll (fds=0x219dc170, nfds=2, timeout=-1) at /usr/src/lib/libthr/thread/thr_syscalls.c:338
#2  0x215eaf3f in poll (pfd=0x219dc170, nfds=2, timeout=-1) at /usr/src/lib/libc/sys/poll.c:47
#3  0x0097b0fd in WaitEventSetWaitBlock (set=<optimized out>, cur_timeout=-1, occurred_events=<optimized out>, nevents=<optimized out>) at latch.c:1171
#4  WaitEventSetWait (set=0x219dc138, timeout=-1, occurred_events=<optimized out>, nevents=1, wait_event_info=134217738) at latch.c:1000
#5  0x0099842b in ConditionVariableSleep (cv=0x2bf0b230, wait_event_info=134217738) at condition_variable.c:157
#6  0x00977e12 in BarrierArriveAndWait (barrier=0x2bf0b218, wait_event_info=134217738) at barrier.c:191
#7  0x00873441 in ExecHashJoinImpl (pstate=<optimized out>, parallel=true) at nodeHashjoin.c:328
#8  ExecParallelHashJoin (pstate=0x2a887740) at nodeHashjoin.c:600
#9  0x0086a509 in ExecProcNode (node=0x2a887740) at ../../../src/include/executor/executor.h:248
#10 fetch_input_tuple (aggstate=aggstate@entry=0x2a887500) at nodeAgg.c:407
#11 0x00869646 in agg_retrieve_direct (aggstate=<optimized out>) at nodeAgg.c:1746
#12 ExecAgg (pstate=0x2a887500) at nodeAgg.c:1561
#13 0x0086e181 in ExecProcNode (node=0x2a887500) at ../../../src/include/executor/executor.h:248
#14 gather_getnext (gatherstate=0x2a887414) at nodeGather.c:276
#15 ExecGather (pstate=0x2a887414) at nodeGather.c:207
#16 0x0086a509 in ExecProcNode (node=0x2a887414) at ../../../src/include/executor/executor.h:248
#17 fetch_input_tuple (aggstate=aggstate@entry=0x2a8871b8) at nodeAgg.c:407
#18 0x00869646 in agg_retrieve_direct (aggstate=<optimized out>) at nodeAgg.c:1746
#19 ExecAgg (pstate=0x2a8871b8) at nodeAgg.c:1561
#20 0x0085956b in ExecProcNode (node=0x2a8871b8) at ../../../src/include/executor/executor.h:248
#21 ExecutePlan (estate=0x2a887090, planstate=0x2a8871b8, use_parallel_mode=<optimized out>, operation=CMD_SELECT, sendTuples=<optimized out>,
    numberTuples=<optimized out>, direction=<optimized out>, dest=0x2ab629b4, execute_once=<optimized out>) at execMain.c:1712
#22 standard_ExecutorRun (queryDesc=0x218a1490, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at execMain.c:353
#23 0x00859445 in ExecutorRun (queryDesc=0x218a1490, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at execMain.c:296
#24 0x009a3d57 in PortalRunSelect (portal=portal@entry=0x2197f090, forward=<optimized out>, count=0, dest=0x2ab629b4) at pquery.c:941
#25 0x009a39d6 in PortalRun (portal=0x2197f090, count=2147483647, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x2ab629b4, altdest=0x2ab629b4,
    completionTag=0xffbfdc70 "") at pquery.c:782
#26 0x009a2b53 in exec_simple_query (query_string=query_string@entry=0x21873090 "select count(*) from simple r join simple s using (id);") at postgres.c:1145
#27 0x009a04ec in PostgresMain (argc=1, argv=0x21954ed4, dbname=<optimized out>, username=0x21954d38 "buildfarm") at postgres.c:4052

Worker 1 looks like this:

(gdb) bt
#0  _poll () at _poll.S:4
#1  0x21701361 in __thr_poll (fds=0x2187cb48, nfds=2, timeout=-1) at /usr/src/lib/libthr/thread/thr_syscalls.c:338
#2  0x215eaf3f in poll (pfd=0x2187cb48, nfds=2, timeout=-1) at /usr/src/lib/libc/sys/poll.c:47
#3  0x0097b0fd in WaitEventSetWaitBlock (set=<optimized out>, cur_timeout=-1, occurred_events=<optimized out>, nevents=<optimized out>) at latch.c:1171
#4  WaitEventSetWait (set=0x2187cb10, timeout=-1, occurred_events=<optimized out>, nevents=1, wait_event_info=134217738) at latch.c:1000
#5  0x0099842b in ConditionVariableSleep (cv=0x2196b230, wait_event_info=134217738) at condition_variable.c:157
#6  0x00977e12 in BarrierArriveAndWait (barrier=0x2196b218, wait_event_info=134217738) at barrier.c:191
#7  0x00873441 in ExecHashJoinImpl (pstate=<optimized out>, parallel=true) at nodeHashjoin.c:328
#8  ExecParallelHashJoin (pstate=0x2a8378d8) at nodeHashjoin.c:600
#9  0x0086a509 in ExecProcNode (node=0x2a8378d8) at ../../../src/include/executor/executor.h:248
#10 fetch_input_tuple (aggstate=aggstate@entry=0x2a8376c8) at nodeAgg.c:407
#11 0x00869646 in agg_retrieve_direct (aggstate=<optimized out>) at nodeAgg.c:1746
#12 ExecAgg (pstate=0x2a8376c8) at nodeAgg.c:1561
#13 0x0085956b in ExecProcNode (node=0x2a8376c8) at ../../../src/include/executor/executor.h:248
#14 ExecutePlan (estate=0x2a837090, planstate=0x2a8376c8, use_parallel_mode=<optimized out>, operation=CMD_SELECT, sendTuples=<optimized out>,
    numberTuples=<optimized out>, direction=<optimized out>, dest=0x219ac780, execute_once=<optimized out>) at execMain.c:1712
#15 standard_ExecutorRun (queryDesc=0x219ad3e0, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at execMain.c:353
#16 0x00859445 in ExecutorRun (queryDesc=0x219ad3e0, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at execMain.c:296
#17 0x0085d6de in ParallelQueryMain (seg=0x2187c89c, toc=0x2196b000) at execParallel.c:1402
#18 0x00750ee0 in ParallelWorkerMain (main_arg=1903254317) at parallel.c:1433

Worker 2 looks like this:

(gdb) bt
#0  _poll () at _poll.S:4
#1  0x21701361 in __thr_poll (fds=0x2187cb48, nfds=2, timeout=-1) at /usr/src/lib/libthr/thread/thr_syscalls.c:338
#2  0x215eaf3f in poll (pfd=0x2187cb48, nfds=2, timeout=-1) at /usr/src/lib/libc/sys/poll.c:47
#3  0x0097b0fd in WaitEventSetWaitBlock (set=<optimized out>, cur_timeout=-1, occurred_events=<optimized out>, nevents=<optimized out>) at latch.c:1171
#4  WaitEventSetWait (set=0x2187cb10, timeout=-1, occurred_events=<optimized out>, nevents=1, wait_event_info=134217737) at latch.c:1000
#5  0x0099842b in ConditionVariableSleep (cv=0x2196b230, wait_event_info=134217737) at condition_variable.c:157
#6  0x00977e12 in BarrierArriveAndWait (barrier=0x2196b218, wait_event_info=134217737) at barrier.c:191
#7  0x0086ea44 in MultiExecParallelHash (node=<optimized out>) at nodeHash.c:312
#8  MultiExecHash (node=0x2a837e88) at nodeHash.c:112
#9  0x0085fa03 in MultiExecProcNode (node=0x2a837e88) at execProcnode.c:501
#10 0x00872f80 in ExecHashJoinImpl (pstate=<optimized out>, parallel=true) at nodeHashjoin.c:290
#11 ExecParallelHashJoin (pstate=0x2a8378d8) at nodeHashjoin.c:600
#12 0x0086a509 in ExecProcNode (node=0x2a8378d8) at ../../../src/include/executor/executor.h:248
#13 fetch_input_tuple (aggstate=aggstate@entry=0x2a8376c8) at nodeAgg.c:407
#14 0x00869646 in agg_retrieve_direct (aggstate=<optimized out>) at nodeAgg.c:1746
#15 ExecAgg (pstate=0x2a8376c8) at nodeAgg.c:1561
#16 0x0085956b in ExecProcNode (node=0x2a8376c8) at ../../../src/include/executor/executor.h:248
#17 ExecutePlan (estate=0x2a837090, planstate=0x2a8376c8, use_parallel_mode=<optimized out>, operation=CMD_SELECT, sendTuples=<optimized out>,
    numberTuples=<optimized out>, direction=<optimized out>, dest=0x219ac780, execute_once=<optimized out>) at execMain.c:1712
#18 standard_ExecutorRun (queryDesc=0x219ad3e0, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at execMain.c:353
#19 0x00859445 in ExecutorRun (queryDesc=0x219ad3e0, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at execMain.c:296
#20 0x0085d6de in ParallelQueryMain (seg=0x2187c89c, toc=0x2196b000) at execParallel.c:1402
#21 0x00750ee0 in ParallelWorkerMain (main_arg=1903254317) at parallel.c:1433
#22 0x009193d3 in StartBackgroundWorker () at bgworker.c:911

In each case I attached to the process, ran the gdb commands shown
above, and detached.  As I've grown used to seeing, the processes
resumed running after I detached from the second worker.  I've not
tried attaching in something other than PID order, but I probably
will next time.

Thoughts?  Ideas on debugging this?

            regards, tom lane



Re: Missed condition-variable wakeups on FreeBSD

From: Andres Freund
Hi,

On 2022-02-26 14:07:05 -0500, Tom Lane wrote:
> About once a month over the last six months, my buildfarm animal
> florican has gotten stuck while running the core regression tests.
> The symptoms have looked very much the same each time: there is
> a backend with two parallel worker processes that are just sitting
> and not consuming any CPU time.  Each time I've attached to these
> processes with gdb to check their stack traces, and seen pretty
> much the same story every time (traces below).  What is really
> interesting is that after I detach from the second worker, the
> processes resume running and finish out the test successfully!
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,
> suggesting that we lost an interrupt somewhere.
>
> I have observed this three times in the REL_11 branch, once
> in REL_12, and a couple of times last summer before it occurred
> to me to start keeping notes.  Over that time the machine has
> been running various patchlevels of FreeBSD 13.0.

It's certainly interesting that it appears to happen only in the branches
using poll rather than kqueue to implement latches. That changed between 12
and 13.

Have you tried running the core regression tests with force_parallel_mode =
on, or with the parallel costs lowered, to see if that makes the problem
appear more often?
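
Something like this in postgresql.conf (or the animal's extra_config), with
illustrative values only -- note that force_parallel_mode by itself doesn't
make the planner pick more Parallel Hash Joins, so the cost knobs are
probably the interesting part here:

    force_parallel_mode = on
    parallel_setup_cost = 0
    parallel_tuple_cost = 0
    min_parallel_table_scan_size = 0
    min_parallel_index_scan_size = 0
    max_parallel_workers_per_gather = 2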


> Here's the stack trace from the leader process in the most
> recent event (REL_11 as of yesterday).  It's not always the
> same query that gets stuck, but it's always a parallel hash join:

Which does make me wonder if it's a latch issue or a logic issue in hashjoin /
barriers. HJ is the only user of barrier.c...


There's this commit, which subsequently was reverted due to some issue, and
not re-applied since...

commit 4e0f0995e923948631c4114ab353b256b51b58ad
Author: Thomas Munro <tmunro@postgresql.org>
Date:   2021-03-17 17:46:39 +1300
 
    Fix race in Parallel Hash Join batch cleanup.

It doesn't seem super likely to be related, but...


> (gdb) p debug_query_string
> $1 = 0x21873090 "select count(*) from simple r join simple s using (id);"
> (gdb) bt
> #0  _poll () at _poll.S:4
> #1  0x21701361 in __thr_poll (fds=0x219dc170, nfds=2, timeout=-1) at /usr/src/lib/libthr/thread/thr_syscalls.c:338
> #2  0x215eaf3f in poll (pfd=0x219dc170, nfds=2, timeout=-1) at /usr/src/lib/libc/sys/poll.c:47
> #3  0x0097b0fd in WaitEventSetWaitBlock (set=<optimized out>, cur_timeout=-1, occurred_events=<optimized out>, nevents=<optimized out>) at latch.c:1171

The next time this happens / if you still have this open, perhaps it could be
worth checking if there's a byte in the self pipe?
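
Something along these lines from gdb in the stuck process, perhaps
(variable names from memory, so adjust as needed):

    (gdb) p selfpipe_readfd
    (gdb) p MyLatch->is_set
    (gdb) p MyProc->procLatch.is_set

If is_set is 1 while the process is still parked in poll(), that would
point at the pipe byte / poll() readiness being lost, rather than the
condition variable never being signalled at all.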


> Thoughts?  Ideas on debugging this?

Besides trying to make the issue more likely as suggested above, it might be
worth checking if signalling the stuck processes with SIGUSR1 gets them
unstuck.

Greetings,

Andres Freund



Re: Missed condition-variable wakeups on FreeBSD

From: Justin Pryzby
On Sat, Feb 26, 2022 at 02:07:05PM -0500, Tom Lane wrote:
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,
> suggesting that we lost an interrupt somewhere.

I've seen similar interactions where strace under Linux causes a process
to be woken up or to otherwise behave differently (not necessarily
Postgres).

> Thoughts?  Ideas on debugging this?

Before attaching a debugger, figure out what syscall each process is in.
On Linux, that's "ps O wchan PID".
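
On FreeBSD I'd expect something like these to work (untested here):

    ps -O wchan -p PID
    procstat -kk PID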

>> Besides trying to make the issue more likely as suggested above, it might be
>> worth checking if signalling the stuck processes with SIGUSR1 gets them
>> unstuck.

And SIGCONT.

Maybe you already did this, but you can dump a core file of the running
processes to allow future inspection.

-- 
Justin



Re: Missed condition-variable wakeups on FreeBSD

From: Melanie Plageman
On Sat, Feb 26, 2022 at 2:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> About once a month over the last six months, my buildfarm animal
> florican has gotten stuck while running the core regression tests.
> The symptoms have looked very much the same each time: there is
> a backend with two parallel worker processes that are just sitting
> and not consuming any CPU time.  Each time I've attached to these
> processes with gdb to check their stack traces, and seen pretty
> much the same story every time (traces below).  What is really
> interesting is that after I detach from the second worker, the
> processes resume running and finish out the test successfully!
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,
> suggesting that we lost an interrupt somewhere.
>
> I have observed this three times in the REL_11 branch, once
> in REL_12, and a couple of times last summer before it occurred
> to me to start keeping notes.  Over that time the machine has
> been running various patchlevels of FreeBSD 13.0.
>
> Here's the stack trace from the leader process in the most
> recent event (REL_11 as of yesterday).  It's not always the
> same query that gets stuck, but it's always a parallel hash join:
>
> [stack traces trimmed]
>
> In each case I attached to the process, ran the gdb commands shown
> above, and detached.  As I've grown used to seeing, the processes
> resumed running after I detached from the second worker.  I've not
> tried attaching in something other than PID order, but I probably
> will next time.
>
> Thoughts?  Ideas on debugging this?

How could it be that worker 2 is waiting on the build barrier in
PHJ_BUILD_HASHING_INNER and worker 1 and the leader are waiting on it
with it supposedly in PHJ_BUILD_HASHING_OUTER?



Re: Missed condition-variable wakeups on FreeBSD

From: Tom Lane
Andres Freund <andres@anarazel.de> writes:
> On 2022-02-26 14:07:05 -0500, Tom Lane wrote:
>> I have observed this three times in the REL_11 branch, once
>> in REL_12, and a couple of times last summer before it occurred
>> to me to start keeping notes.  Over that time the machine has
>> been running various patchlevels of FreeBSD 13.0.

> It's certainly interesting that it appears to happen only in the branches
> using poll rather than kqueue to implement latches. That changed between 12
> and 13.

Yeah, and there was no PHJ in v10, so that's a pretty good theory as
to why I've only seen it in those two branches.

> Have you tried running the core regression tests with force_parallel_mode =
> on, or with the parallel costs lowered, to see if that makes the problem
> appear more often?
> The next time this happens / if you still have this open, perhaps it could be
> worth checking if there's a byte in the self pipe?
> Besides trying to make the issue more likely as suggested above, it might be
> worth checking if signalling the stuck processes with SIGUSR1 gets them
> unstuck.

I've now wasted a bunch of kilowatt-hours fruitlessly trying to
reproduce this outside the confines of the buildfarm script.
I'm at a loss to figure out what the buildfarm is doing differently,
but apparently there's something.  I'm going to re-enable the
machine's buildfarm job and just wait for it to hang up again.
More info eventually ...

            regards, tom lane



Re: Missed condition-variable wakeups on FreeBSD

From: Thomas Munro
On Sun, Feb 27, 2022 at 8:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,

Yeah, at least FreeBSD and macOS interrupt system calls when you send
SIGSTOP + SIGCONT directly, or when gdb and lldb do something similar
via ptrace.  The same applies on Linux, except that Linux restarts the
system call automatically (even though the man page has this in the
list of system calls that never restart) so you don't usually notice
(though you can see it with strace).  It's not really clear to me what
should happen given the language around restarts is all about signal
handlers, and stop + cont are system magic, not signal handlers.
Anyway...

> suggesting that we lost an interrupt somewhere.

So it's happening on an i386 kernel, with WAIT_USE_POLL (not
WAIT_USE_KQUEUE), but only under the build farm script (not when you
ran it manually in a loop)...  hmmm.
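
(If it ever needs chasing on a branch that would normally pick kqueue, I
believe latch.c still honours a manually chosen primitive, so something like

    ./configure CPPFLAGS='-DWAIT_USE_POLL' ...

should force the poll() path there too.)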



Re: Missed condition-variable wakeups on FreeBSD

From: Thomas Munro
On Sun, Feb 27, 2022 at 11:18 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> How could it be that worker 2 is waiting on the build barrier in
> PHJ_BUILD_HASHING_INNER and worker 1 and the leader are waiting on it
> with it supposedly in PHJ_BUILD_HASHING_OUTER?

That'd be consistent with a wakeup going missing, somehow.  W2 was
supposed to be released but somehow didn't get the message (until a
random debugger-induced EINTR releases it and it sees all the state it
needs to proceed in shared memory); meanwhile the other two got to the
end of the next phase and are waiting for it.
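
For anyone following along, the waiting side is roughly this pattern -- a
simplified sketch, not the real barrier.c, with BarrierPhase() standing in
for the locked bookkeeping and the CV wait-list machinery glossed over:

/* How a participant waits for the others; illustrative only. */
#include "postgres.h"
#include "storage/barrier.h"
#include "storage/condition_variable.h"

static void
wait_for_phase_change(Barrier *barrier, int start_phase, uint32 wait_event_info)
{
    ConditionVariablePrepareToSleep(&barrier->condition_variable);
    while (BarrierPhase(barrier) == start_phase)    /* state lives in shared memory */
    {
        /*
         * Blocks in WaitEventSetWait()/poll() until this backend's latch
         * is set, normally by ConditionVariableBroadcast() from whoever
         * advances the phase.  If that wakeup goes missing, we keep
         * sleeping here even though the phase has already moved on --
         * which is what the stuck worker looks like from outside.
         */
        ConditionVariableSleep(&barrier->condition_variable, wait_event_info);
    }
    ConditionVariableCancelSleep();
}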



Re: Missed condition-variable wakeups on FreeBSD

From: Thomas Munro
On Sun, Feb 27, 2022 at 9:44 AM Andres Freund <andres@anarazel.de> wrote:
> > (gdb) p debug_query_string
> > $1 = 0x21873090 "select count(*) from simple r join simple s using (id);"
> > (gdb) bt
> > #0  _poll () at _poll.S:4
> > #1  0x21701361 in __thr_poll (fds=0x219dc170, nfds=2, timeout=-1) at /usr/src/lib/libthr/thread/thr_syscalls.c:338
> > #2  0x215eaf3f in poll (pfd=0x219dc170, nfds=2, timeout=-1) at /usr/src/lib/libc/sys/poll.c:47
> > #3  0x0097b0fd in WaitEventSetWaitBlock (set=<optimized out>, cur_timeout=-1, occurred_events=<optimized out>,
nevents=<optimizedout>) at latch.c:1171
 
>
> The next time this happens / if you still have this open, perhaps it could be
> worth checking if there's a byte in the self pipe?

As a basic sanity check I'd like to see pfd[0], pfd[1] and the output
of fstat -p PID, to check for unconsumed bytes in the pipe.  For example
"echo hello | sleep 60" shows "0* pipe ... 6 rw" for sleep, but "echo
hello | more" shows "0* pipe ... 0 rw".



Re: Missed condition-variable wakeups on FreeBSD

From: Thomas Munro
On Sun, Feb 27, 2022 at 8:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I have observed this three times in the REL_11 branch, once
> in REL_12, and a couple of times last summer before it occurred
> to me to start keeping notes.  Over that time the machine has
> been running various patchlevels of FreeBSD 13.0.

FTR I noticed that my animal elver hadn't reported for a while, logged
in, and found it locked up, a bit like that, but in test_shm_mq rather
than PHJ.  It's FreeBSD 13.1 on a fairly fast 16-way amd64 (in a jail),
and it was testing REL_12_STABLE, so using poll() for
WaitEventSetWaitBlock().  When I connected a debugger and then
disconnected, it was unblocked and continued.  Unfortunately I didn't
get any new information :-(