Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Msg-id CA+hUKGLL=v=f+Fv=cx=qieCyXbdC7DLgyV=+VdKSLJhOPu5nhA@mail.gmail.com
In response to Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)  (Alexander Lakhin <exclusion@gmail.com>)
List pgsql-hackers
I agree that the code lacks barriers.  I haven't been able to figure
out how any reordering could cause this hang, though, because in these
old branches procsignal_sigusr1_handler is used for latch wakeups, and
it also calls SetLatch(MyLatch) itself, right at the end.  That is,
SetLatch() gets called twice, first in the waker process and then
again in the awoken process, so it should be impossible for the latter
not to see MyLatch->is_set == true after procsignal_sigusr1_handler
completes.
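
To make that argument concrete, here is a minimal toy model in C, not the actual latch code. ToyLatch, toy_sigusr1_handler and toy_wakeup_without_visible_store are hypothetical names invented for this sketch, and the process signals itself so that the example fits in one file. It simulates the worst case for reordering: the waker's store to is_set never becomes visible, but its signal still arrives, and the handler's own store is enough for the owner to see the flag set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical stand-in for a latch: only the is_set flag is modeled. */
typedef struct ToyLatch
{
    volatile sig_atomic_t is_set;
} ToyLatch;

static ToyLatch toy_my_latch;

/* Stand-in for the tail of the SIGUSR1 handler: set our own latch again. */
static void
toy_sigusr1_handler(int signo)
{
    (void) signo;
    toy_my_latch.is_set = 1;
}

/*
 * Simulate a wakeup whose is_set store never became visible to the owner:
 * the "waker" only delivers SIGUSR1.  Returns the flag value the owner
 * observes after the handler has run.
 */
static int
toy_wakeup_without_visible_store(void)
{
    struct sigaction sa = {0};
    sigset_t    unblock;
    int         rc;

    sa.sa_handler = toy_sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    /* Make sure the signal is deliverable in this toy. */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGUSR1);
    rc = sigprocmask(SIG_UNBLOCK, &unblock, NULL);
    assert(rc == 0);

    toy_my_latch.is_set = 0;        /* pretend the waker's store was lost */
    rc = kill(getpid(), SIGUSR1);   /* the waker's signal still arrives */
    assert(rc == 0);

    /* The handler ran in this same process, so its store is visible here. */
    return toy_my_latch.is_set;
}
```

In this single-threaded toy, toy_wakeup_without_visible_store() returns 1 whenever the handler runs, which is why the hang points at the handler not running rather than at a missing barrier.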

That made me think the handler didn't run, which is consistent with
procstat -i showing it as pending ('P').  Which made me start to
suspect a kernel bug, unless we can explain what we did to block it...

But... perhaps I am confused about that and did something wrong when
looking into it.  It's hard to investigate when you aren't allowed to
take core files or connect a debugger (both will reliably trigger
EINTR).


