Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Date
Msg-id CA+hUKG+YkAnOLrKKcy-FLjoVUV3r=L+c28gzMSL58Cv9jC4nvg@mail.gmail.com
In response to Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
List pgsql-hackers
After 1000 make check loops and 1000 make -C src/test/modules/test_shm_mq
check loops on the same FBSD 13.1 machine as elver, which has failed
like this once before, I haven't been able to reproduce this on
REL_12_STABLE.  I'm not really sure how to chase this, but if you see
this situation again, I'd be interested to see the output of
fstat -p PID (shows bytes in pipes) and procstat -j PID (shows pending
signals) for all PIDs involved.  Please collect that before connecting
a debugger or doing anything else that might make the syscall return
with EINTR, because after that we know it continues happily: it then
sees latch->is_set next time around the loop.
If poll() is not returning when there are bytes ready to read from the
self-pipe, which fstat can show, I think that'd indicate a kernel bug.
The same goes if procstat -j shows signals pending but it's somehow
still blocked in the syscall.  Otherwise, it might indicate a compiler
or postgres bug, but I don't have any particular theories.


