Hi,
Here's a reproducer which enabled me to reach this stuck state:

  pid  |  wait_event   |                                    query
-------+---------------+-----------------------------------------------------------------------------
 64617 |               | select pid, wait_event, query from pg_stat_activity where state = 'active';
 64619 | BufferPin     | VACUUM jobs
 64620 | ExecuteGather | SELECT COUNT(*) FROM jobs
 64621 | ExecuteGather | SELECT COUNT(*) FROM jobs
 64622 | ExecuteGather | SELECT COUNT(*) FROM jobs
 64623 | ExecuteGather | SELECT COUNT(*) FROM jobs
 84167 | BtreePage     | SELECT COUNT(*) FROM jobs
 84168 | BtreePage     | SELECT COUNT(*) FROM jobs
 96440 |               | SELECT COUNT(*) FROM jobs
 96438 |               | SELECT COUNT(*) FROM jobs
 96439 |               | SELECT COUNT(*) FROM jobs
(11 rows)

The main thread deletes rows in the middle of the key range (not sure
if this is important) and vacuums in a loop, and meanwhile 4 threads
(probably not important, might as well be 1) run Parallel Index Scans
over the whole range, in the hope of hitting the interesting case. In
the locked-up case I just saw, opaque->btpo_flags had the BTP_DELETED
bit set, not BTP_HALF_DEAD (I could tell because I added logging).
Clearly pages are periodically being marked half-dead, but I haven't
yet managed to get an index scan to hit one of those.
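
For anyone wanting to decode a logged btpo_flags value by hand, here is
a minimal sketch. It is not the logging patch mentioned above. The bit
values are quoted from memory of src/include/access/nbtree.h and should
be checked against your source tree; the helper names flag_names and
page_state are hypothetical, made up for this illustration.

```python
# Assumed flag bits from PostgreSQL's src/include/access/nbtree.h.
# Verify these against the tree you are actually debugging.
BTP_LEAF = 1 << 0
BTP_ROOT = 1 << 1
BTP_DELETED = 1 << 2
BTP_META = 1 << 3
BTP_HALF_DEAD = 1 << 4

FLAGS = [
    ("BTP_LEAF", BTP_LEAF),
    ("BTP_ROOT", BTP_ROOT),
    ("BTP_DELETED", BTP_DELETED),
    ("BTP_META", BTP_META),
    ("BTP_HALF_DEAD", BTP_HALF_DEAD),
]


def flag_names(btpo_flags):
    """Return the names of the known bits set in btpo_flags."""
    return [name for name, bit in FLAGS if btpo_flags & bit]


def page_state(btpo_flags):
    """Classify a page as 'deleted', 'half-dead' or 'live'.

    BTP_DELETED is checked first, so a page with both bits set is
    reported as deleted.
    """
    if btpo_flags & BTP_DELETED:
        return "deleted"
    if btpo_flags & BTP_HALF_DEAD:
        return "half-dead"
    return "live"


if __name__ == "__main__":
    # Example value: a leaf page with the deleted bit set.
    flags = BTP_LEAF | BTP_DELETED
    print(flag_names(flags), page_state(flags))
```

With these assumed values, a scan landing on a page whose flags decode
to "deleted" corresponds to the case seen in the lock-up, while
"half-dead" is the case not yet reproduced.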
--
Thomas Munro
http://www.enterprisedb.com