On 2025-Jul-17, Andrey Borodin wrote:
> Thinking more about the problem I see 3 ways to deal with this deadlock:
> 1. We check for recovery conflict even in presence of
> InterruptHoldoffCount. That's what patch v4 does.
> 2. Teach page_collect_tuples() to do HeapTupleSatisfiesVisibility()
> without holding buffer lock.
> 3. Why do we even HOLD_INTERRUPTS() when acquire shared lock??
Hmm, as you say, doing (3) is a very invasive system-wide change, but
can we do it in a more localized way? I mean, what if we do
RESUME_INTERRUPTS() just before going to sleep on the CV, and restore
with HOLD_INTERRUPTS() once the sleep is done? That would only affect
this one place rather than the whole system, and should also (AFAICS)
solve the issue.
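
To make the idea concrete, here is a toy model in plain C. It is not
PostgreSQL code and not a patch: holdoff_count merely stands in for
InterruptHoldoffCount, and sleep_on_cv(), simulate() and the
localized_resume flag are hypothetical names made up for illustration.
It assumes the sleep is the only place where the pending conflict
would be noticed, and that exactly one holdoff level is active.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for backend state; all names here are hypothetical. */
static int  holdoff_count = 0;        /* models InterruptHoldoffCount */
static bool interrupt_pending = false;
static int  interrupts_serviced = 0;

static void hold_interrupts(void)
{
    holdoff_count++;
}

static void resume_interrupts(void)
{
    assert(holdoff_count > 0);
    holdoff_count--;
}

/* A pending interrupt is serviced only when nothing holds it off. */
static void check_for_interrupts(void)
{
    if (interrupt_pending && holdoff_count == 0)
    {
        interrupt_pending = false;
        interrupts_serviced++;
    }
}

/*
 * Models the sleep on the CV.  A recovery conflict "arrives" while we
 * sleep.  With localized_resume, the holdoff is dropped only for the
 * duration of the sleep and restored afterwards.
 */
static void sleep_on_cv(bool localized_resume)
{
    if (localized_resume)
        resume_interrupts();

    interrupt_pending = true;   /* conflict signal arrives during sleep */
    check_for_interrupts();

    if (localized_resume)
        hold_interrupts();
}

/* Returns how many interrupts were serviced while "holding the lock". */
int simulate(bool localized_resume)
{
    holdoff_count = 0;
    interrupt_pending = false;
    interrupts_serviced = 0;

    hold_interrupts();              /* as done when the lock is taken */
    sleep_on_cv(localized_resume);
    assert(holdoff_count == 1);     /* holdoff is back in place */
    resume_interrupts();            /* as done when the lock is released */

    return interrupts_serviced;
}
```

In this model simulate(false) returns 0, because the conflict stays
pending for the whole sleep, while simulate(true) returns 1. In the
real backend, servicing the conflict would raise an error rather than
bump a counter, so the second HOLD_INTERRUPTS() would not necessarily
be reached.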
> Yet, I see 3 as a correct solution. Can't we just abstain from
> HOLD_INTERRUPTS() if taken LWLock is not exclusive?
Hmm, the code in LWLockAcquire says
	/*
	 * Lock out cancel/die interrupts until we exit the code section protected
	 * by the LWLock.  This ensures that interrupts will not interfere with
	 * manipulations of data structures in shared memory.
	 */
	HOLD_INTERRUPTS();
which means if we want to change this, we would have to inspect every
single use of LWLocks in shared mode in order to be certain that such a
change isn't problematic. This is a discussion I'm not prepared for.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Si quieres ser creativo, aprende el arte de perder el tiempo"