On Wed, Nov 13, 2024 at 2:05 PM Chris Cleveland
<ccleveland@dieselpoint.com> wrote:
> In my extension I got a mystery error:
>
> TRAP: failed Assert("InterruptHoldoffCount > 0"), File: "lwlock.c", Line: 1869, PID: 62663
> 0 postgres 0x000000010135adb4 ExceptionalCondition + 108
> 1 postgres 0x00000001012235ec LWLockRelease + 1456
> 2 postgres 0x00000001011faebc UnlockReleaseBuffer + 24
>
> Turns out there was a bug in my extension where I was getting a share
> lock on a particular index page over and over. Oddly, the error showed
> up not when I was getting the locks, but when I released them. Any time
> I locked the index page more than ~200 times, this error would show up
> on release.

I wonder how you managed to avoid hitting this check in LWLockAcquire:

    /* Ensure we will have room to remember the lock */
    if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
        elog(ERROR, "too many LWLocks taken");
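
To make the shape of that check concrete, here is a toy model in plain C. It is only a sketch and not the lwlock.c code: every toy_* name is hypothetical, the cap of 200 is an assumption chosen to match the ~200 you reported, and it returns false where the real code raises an error. The point it illustrates is that each acquisition takes its own slot in a fixed-size per-backend array, even when the same lock is taken again in share mode.

```c
#include <stdbool.h>

/* Toy model only: mirrors the shape of the quoted check, not PostgreSQL's code. */
#define TOY_MAX_SIMUL_LOCKS 200     /* assumed cap, matching the ~200 reported */

static int toy_num_held = 0;                    /* slots currently in use */
static int toy_held_ids[TOY_MAX_SIMUL_LOCKS];   /* one entry per acquisition */

/* Record one acquisition; returns false (instead of erroring) when full. */
static bool toy_acquire(int lock_id)
{
    if (toy_num_held >= TOY_MAX_SIMUL_LOCKS)
        return false;
    toy_held_ids[toy_num_held++] = lock_id;
    return true;
}

/* Forget the most recent acquisition of lock_id; false if it is not held. */
static bool toy_release(int lock_id)
{
    for (int i = toy_num_held - 1; i >= 0; i--)
    {
        if (toy_held_ids[i] == lock_id)
        {
            /* Close the gap so the remaining entries stay contiguous. */
            for (int j = i; j < toy_num_held - 1; j++)
                toy_held_ids[j] = toy_held_ids[j + 1];
            toy_num_held--;
            return true;
        }
    }
    return false;
}
```

In this model, 200 calls to toy_acquire() on the same id succeed and the next one is refused until something is released.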

> 1. Why is the limit on the number of locks so low? I thought that when
> getting a share lock, all it did was bump a reference count.

200 LWLocks is an ENORMOUS number of LWLocks to be holding at once.
Except in very specific circumstances such as the one mentioned in the
comment for MAX_SIMUL_LWLOCKS, holding more than 1 or 2 or MAYBE 3
LWLocks simultaneously is a recipe for disaster. One issue is that
there's no deadlock checking, and it's easy to bring down an entire
system. Another issue is that other code will be expecting you to
release the lock quickly and you may cause the entire system to pile
up behind whichever lock you're holding. Details aside, you're only
supposed to hold an LWLock while you're actively looking at the
in-memory data structure it protects. If you need to keep a buffer
around for a longer time, you can hold a buffer pin for a longer time,
but the time for which you actually hold the lock needs to be minimal.
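
As a rough illustration of the pin-versus-lock distinction, here is a self-contained toy in plain C. ToyBuffer and all toy_* functions are hypothetical stand-ins, not PostgreSQL types or APIs; in a real extension the corresponding buffer manager calls would be along the lines of ReadBuffer(), LockBuffer() with BUFFER_LOCK_SHARE and BUFFER_LOCK_UNLOCK, and ReleaseBuffer(). The sketch keeps one pin for the whole loop but takes and drops the share lock around each individual read.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a shared buffer; not a PostgreSQL type. */
typedef struct ToyBuffer
{
    int pin_count;      /* pins keep the page from going away */
    int share_locks;    /* share locks currently held on the contents */
    int payload;        /* stands in for the page contents */
} ToyBuffer;

static int toy_max_locks_seen = 0;  /* high-water mark of simultaneous locks */

static void toy_pin(ToyBuffer *b)   { b->pin_count++; }
static void toy_unpin(ToyBuffer *b) { b->pin_count--; }

static void toy_lock_share(ToyBuffer *b)
{
    b->share_locks++;
    if (b->share_locks > toy_max_locks_seen)
        toy_max_locks_seen = b->share_locks;
}

static void toy_unlock(ToyBuffer *b) { b->share_locks--; }

/* Hold the lock only long enough to copy the value out. */
static int toy_read_payload(ToyBuffer *b)
{
    toy_lock_share(b);
    int value = b->payload;
    toy_unlock(b);
    return value;
}

/* Visit the same page many times under a single long-lived pin. */
static long toy_sum_reads(ToyBuffer *b, int times)
{
    long total = 0;

    toy_pin(b);
    for (int i = 0; i < times; i++)
        total += toy_read_payload(b);
    toy_unpin(b);
    return total;
}
```

However many times the loop visits the page, no more than one lock is held at any moment, and both the pin and the lock are gone when toy_sum_reads() returns.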
--
Robert Haas
EDB: http://www.enterprisedb.com