On 11/13/24 20:05, Chris Cleveland wrote:
> In my extension I got a mystery error:
>
> TRAP: failed Assert("InterruptHoldoffCount > 0"), File: "lwlock.c",
> Line: 1869, PID: 62663
> 0   postgres   0x000000010135adb4 ExceptionalCondition + 108
> 1   postgres   0x00000001012235ec LWLockRelease + 1456
> 2   postgres   0x00000001011faebc UnlockReleaseBuffer + 24
>
> Turns out there was a bug in my extension where I was getting a share
> lock on a particular index page over and over. Oddly, the error showed
> up not when I was getting the locks, but when I released them. Any time
> I locked the index page more than ~200 times, this error would show up
> on release.
>
> Questions:
>
> 1. Why is the limit on the number of locks so low? I thought that when
> getting a share lock, all it did was bump a reference count.
>
Because well-behaved code shouldn't need to hold more than 200 LWLocks
at the same time. Note that this limit does not apply to row locks,
relation locks, and so on.

> 2. Is there a way to get this to fail gracefully, that is, with an error
> message that makes sense, and kicks in at the moment you go over the
> limit, instead of later?
>
Not really. The limit of 200 LWLocks is hard-coded (MAX_SIMUL_LWLOCKS),
so the only solution is to not hold that many of them at once (in a
single backend). But I wonder if you're actually hitting that limit,
because that should trigger this check in lwlock.c:

    /* Ensure we will have room to remember the lock */
    if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
        elog(ERROR, "too many LWLocks taken");

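and not the assert. The assert itself comes from RESUME_INTERRUPTS(),
which LWLockRelease() runs to undo the HOLD_INTERRUPTS() done when the
lock was acquired. That suggests your extension does something wrong
with HOLD_INTERRUPTS() / RESUME_INTERRUPTS(), or otherwise leaves
InterruptHoldoffCount out of balance.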
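
To make the difference between the two failure modes concrete, here is a
minimal standalone sketch. It is a toy model, not PostgreSQL code: the
toy_* functions and the interrupt_holdoff_count variable are hypothetical
stand-ins for LWLockAcquire(), LWLockRelease() and InterruptHoldoffCount,
and toy_reset_holdoff() is only an assumed example of how the counter
might get out of balance, not a diagnosis of your extension.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SIMUL_LWLOCKS 200   /* the hard-coded limit discussed above */

/* Toy stand-ins for backend-local state; not the real PostgreSQL variables. */
int num_held_lwlocks = 0;
int interrupt_holdoff_count = 0;

/* Models the acquire path: the limit is checked before anything else. */
bool
toy_lwlock_acquire(void)
{
    if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
        return false;           /* real code: elog(ERROR, "too many LWLocks taken") */
    interrupt_holdoff_count++;  /* real code: HOLD_INTERRUPTS() */
    num_held_lwlocks++;
    return true;
}

/* Models the release path; returns false where the real Assert would trap. */
bool
toy_lwlock_release(void)
{
    assert(num_held_lwlocks > 0);   /* releasing a lock we do not hold is a caller bug */
    num_held_lwlocks--;
    if (interrupt_holdoff_count <= 0)
        return false;           /* real code: Assert(InterruptHoldoffCount > 0) */
    interrupt_holdoff_count--;  /* real code: RESUME_INTERRUPTS() */
    return true;
}

/* Hypothetical bug: something zeroes the holdoff counter while locks are held. */
void
toy_reset_holdoff(void)
{
    interrupt_holdoff_count = 0;
}
```

In this model, going over the limit is reported by toy_lwlock_acquire()
at the moment of the offending call. toy_lwlock_release() only fails if
the holdoff counter was disturbed in between, for example by calling
toy_reset_holdoff() while locks are still held, and that is the pattern
your trace looks like.
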
regards
--
Tomas Vondra