Thread: Graceful way to handle too many locks

Graceful way to handle too many locks

From: Chris Cleveland
Date:

In my extension I got a mystery error:

TRAP: failed Assert("InterruptHoldoffCount > 0"), File: "lwlock.c", Line: 1869, PID: 62663
0 postgres 0x000000010135adb4 ExceptionalCondition + 108
1 postgres 0x00000001012235ec LWLockRelease + 1456
2 postgres 0x00000001011faebc UnlockReleaseBuffer + 24

Turns out there was a bug in my extension where I was getting a share lock on a particular index page over and over. Oddly, the error showed up not when I was getting the locks, but when I released them. Any time I locked the index page more than ~200 times, this error would show up on release. 

Questions:

1. Why is the limit on the number of locks so low? I thought that when getting a share lock, all it did was bump a reference count.

2. Is there a way to get this to fail gracefully, that is, with an error message that makes sense, and kicks in at the moment you go over the limit, instead of later?

--
Chris Cleveland

Re: Graceful way to handle too many locks

From: Tomas Vondra
Date:

On 11/13/24 20:05, Chris Cleveland wrote:
> In my extension I got a mystery error:
> 
> TRAP: failed Assert("InterruptHoldoffCount > 0"), File: "lwlock.c",
> Line: 1869, PID: 62663
> 0 postgres 0x000000010135adb4 ExceptionalCondition + 108
> 1 postgres 0x00000001012235ec LWLockRelease + 1456
> 2 postgres 0x00000001011faebc UnlockReleaseBuffer + 24
> 
> Turns out there was a bug in my extension where I was getting a share
> lock on a particular index page over and over. Oddly, the error showed
> up not when I was getting the locks, but when I released them. Any time
> I locked the index page more than ~200 times, this error would show up
> on release. 
> 
> Questions:
> 
> 1. Why is the limit on the number of locks so low? I thought that when
> getting a share lock, all it did was bump a reference count.
> 

Because good code shouldn't really need more than 200 LWLocks. Note this
limit does not apply to row locks, relation locks, and so on.

> 2. Is there a way to get this to fail gracefully, that is, with an error
> message that makes sense, and kicks in at the moment you go over the
> limit, instead of later?
> 

Not really; the limit of 200 LWLocks is hard-coded, so the only solution
is to not acquire that many of them (in a single backend). But I wonder
if you're actually hitting that limit, because that should trigger

    /* Ensure we will have room to remember the lock */
    if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
        elog(ERROR, "too many LWLocks taken");

and not the assert. That suggests your extension does something wrong
with HOLD_INTERRUPTS() or something like that.
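To make the two mechanisms in this exchange concrete, here is a toy model in plain C. It is not PostgreSQL's lwlock.c: the names ToyLWLock, toy_acquire_shared and toy_release are hypothetical, the 200-slot limit mirrors MAX_SIMUL_LWLOCKS as quoted above, and returning -1 stands in for elog(ERROR). It assumes, roughly as lwlock.c does, that every acquisition (even a repeated share lock on the same lock) takes its own slot in the held-locks array and bumps InterruptHoldoffCount via HOLD_INTERRUPTS(), while each release undoes both and asserts that the counter is positive, as RESUME_INTERRUPTS() does.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-ins: these mimic, but are NOT, PostgreSQL's lwlock.c internals. */
#define MAX_SIMUL_LWLOCKS 200

typedef struct ToyLWLock
{
    int shared_holders;     /* the "reference count" part of a share lock */
} ToyLWLock;

static ToyLWLock *held_lwlocks[MAX_SIMUL_LWLOCKS];
static int num_held_lwlocks = 0;
static unsigned int InterruptHoldoffCount = 0;

/* Modeled on LWLockAcquire in share mode: returns 0 on success, -1 where
 * the real code would elog(ERROR, "too many LWLocks taken"). */
static int toy_acquire_shared(ToyLWLock *lock)
{
    if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
    {
        fprintf(stderr, "ERROR: too many LWLocks taken\n");
        return -1;
    }
    InterruptHoldoffCount++;                  /* HOLD_INTERRUPTS() */
    lock->shared_holders++;
    held_lwlocks[num_held_lwlocks++] = lock;  /* one slot per acquisition */
    return 0;
}

/* Modeled on LWLockRelease: forget the most recent matching slot, then
 * RESUME_INTERRUPTS(), whose assertion is the one in the reported trace. */
static void toy_release(ToyLWLock *lock)
{
    int i;

    for (i = num_held_lwlocks - 1; i >= 0; i--)
        if (held_lwlocks[i] == lock)
            break;
    assert(i >= 0 && "lock is not held");

    /* close the gap left by the released slot */
    for (; i < num_held_lwlocks - 1; i++)
        held_lwlocks[i] = held_lwlocks[i + 1];
    num_held_lwlocks--;
    lock->shared_holders--;

    assert(InterruptHoldoffCount > 0);        /* RESUME_INTERRUPTS() */
    InterruptHoldoffCount--;
}
```

In this model, share-locking the same lock 201 times fails on the 201st call, because each acquisition consumes a slot even though the lock itself only counts holders. If some other code path decremented InterruptHoldoffCount without a matching increment (an unbalanced RESUME_INTERRUPTS(), say), the counter would reach zero while locks were still recorded, and the assertion would fire in toy_release rather than at acquisition time. That is one scenario consistent with the reported trace and with the HOLD_INTERRUPTS() suspicion above, not a diagnosis.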


regards

-- 
Tomas Vondra




Re: Graceful way to handle too many locks

From: Robert Haas
Date:

On Wed, Nov 13, 2024 at 2:05 PM Chris Cleveland
<ccleveland@dieselpoint.com> wrote:
> In my extension I got a mystery error:
>
> TRAP: failed Assert("InterruptHoldoffCount > 0"), File: "lwlock.c", Line: 1869, PID: 62663
> 0 postgres 0x000000010135adb4 ExceptionalCondition + 108
> 1 postgres 0x00000001012235ec LWLockRelease + 1456
> 2 postgres 0x00000001011faebc UnlockReleaseBuffer + 24
>
> Turns out there was a bug in my extension where I was getting a share lock on a particular index page over and over.
> Oddly, the error showed up not when I was getting the locks, but when I released them. Any time I locked the index page
> more than ~200 times, this error would show up on release.

I wonder how you managed to avoid hitting this check in LWLockAcquire:

    /* Ensure we will have room to remember the lock */
    if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
        elog(ERROR, "too many LWLocks taken");

> 1. Why is the limit on the number of locks so low? I thought that when getting a share lock, all it did was bump a
> reference count.

200 LWLocks is an ENORMOUS number of LWLocks to be holding at once.
Except in very specific circumstances such as the one mentioned in the
comment for MAX_SIMUL_LWLOCKS, holding more than 1 or 2 or MAYBE 3
LWLocks simultaneously is a recipe for disaster. One issue is that
there's no deadlock checking, and it's easy to bring down an entire
system. Another issue is that other code will be expecting you to
release the lock quickly and you may cause the entire system to pile
up behind whichever lock you're holding. Details aside, you're only
supposed to hold an LWLock while you're actively looking at the
in-memory data structure it protects. If you need to keep a buffer
around for a longer time, you can hold a buffer pin for a longer time,
but the time for which you actually hold the lock needs to be minimal.
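The pin-long, lock-briefly pattern can be sketched with another toy model. ToyBuffer and its helpers are hypothetical and this sketch does not use PostgreSQL at all; in a real extension the analogous calls would presumably be ReadBuffer() to pin, LockBuffer(buf, BUFFER_LOCK_SHARE) and LockBuffer(buf, BUFFER_LOCK_UNLOCK) around each inspection of the page, and ReleaseBuffer() when done.

```c
#include <assert.h>

/* Toy model, not PostgreSQL's bufmgr: a pin keeps the page around,
 * a content lock guards reads of its contents. */
typedef struct ToyBuffer
{
    int pin_count;      /* cheap to hold for a long time */
    int lock_holders;   /* should be held only while reading */
    int tuples[4];      /* stand-in for page contents */
} ToyBuffer;

static int max_locks_held_at_once = 0;

static void toy_pin(ToyBuffer *b) { b->pin_count++; }

static void toy_unpin(ToyBuffer *b)
{
    assert(b->pin_count > 0);
    b->pin_count--;
}

static void toy_lock_share(ToyBuffer *b)
{
    b->lock_holders++;
    if (b->lock_holders > max_locks_held_at_once)
        max_locks_held_at_once = b->lock_holders;
}

static void toy_unlock(ToyBuffer *b)
{
    assert(b->lock_holders > 0);
    b->lock_holders--;
}

/* Pin once, then take and drop the lock around each short look at the page. */
static int sum_page_repeatedly(ToyBuffer *b, int visits)
{
    int total = 0;

    toy_pin(b);
    for (int v = 0; v < visits; v++)
    {
        toy_lock_share(b);
        for (int i = 0; i < 4; i++)
            total += b->tuples[i];   /* copy out what we need... */
        toy_unlock(b);               /* ...then release the lock at once */
        /* any slow work on `total` would go here, with no lock held */
    }
    toy_unpin(b);
    return total;
}
```

Called as sum_page_repeatedly(&page, 1000), the loop visits the page a thousand times yet never holds more than one lock on it. Locking once per visit without unlocking would instead need 1000 simultaneous locks, which in the real server would run into the 200-lock limit long before the loop finished.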

--
Robert Haas
EDB: http://www.enterprisedb.com