Re: [PATCH] LWLock self-deadlock detection - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [PATCH] LWLock self-deadlock detection
Date
Msg-id CAExHW5sMC5heAJ_r_jBgwp48tgq-MPZg0tKLgzHO3yV6gt-jeQ@mail.gmail.com
Whole thread Raw
In response to [PATCH] LWLock self-deadlock detection  (Craig Ringer <craig.ringer@enterprisedb.com>)
Responses Re: [PATCH] LWLock self-deadlock detection  (Craig Ringer <craig.ringer@enterprisedb.com>)
List pgsql-hackers
This looks useful. LWLockCheckSelfDeadlock() could use LWLockHeldByMe
variant instead of copying that code with possibly a change in that
function to return the required information.

I am also seeing a pattern
Assert(LWLockHeldByMe*())
LWLockAcquire()

at some places. Should we change LWLockAcquire to do
Assert(LWLockHelpByMe()) always to detect such occurrences? Enhance
that pattern to print the information that your patch prints?

It looks weird that we can detect a self deadlock but not handle it.
But handling it requires remembering multiple LWLocks held and then
release them those many times. That may not leave LWLocks LW anymore.

On Thu, Nov 19, 2020 at 4:02 PM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
>
> Hi all
>
> Here's a patch I wrote a while ago to detect and report when a LWLockAcquire() results in a simple self-deadlock due
tothe caller already holding the LWLock. 
>
> To avoid affecting hot-path performance, it only fires the check on the first iteration through the retry loops in
LWLockAcquire()and LWLockWaitForVar(), and just before we sleep, once the fast-path has been missed. 
>
> I wrote an earlier version of this when I was chasing down some hairy issues with background workers deadlocking on
someexit paths because ereport(ERROR) or elog(ERROR) calls fired when a LWLock was held would cause a before_shmem_exit
oron_shmem_exit cleanup function to deadlock when it tried to acquire the same lock. 
>
> But it's an easy enough mistake to make and a seriously annoying one to track down, so I figured I'd post it for
consideration.Maybe someone else will get some use out of it even if nobody likes the idea of merging it. 
>
> As written the check runs only for --enable-cassert builds or when LOCK_DEBUG is defined.



--
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Migration Oracle multitenant database to PostgreSQL ?
Next
From: Bruce Momjian
Date:
Subject: Re: Prevent printing "next step instructions" in initdb and pg_upgrade