Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query
Msg-id CAEepm=3ynb5nBhKQRts0bNETA1HzNxz6-3RTPOzCbM8oQ9yPdg@mail.gmail.com
In response to Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query  (Sergei Kornilov <sk@zsrv.org>)
Responses Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query  (Sergei Kornilov <sk@zsrv.org>)
List pgsql-bugs
On Thu, Jan 24, 2019 at 11:56 PM Sergei Kornilov <sk@zsrv.org> wrote:
> We should not call dsm_backend_shutdown twice in the same process, right? So we tried to call dsm_detach on the same segment
> 0x5624578710c8 twice, but this is unexpected behavior and refcnt would be incorrect. And it seems we cannot LWLockAcquire
> a lock and then LWLockAcquire the same lock again without releasing it. That is where the infinite wait comes from.

Yeah, I think your analysis is right.  It shouldn't raise an error while
holding the lock.  dsm_unpin_segment() should perhaps release the lock
before it raises an error, something like:

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 36904d2676..b989c0b94a 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -924,9 +924,15 @@ dsm_unpin_segment(dsm_handle handle)
         * called on a segment which is pinned.
         */
        if (control_slot == INVALID_CONTROL_SLOT)
+       {
+               LWLockRelease(DynamicSharedMemoryControlLock);
                elog(ERROR, "cannot unpin unknown segment handle");
+       }
        if (!dsm_control->item[control_slot].pinned)
+       {
+               LWLockRelease(DynamicSharedMemoryControlLock);
                elog(ERROR, "cannot unpin a segment that is not pinned");
+       }
        Assert(dsm_control->item[control_slot].refcnt > 1);

        /*
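
The effect of that patch can be sketched with a toy model in plain C. Everything here is hypothetical: a boolean stands in for DynamicSharedMemoryControlLock, `longjmp()` stands in for `elog(ERROR)`, and the model assumes that nothing on the error path releases the lock, as in the reported hang. `lock_leaked_after_error()` reports whether the lock is still held once the error has unwound.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Toy model, not PostgreSQL code: a flag stands in for
 * DynamicSharedMemoryControlLock and longjmp() stands in for elog(ERROR). */
static bool control_lock_held = false;
static jmp_buf error_context;

static void toy_acquire(void)
{
    assert(!control_lock_held);     /* re-acquiring would wait forever */
    control_lock_held = true;
}

static void toy_release(void) { control_lock_held = false; }

static void toy_error(void) { longjmp(error_context, 1); }

/* Mirrors the shape of the patch: when release_first is true, the lock is
 * released before the error is raised. Returns true if the lock is still
 * held after the error has unwound to the handler. */
bool lock_leaked_after_error(bool release_first)
{
    control_lock_held = false;
    if (setjmp(error_context) == 0)
    {
        toy_acquire();
        /* ... lookup fails, e.g. the segment is not pinned ... */
        if (release_first)
            toy_release();
        toy_error();                /* does not return */
    }
    return control_lock_held;
}
```

Without the early release the lock stays held after the error, so any later acquire in the same process would block (here it trips the assertion in `toy_acquire()`); with the early release it is free again.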

I have contemplated that before, but haven't done it because I'm not sure
about the state of the system after that.  We just shouldn't be in this
situation: if we are, we can error out partway through, leaving later
segments (in the array dsa_release_in_place() loops through) pinned
forever, so we leak memory and eventually run out of DSM slots.
Segment pinning opts out of resource owner control, which means the
client code is responsible for not screwing it up.  Perhaps that
suggests we should PANIC, or perhaps just LOG and continue, but I'm
not sure.
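
The failure mode described above, where an error partway through the release loop strands the remaining segments, can be modelled in a few lines. This is a hypothetical sketch, not the code of dsa_release_in_place(): segments are reduced to an array of `pinned` flags, an already-unpinned entry plays the role of the "cannot unpin a segment that is not pinned" error, and the names `release_all`, `STOP_ON_ERROR` and `LOG_AND_CONTINUE` are invented for illustration. PANIC is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policies for an unexpected "not pinned" segment. */
typedef enum { STOP_ON_ERROR, LOG_AND_CONTINUE } error_policy;

/* Unpins every pinned segment in order. An entry that is already unpinned
 * models the error case. Returns how many segments are left pinned. */
size_t release_all(bool pinned[], size_t n, error_policy policy)
{
    size_t still_pinned = 0;

    assert(n == 0 || pinned != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (!pinned[i])
        {
            if (policy == STOP_ON_ERROR)
                break;      /* models elog(ERROR): rest of loop abandoned */
            continue;       /* models LOG: report and carry on */
        }
        pinned[i] = false;  /* models a successful dsm_unpin_segment() */
    }

    for (size_t i = 0; i < n; i++)
        if (pinned[i])
            still_pinned++;
    return still_pinned;
}
```

For the input `{true, false, true, true}`, `STOP_ON_ERROR` leaves the last two segments pinned for good, while `LOG_AND_CONTINUE` leaves none; the trade-off is that continuing hides whatever bug caused the bad state.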

I think the root cause is earlier and in a different process (see
ProcessInterrupts() in the stack).  Presumably the process that reported
"dsa_area could not attach to segment" is closer to the point where
things go wrong.  If you are in a position to reproduce this on a
modified source tree, it'd be good to see the backtrace for that, to
figure out which of a couple of possible code paths reaches it.  Perhaps
you could do that by enabling core files and changing this:

-                       elog(ERROR, "dsa_area could not attach to segment");
+                       elog(PANIC, "dsa_area could not attach to segment");

I have so far not succeeded in reaching that condition.

--
Thomas Munro
http://www.enterprisedb.com

