[HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
Date
Msg-id 20170201.173623.66249355.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
Responses Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
List pgsql-hackers
Hello, while looking another bug, I found that standby cannot
shutdown after DROP SUBSCRIPTION.

standby=# CREATE SUBSCRPTION sub1 ...
standby=# ....
standby=# DROP SUBSCRIPTION sub1;

Ctrl-C to the standby fails to work. ApplyLauncherMain is waiting
LogicalRepLauncherLock forever.

The culprit is DropSbuscription. It acquires
LogicalRepLauncherLock but never releases.

The attached patch fixes it. Most part of the fucntion is now
enclosed by PG_TRY-CATCH since some functions can throw
exceptions.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 5de9999..f07143e 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -511,51 +511,67 @@ DropSubscription(DropSubscriptionStmt *stmt)    /* Protect against launcher restarting the
worker.*/    LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE);
 
-    /* Kill the apply worker so that the slot becomes accessible. */
-    logicalrep_worker_stop(subid);
-
-    /* Remove the origin tracking if exists. */
-    snprintf(originname, sizeof(originname), "pg_%u", subid);
-    originid = replorigin_by_name(originname, true);
-    if (originid != InvalidRepOriginId)
-        replorigin_drop(originid);
-
-    /* If the user asked to not drop the slot, we are done mow.*/
-    if (!stmt->drop_slot)
-    {
-        heap_close(rel, NoLock);
-        return;
-    }
-    /*
-     * Otherwise drop the replication slot at the publisher node using
-     * the replication connection.
+     * replorigin_drop can throw an exception. we must release
+     * LogicalRepLauncherLock for the case.     */
-    load_file("libpqwalreceiver", false);
+    PG_TRY();
+    {
+        /* Kill the apply worker so that the slot becomes accessible. */
+        logicalrep_worker_stop(subid);
-    initStringInfo(&cmd);
-    appendStringInfo(&cmd, "DROP_REPLICATION_SLOT \"%s\"", slotname);
+        /* Remove the origin tracking if exists. */
+        snprintf(originname, sizeof(originname), "pg_%u", subid);
+        originid = replorigin_by_name(originname, true);
+        if (originid != InvalidRepOriginId)
+            replorigin_drop(originid);
-    wrconn = walrcv_connect(conninfo, true, subname, &err);
-    if (wrconn == NULL)
-        ereport(ERROR,
-                (errmsg("could not connect to publisher when attempting to "
-                        "drop the replication slot \"%s\"", slotname),
-                 errdetail("The error was: %s", err)));
+        /* If the user asked to not drop the slot, we are done now.*/
+        if (stmt->drop_slot)
+        {
+            /*
+             * Otherwise drop the replication slot at the publisher node using
+             * the replication connection.
+             */
+            load_file("libpqwalreceiver", false);
+
+            initStringInfo(&cmd);
+            appendStringInfo(&cmd, "DROP_REPLICATION_SLOT \"%s\"", slotname);
+
+            wrconn = walrcv_connect(conninfo, true, subname, &err);
+            if (wrconn == NULL || !walrcv_command(wrconn, cmd.data, &err))
+            {
+                if (wrconn == NULL)
+                    ereport(ERROR,
+                            (errmsg("could not connect to publisher when "
+                                    "attempting to drop the "
+                                    "replication slot \"%s\"", slotname),
+                             errdetail("The error was: %s", err)));
-    if (!walrcv_command(wrconn, cmd.data, &err))
-        ereport(ERROR,
-                (errmsg("could not drop the replication slot \"%s\" on publisher",
-                        slotname),
-                 errdetail("The error was: %s", err)));
-    else
-        ereport(NOTICE,
-                (errmsg("dropped replication slot \"%s\" on publisher",
-                        slotname)));
+                ereport(ERROR,
+                        (errmsg("could not drop the replication slot \"%s\" on publisher",
+                                slotname),
+                         errdetail("The error was: %s", err)));
+            }
-    walrcv_disconnect(wrconn);
+            ereport(NOTICE,
+                    (errmsg("dropped replication slot \"%s\" on publisher",
+                            slotname)));
-    pfree(cmd.data);
+            pfree(cmd.data);
+        }
+    }
+    PG_CATCH();
+    {
+        LWLockRelease(LogicalRepLauncherLock);
+        PG_RE_THROW();
+    }
+    PG_END_TRY();
+
+    LWLockRelease(LogicalRepLauncherLock);
+
+    if (wrconn)
+        walrcv_disconnect(wrconn);    heap_close(rel, NoLock);}

pgsql-hackers by date:

Previous
From: Nikhil Sontakke
Date:
Subject: Re: [HACKERS] Speedup twophase transactions
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: [HACKERS] IF (NOT) EXISTS in psql-completion