Re: [patch] demote - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: [patch] demote
Date
Msg-id 20200626.161438.1461210156318414119.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: [patch] demote  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Responses Re: [patch] demote
Re: [patch] demote
List pgsql-hackers
Hello.

At Thu, 25 Jun 2020 19:27:54 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in 
> Here is a summary of my work during the last few days on this demote approach.
> 
> Please, find in attachment v2-0001-Demote-PoC.patch and the comments in the
> commit message and as FIXME in code.
> 
> The patch is not finished or bug-free yet, I'm still not very happy with the
> coding style, it probably lack some more code documentation, but a lot has
> changed since v1. It's still a PoC to push the discussion a bit further after
> being myself silent for some days.
> 
> The patch is currently relying on a demote checkpoint. I understand a forced
> checkpoint overhead can be massive and cause major wait/downtime. But I keep
> this for a later step. Maybe we should be able to cancel a running checkpoint?
> Or leave it to its synching work but discard the result without wirting it to
> XLog?

If we are going to dive so close to server shutdown, we can just
utilize the restart-after-crash path, which we can assume to work
reliably. The attached is a quite rough sketch, hijacking smart
shutdown path for a convenience, of that but seems working.  "pg_ctl
-m s -W stop" lets server demote.

> I hadn't time to investigate Robert's concern about shared memory for snapshot
> during recovery.

The patch does all required clenaup of resources including shared
memory, I believe.  It's enough if we don't need to keep any resources
alive?

> The patch doesn't deal with prepared xact yet. Testing "start->demote->promote"
> raise an assert if some prepared xact exist. I suppose I will rollback them
> during demote in next patch version.
> 
> I'm not sure how to divide this patch in multiple small independent steps. I
> suppose I can split it like:
> 
> 1. add demote checkpoint
> 2. support demote: mostly postmaster, startup/xlog and checkpointer related
>    code
> 3. cli using pg_ctl demote
> 
> ...But I'm not sure it worth it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b4d475bb0b..a4adf3e587 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2752,6 +2752,7 @@ SIGHUP_handler(SIGNAL_ARGS)
 /*
  * pmdie -- signal handler for processing various postmaster signals.
  */
+static bool        demoting = false;
 static void
 pmdie(SIGNAL_ARGS)
 {
@@ -2774,59 +2775,17 @@ pmdie(SIGNAL_ARGS)
         case SIGTERM:
 
             /*
-             * Smart Shutdown:
+             * XXX: Hijacked as DEMOTE
              *
-             * Wait for children to end their work, then shut down.
+             * Runs fast shutdown, then restart as standby
              */
             if (Shutdown >= SmartShutdown)
                 break;
             Shutdown = SmartShutdown;
             ereport(LOG,
-                    (errmsg("received smart shutdown request")));
-
-            /* Report status */
-            AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STOPPING);
-#ifdef USE_SYSTEMD
-            sd_notify(0, "STOPPING=1");
-#endif
-
-            if (pmState == PM_RUN || pmState == PM_RECOVERY ||
-                pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
-            {
-                /* autovac workers are told to shut down immediately */
-                /* and bgworkers too; does this need tweaking? */
-                SignalSomeChildren(SIGTERM,
-                                   BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER);
-                /* and the autovac launcher too */
-                if (AutoVacPID != 0)
-                    signal_child(AutoVacPID, SIGTERM);
-                /* and the bgwriter too */
-                if (BgWriterPID != 0)
-                    signal_child(BgWriterPID, SIGTERM);
-                /* and the walwriter too */
-                if (WalWriterPID != 0)
-                    signal_child(WalWriterPID, SIGTERM);
-
-                /*
-                 * If we're in recovery, we can't kill the startup process
-                 * right away, because at present doing so does not release
-                 * its locks.  We might want to change this in a future
-                 * release.  For the time being, the PM_WAIT_READONLY state
-                 * indicates that we're waiting for the regular (read only)
-                 * backends to die off; once they do, we'll kill the startup
-                 * and walreceiver processes.
-                 */
-                pmState = (pmState == PM_RUN) ?
-                    PM_WAIT_BACKUP : PM_WAIT_READONLY;
-            }
-
-            /*
-             * Now wait for online backup mode to end and backends to exit. If
-             * that is already the case, PostmasterStateMachine will take the
-             * next step.
-             */
-            PostmasterStateMachine();
-            break;
+                    (errmsg("received demote request")));
+            demoting = true;
+            /* FALL THROUGH */
 
         case SIGINT:
 
@@ -2839,8 +2798,10 @@ pmdie(SIGNAL_ARGS)
             if (Shutdown >= FastShutdown)
                 break;
             Shutdown = FastShutdown;
-            ereport(LOG,
-                    (errmsg("received fast shutdown request")));
+
+            if (!demoting)
+                ereport(LOG,
+                        (errmsg("received fast shutdown request")));
 
             /* Report status */
             AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STOPPING);
@@ -2887,6 +2848,13 @@ pmdie(SIGNAL_ARGS)
                 pmState = PM_WAIT_BACKENDS;
             }
 
+            /* create standby signal file */
+            {
+                FILE *standby_file = AllocateFile(STANDBY_SIGNAL_FILE, "w");
+
+                Assert (standby_file && !FreeFile(standby_file));
+            }
+
             /*
              * Now wait for backends to exit.  If there are none,
              * PostmasterStateMachine will take the next step.
@@ -3958,7 +3926,7 @@ PostmasterStateMachine(void)
      * EOF on its input pipe, which happens when there are no more upstream
      * processes.
      */
-    if (Shutdown > NoShutdown && pmState == PM_NO_CHILDREN)
+    if (!demoting && Shutdown > NoShutdown && pmState == PM_NO_CHILDREN)
     {
         if (FatalError)
         {
@@ -3996,13 +3964,23 @@ PostmasterStateMachine(void)
         ExitPostmaster(1);
 
     /*
-     * If we need to recover from a crash, wait for all non-syslogger children
-     * to exit, then reset shmem and StartupDataBase.
+     * If we need to recover from a crash or demoting, wait for all
+     * non-syslogger children to exit, then reset shmem and StartupDataBase.
      */
-    if (FatalError && pmState == PM_NO_CHILDREN)
+    if ((demoting || FatalError) && pmState == PM_NO_CHILDREN)
     {
-        ereport(LOG,
-                (errmsg("all server processes terminated; reinitializing")));
+        if (demoting)
+            ereport(LOG,
+                    (errmsg("all server processes terminated; starting as standby")));
+        else
+            ereport(LOG,
+                    (errmsg("all server processes terminated; reinitializing")));
+
+        if (demoting)
+        {
+            Shutdown = NoShutdown;
+            demoting = false;
+        }
 
         /* allow background workers to immediately restart */
         ResetBackgroundWorkerCrashTimes();

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Why forbid "INSERT INTO t () VALUES ();"
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: [patch] demote