Thread: Re: [COMMITTERS] pgsql: Start background writer during archive recovery.

Re: [COMMITTERS] pgsql: Start background writer during archive recovery.

From
Tom Lane
Date:
heikki@postgresql.org (Heikki Linnakangas) writes:
> Log Message:
> -----------
> Start background writer during archive recovery.

Might that have anything to do with this?

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dungbeetle&dt=2009-02-18%2019:44:01
        regards, tom lane


Re: [COMMITTERS] pgsql: Start background writer during archive recovery.

From
Simon Riggs
Date:
On Wed, 2009-02-18 at 15:43 -0500, Tom Lane wrote:
> heikki@postgresql.org (Heikki Linnakangas) writes:
> > Log Message:
> > -----------
> > Start background writer during archive recovery.
> 
> Might that have anything to do with this?
> 
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dungbeetle&dt=2009-02-18%2019:44:01

Hmmm, looks very probable. But not anything that jumps out quickly at
me. Will continue to check.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Re: [COMMITTERS] pgsql: Start background writer during archive recovery.

From
Simon Riggs
Date:
On Wed, 2009-02-18 at 21:28 +0000, Simon Riggs wrote:
> On Wed, 2009-02-18 at 15:43 -0500, Tom Lane wrote:
> > heikki@postgresql.org (Heikki Linnakangas) writes:
> > > Log Message:
> > > -----------
> > > Start background writer during archive recovery.
> > 
> > Might that have anything to do with this?
> > 
> > http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dungbeetle&dt=2009-02-18%2019:44:01
> 
> Hmmm, looks very probable. But not anything that jumps out quickly at
> me. Will continue to check.

Finger points that way, but still can't see any specific reason for
that.

More likely to be an uncommon race condition, rather than a error
specific to dungbeetle. If startup process death is slow, this could
happen, though hasn't occurred in other tests. 

Given the shape of the patch, the likely fix is to bump
NUM_AUXILIARY_PROCS by one. 

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Simon Riggs <simon@2ndQuadrant.com> writes:
>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dungbeetle&dt=2009-02-18%2019:44:01

> More likely to be an uncommon race condition, rather than a error
> specific to dungbeetle.

I agree, that's what it looks like, especially since I couldn't
duplicate it on Fedora 9 x86_64 which is presumably fairly close
to what dungbeetle is running.

I tried to duplicate it by putting the box under extra load, but
no luck.
        regards, tom lane


Re: Re: [COMMITTERS] pgsql: Start background writer during archive recovery.

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> More likely to be an uncommon race condition, rather than a error
> specific to dungbeetle. If startup process death is slow, this could
> happen, though hasn't occurred in other tests. 

True, the startup process can live for a short while concurrently with 
bgwriter, walwriter and autovacuum launcher, before it exits.

> Given the shape of the patch, the likely fix is to bump
> NUM_AUXILIARY_PROCS by one. 

Not sure what you mean by the shape of the patch, but agreed.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Simon Riggs wrote:
>> More likely to be an uncommon race condition, rather than a error
>> specific to dungbeetle. If startup process death is slow, this could
>> happen, though hasn't occurred in other tests. 

> True, the startup process can live for a short while concurrently with 
> bgwriter, walwriter and autovacuum launcher, before it exits.

Maybe the postmaster should wait for startup process exit before
deciding to open for business, instead of just a signal.
        regards, tom lane


Re: Re: [COMMITTERS] pgsql: Start background writer during archive recovery.

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Simon Riggs wrote:
>>> More likely to be an uncommon race condition, rather than a error
>>> specific to dungbeetle. If startup process death is slow, this could
>>> happen, though hasn't occurred in other tests. 
> 
>> True, the startup process can live for a short while concurrently with 
>> bgwriter, walwriter and autovacuum launcher, before it exits.
> 
> Maybe the postmaster should wait for startup process exit before
> deciding to open for business, instead of just a signal.

Not a bad idea. Although, there's nothing wrong with the current way 
either. The startup process does a proc_exit(0) right after sending the 
signal ATM, so there's no real work left at that point.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> Maybe the postmaster should wait for startup process exit before
>> deciding to open for business, instead of just a signal.

> Not a bad idea. Although, there's nothing wrong with the current way 
> either. The startup process does a proc_exit(0) right after sending the 
> signal ATM, so there's no real work left at that point.

The thing wrong with it is assuming that nothing interesting will happen
during proc_exit().  We hang enough stuff on on_proc_exit hooks that
that seems like a pretty shaky assumption.
        regards, tom lane


Re: Re: [COMMITTERS] pgsql: Start background writer during archive recovery.

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> Maybe the postmaster should wait for startup process exit before
>>> deciding to open for business, instead of just a signal.
>
>> Not a bad idea. Although, there's nothing wrong with the current way
>> either. The startup process does a proc_exit(0) right after sending the
>> signal ATM, so there's no real work left at that point.
>
> The thing wrong with it is assuming that nothing interesting will happen
> during proc_exit().  We hang enough stuff on on_proc_exit hooks that
> that seems like a pretty shaky assumption.

I can't get too worried, given that proc_exit() is a very well-beaten
code path. Admittedly not so much for an auxiliary process, but that's
just a dumbed down version of what happens with a full-blown backend.

However I started looking into that idea anyway, and figured that it
does simplify the logic in postmaster.c quite a bit, so I think it's
worth doing on those grounds alone. Attached is a patch against CVS HEAD
and also against a snapshot before the recovery infra patch, for easier
reading. I'll give that some more testing and commit if I find no issues.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 7695,7701 **** StartupProcessMain(void)
      BuildFlatFiles(false);

      /* Let postmaster know that startup is finished */
!     SendPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED);

      /* exit normally */
      proc_exit(0);
--- 7695,7701 ----
      BuildFlatFiles(false);

      /* Let postmaster know that startup is finished */
!     SetPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED);

      /* exit normally */
      proc_exit(0);
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 227,240 **** static int    Shutdown = NoShutdown;
  static bool FatalError = false; /* T if recovering from backend crash */
  static bool RecoveryError = false; /* T if recovery failed */

- /* State of WAL redo */
- #define            NoRecovery            0
- #define            RecoveryStarted        1
- #define            RecoveryConsistent    2
- #define            RecoveryCompleted    3
-
- static int    RecoveryStatus = NoRecovery;
-
  /*
   * We use a simple state machine to control startup, shutdown, and
   * crash recovery (which is rather like shutdown followed by startup).
--- 227,232 ----
***************
*** 253,261 **** static int    RecoveryStatus = NoRecovery;
   * point, if we had the infrastructure to do that.
   *
   * When the WAL redo is finished, the startup process signals us the third
!  * time, and we switch to PM_RUN state. The startup process can also skip the
!  * recovery and consistent recovery phases altogether, as it will during
!  * normal startup when there's no recovery to be done, for example.
   *
   * Normal child backends can only be launched when we are in PM_RUN state.
   * (We also allow it in PM_WAIT_BACKUP state, but only for superusers.)
--- 245,256 ----
   * point, if we had the infrastructure to do that.
   *
   * When the WAL redo is finished, the startup process signals us the third
!  * time, and exits. We don't process the 3d signal immediately but when we
!  * see the that the startup process has exited, we check that we have
!  * received the signal. If everything is OK, we then switch to PM_RUN state.
!  * The startup process can also skip the recovery and consistent recovery
!  * phases altogether, as it will during normal startup when there's no
!  * recovery to be done, for example.
   *
   * Normal child backends can only be launched when we are in PM_RUN state.
   * (We also allow it in PM_WAIT_BACKUP state, but only for superusers.)
***************
*** 338,344 **** static void pmdie(SIGNAL_ARGS);
  static void reaper(SIGNAL_ARGS);
  static void sigusr1_handler(SIGNAL_ARGS);
  static void dummy_handler(SIGNAL_ARGS);
- static void CheckRecoverySignals(void);
  static void CleanupBackend(int pid, int exitstatus);
  static void HandleChildCrash(int pid, int exitstatus, const char *procname);
  static void LogChildExit(int lev, const char *procname,
--- 333,338 ----
***************
*** 2019,2025 **** pmdie(SIGNAL_ARGS)
              ereport(LOG,
                      (errmsg("received smart shutdown request")));

!             if (pmState == PM_RUN || pmState == PM_RECOVERY || pmState == PM_RECOVERY_CONSISTENT)
              {
                  /* autovacuum workers are told to shut down immediately */
                  SignalAutovacWorkers(SIGTERM);
--- 2013,2020 ----
              ereport(LOG,
                      (errmsg("received smart shutdown request")));

!             if (pmState == PM_RUN || pmState == PM_RECOVERY ||
!                 pmState == PM_RECOVERY_CONSISTENT)
              {
                  /* autovacuum workers are told to shut down immediately */
                  SignalAutovacWorkers(SIGTERM);
***************
*** 2159,2181 **** reaper(SIGNAL_ARGS)
           */
          if (pid == StartupPID)
          {
              StartupPID = 0;

              /*
!              * Check if we've received a signal from the startup process
!              * first. This can change pmState. If the startup process sends
!              * a signal and exits immediately after that, we might not have
!              * processed the signal yet. We need to know if it completed
!              * recovery before it exited.
               */
!             CheckRecoverySignals();

              /*
               * Unexpected exit of startup process (including FATAL exit)
               * during PM_STARTUP is treated as catastrophic. There is no
!              * other processes running yet.
               */
!             if (pmState == PM_STARTUP)
              {
                  LogChildExit(LOG, _("startup process"),
                               pid, exitstatus);
--- 2154,2177 ----
           */
          if (pid == StartupPID)
          {
+             bool recoveryCompleted;
+
              StartupPID = 0;

              /*
!              * Check if the startup process completed recovery before exiting
               */
!             if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED))
!                 recoveryCompleted = true;
!             else
!                 recoveryCompleted = false;

              /*
               * Unexpected exit of startup process (including FATAL exit)
               * during PM_STARTUP is treated as catastrophic. There is no
!              * other processes running yet, so we can just exit.
               */
!             if (pmState == PM_STARTUP && !recoveryCompleted)
              {
                  LogChildExit(LOG, _("startup process"),
                               pid, exitstatus);
***************
*** 2196,2212 **** reaper(SIGNAL_ARGS)
                  continue;
              }
              /*
               * Startup process exited normally, but didn't finish recovery.
               * This can happen if someone else than postmaster kills the
               * startup process with SIGTERM. Treat it like a crash.
               */
!             if (pmState == PM_RECOVERY || pmState == PM_RECOVERY_CONSISTENT)
              {
                  RecoveryError = true;
                  HandleChildCrash(pid, exitstatus,
                                   _("startup process"));
                  continue;
              }
          }

          /*
--- 2192,2255 ----
                  continue;
              }
              /*
+              * Startup process exited in response to a shutdown request (or
+              * it finished normally regardless of the shutdown request).
+              */
+             if (Shutdown > NoShutdown)
+             {
+                 pmState = PM_WAIT_BACKENDS;
+                 /* PostmasterStateMachine logic does the rest */
+                 continue;
+             }
+             /*
               * Startup process exited normally, but didn't finish recovery.
               * This can happen if someone else than postmaster kills the
               * startup process with SIGTERM. Treat it like a crash.
               */
!             if (!recoveryCompleted)
              {
                  RecoveryError = true;
                  HandleChildCrash(pid, exitstatus,
                                   _("startup process"));
                  continue;
              }
+
+             /*
+              * Startup succeeded, commence normal operations
+              */
+             pmState = PM_RUN;
+
+             /*
+              * Load the flat authorization file into postmaster's cache. The
+              * startup process has recomputed this from the database contents,
+              * so we wait till it finishes before loading it.
+              */
+             load_role();
+
+             /*
+              * Crank up the background writer, if we didn't do that already
+              * when we entered consistent recovery phase.  It doesn't matter
+              * if this fails, we'll just try again later.
+              */
+             if (BgWriterPID == 0)
+                 BgWriterPID = StartBackgroundWriter();
+
+             /*
+              * Likewise, start other special children as needed.  In a restart
+              * situation, some of them may be alive already.
+              */
+             if (WalWriterPID == 0)
+                 WalWriterPID = StartWalWriter();
+             if (AutoVacuumingActive() && AutoVacPID == 0)
+                 AutoVacPID = StartAutoVacLauncher();
+             if (XLogArchivingActive() && PgArchPID == 0)
+                 PgArchPID = pgarch_start();
+             if (PgStatPID == 0)
+                 PgStatPID = pgstat_start();
+
+             /* at this point we are really open for business */
+             ereport(LOG,
+                 (errmsg("database system is ready to accept connections")));
          }

          /*
***************
*** 2622,2748 **** LogChildExit(int lev, const char *procname, int pid, int exitstatus)
  static void
  PostmasterStateMachine(void)
  {
-     /* Startup states */
-
-     if (pmState == PM_STARTUP && RecoveryStatus > NoRecovery)
-     {
-         /* WAL redo has started. We're out of reinitialization. */
-         FatalError = false;
-
-         /*
-          * Go to shutdown mode if a shutdown request was pending.
-          */
-         if (Shutdown > NoShutdown)
-         {
-             pmState = PM_WAIT_BACKENDS;
-             /* PostmasterStateMachine logic does the rest */
-         }
-         else
-         {
-             /*
-              * Crank up the background writer.    It doesn't matter if this
-              * fails, we'll just try again later.
-              */
-             Assert(BgWriterPID == 0);
-             BgWriterPID = StartBackgroundWriter();
-
-             pmState = PM_RECOVERY;
-         }
-     }
-     if (pmState == PM_RECOVERY && RecoveryStatus >= RecoveryConsistent)
-     {
-         /*
-          * Go to shutdown mode if a shutdown request was pending.
-          */
-         if (Shutdown > NoShutdown)
-         {
-             pmState = PM_WAIT_BACKENDS;
-             /* PostmasterStateMachine logic does the rest */
-         }
-         else
-         {
-             /*
-              * Startup process has entered recovery. We consider that good
-              * enough to reset FatalError.
-              */
-             pmState = PM_RECOVERY_CONSISTENT;
-
-             /*
-              * Load the flat authorization file into postmaster's cache. The
-              * startup process won't have recomputed this from the database yet,
-              * so we it may change following recovery.
-              */
-             load_role();
-
-             /*
-              * Likewise, start other special children as needed.
-              */
-             Assert(PgStatPID == 0);
-             PgStatPID = pgstat_start();
-
-             /* XXX at this point we could accept read-only connections */
-             ereport(DEBUG1,
-                  (errmsg("database system is in consistent recovery mode")));
-         }
-     }
-     if ((pmState == PM_RECOVERY ||
-          pmState == PM_RECOVERY_CONSISTENT ||
-          pmState == PM_STARTUP) &&
-         RecoveryStatus == RecoveryCompleted)
-     {
-         /*
-          * Startup succeeded.
-          *
-          * Go to shutdown mode if a shutdown request was pending.
-          */
-         if (Shutdown > NoShutdown)
-         {
-             pmState = PM_WAIT_BACKENDS;
-             /* PostmasterStateMachine logic does the rest */
-         }
-         else
-         {
-             /*
-              * Otherwise, commence normal operations.
-              */
-             pmState = PM_RUN;
-
-             /*
-              * Load the flat authorization file into postmaster's cache. The
-              * startup process has recomputed this from the database contents,
-              * so we wait till it finishes before loading it.
-              */
-             load_role();
-
-             /*
-              * Crank up the background writer, if we didn't do that already
-              * when we entered consistent recovery phase.  It doesn't matter
-              * if this fails, we'll just try again later.
-              */
-             if (BgWriterPID == 0)
-                 BgWriterPID = StartBackgroundWriter();
-
-             /*
-              * Likewise, start other special children as needed.  In a restart
-              * situation, some of them may be alive already.
-              */
-             if (WalWriterPID == 0)
-                 WalWriterPID = StartWalWriter();
-             if (AutoVacuumingActive() && AutoVacPID == 0)
-                 AutoVacPID = StartAutoVacLauncher();
-             if (XLogArchivingActive() && PgArchPID == 0)
-                 PgArchPID = pgarch_start();
-             if (PgStatPID == 0)
-                 PgStatPID = pgstat_start();
-
-             /* at this point we are really open for business */
-             ereport(LOG,
-                 (errmsg("database system is ready to accept connections")));
-         }
-     }
-
-     /* Shutdown states */
-
      if (pmState == PM_WAIT_BACKUP)
      {
          /*
--- 2665,2670 ----
***************
*** 2904,2911 **** PostmasterStateMachine(void)
          shmem_exit(1);
          reset_shared(PostPortNumber);

-         RecoveryStatus = NoRecovery;
-
          StartupPID = StartupDataBase();
          Assert(StartupPID != 0);
          pmState = PM_STARTUP;
--- 2826,2831 ----
***************
*** 4010,4056 **** ExitPostmaster(int status)
  }

  /*
!  * common code used in sigusr1_handler() and reaper() to handle
!  * recovery-related signals from startup process
   */
  static void
! CheckRecoverySignals(void)
  {
!     bool changed = false;

!     if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED))
!     {
!         Assert(pmState == PM_STARTUP);

!         RecoveryStatus = RecoveryStarted;
!         changed = true;
!     }
!     if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT))
      {
!         RecoveryStatus = RecoveryConsistent;
!         changed = true;
      }
!     if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED))
      {
!         RecoveryStatus = RecoveryCompleted;
!         changed = true;
!     }
!
!     if (changed)
!         PostmasterStateMachine();
! }

! /*
!  * sigusr1_handler - handle signal conditions from child processes
!  */
! static void
! sigusr1_handler(SIGNAL_ARGS)
! {
!     int            save_errno = errno;

!     PG_SETMASK(&BlockSig);

!     CheckRecoverySignals();

      if (CheckPostmasterSignal(PMSIGNAL_PASSWORD_CHANGE))
      {
--- 3930,3987 ----
  }

  /*
!  * sigusr1_handler - handle signal conditions from child processes
   */
  static void
! sigusr1_handler(SIGNAL_ARGS)
  {
!     int            save_errno = errno;

!     PG_SETMASK(&BlockSig);

!     /*
!      * RECOVERY_STARTED and RECOVERY_CONSISTENT signals are ignored in
!      * unexpected states. If the startup process quickly starts up, completes
!      * recovery, exits, we might process the death of the startup process
!      * first. We don't want to go back to recovery in that case.
!      */
!     if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
!         pmState == PM_STARTUP)
      {
!         /* WAL redo has started. We're out of reinitialization. */
!         FatalError = false;
!
!         /*
!          * Crank up the background writer.    It doesn't matter if this
!          * fails, we'll just try again later.
!          */
!         Assert(BgWriterPID == 0);
!         BgWriterPID = StartBackgroundWriter();
!
!         pmState = PM_RECOVERY;
      }
!     if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT) &&
!         pmState == PM_RECOVERY)
      {
!         /*
!          * Load the flat authorization file into postmaster's cache. The
!          * startup process won't have recomputed this from the database yet,
!          * so we it may change following recovery.
!          */
!         load_role();

!         /*
!          * Likewise, start other special children as needed.
!          */
!         Assert(PgStatPID == 0);
!         PgStatPID = pgstat_start();

!         /* XXX at this point we could accept read-only connections */
!         ereport(DEBUG1,
!                 (errmsg("database system is in consistent recovery mode")));

!         pmState = PM_RECOVERY_CONSISTENT;
!     }

      if (CheckPostmasterSignal(PMSIGNAL_PASSWORD_CHANGE))
      {
*** a/src/backend/storage/ipc/ipc.c
--- b/src/backend/storage/ipc/ipc.c
***************
*** 76,81 **** static int    on_proc_exit_index,
--- 76,83 ----
  void
  proc_exit(int code)
  {
+     elog(LOG, "proc_exit: %d", on_proc_exit_index);
+
      /*
       * Once we set this flag, we are committed to exit.  Any ereport() will
       * NOT send control back to the main loop, but right back here.
***************
*** 95,102 **** proc_exit(int code)
      InterruptHoldoffCount = 1;
      CritSectionCount = 0;

-     elog(DEBUG3, "proc_exit(%d)", code);
-
      /* do our shared memory exits first */
      shmem_exit(code);

--- 97,102 ----
*** a/src/backend/storage/ipc/pmsignal.c
--- b/src/backend/storage/ipc/pmsignal.c
***************
*** 72,77 **** SendPostmasterSignal(PMSignalReason reason)
--- 72,94 ----
  }

  /*
+  * SetPostmasterSignal - like SendPostmasterSignal, but don't wake up
+  *                         postmaster
+  *
+  * This is for signals that the postmaster polls with CheckPostmasterSignal()
+  * but isn't interested in processing immediately.
+  */
+ void
+ SetPostmasterSignal(PMSignalReason reason)
+ {
+     /* If called in a standalone backend, do nothing */
+     if (!IsUnderPostmaster)
+         return;
+     /* Atomically set the proper flag */
+     PMSignalFlags[reason] = true;
+ }
+
+ /*
   * CheckPostmasterSignal - check to see if a particular reason has been
   * signaled, and clear the signal flag.  Should be called by postmaster
   * after receiving SIGUSR1.
*** a/src/include/storage/pmsignal.h
--- b/src/include/storage/pmsignal.h
***************
*** 39,44 **** typedef enum
--- 39,45 ----
   */
  extern void PMSignalInit(void);
  extern void SendPostmasterSignal(PMSignalReason reason);
+ extern void SetPostmasterSignal(PMSignalReason reason);
  extern bool CheckPostmasterSignal(PMSignalReason reason);
  extern bool PostmasterIsAlive(bool amDirectChild);

*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 225,235 **** static pid_t StartupPID = 0,
--- 225,257 ----
  static int    Shutdown = NoShutdown;

  static bool FatalError = false; /* T if recovering from backend crash */
+ static bool RecoveryError = false; /* T if recovery failed */

  /*
   * We use a simple state machine to control startup, shutdown, and
   * crash recovery (which is rather like shutdown followed by startup).
   *
+  * After doing all the postmaster initialization work, we enter PM_STARTUP
+  * state and the startup process is launched. The startup process begins by
+  * reading the control file and other preliminary initialization steps. When
+  * it's ready to start WAL redo, it signals postmaster, and we switch to
+  * PM_RECOVERY phase. The background writer is launched, while the startup
+  * process continues applying WAL.
+  *
+  * After reaching a consistent point in WAL redo, startup process signals
+  * us again, and we switch to PM_RECOVERY_CONSISTENT phase. There's currently
+  * no difference between PM_RECOVERY and PM_RECOVERY_CONSISTENT, but we
+  * could start accepting connections to perform read-only queries at this
+  * point, if we had the infrastructure to do that.
+  *
+  * When the WAL redo is finished, the startup process signals us the third
+  * time, and exits. We don't process the 3d signal immediately but when we
+  * see the that the startup process has exited, we check that we have
+  * received the signal. If everything is OK, we then switch to PM_RUN state.
+  * The startup process can also skip the recovery and consistent recovery
+  * phases altogether, as it will during normal startup when there's no
+  * recovery to be done, for example.
+  *
   * Normal child backends can only be launched when we are in PM_RUN state.
   * (We also allow it in PM_WAIT_BACKUP state, but only for superusers.)
   * In other states we handle connection requests by launching "dead_end"
***************
*** 245,259 **** static bool FatalError = false; /* T if recovering from backend crash */
   *
   * Notice that this state variable does not distinguish *why* we entered
   * states later than PM_RUN --- Shutdown and FatalError must be consulted
!  * to find that out.  FatalError is never true in PM_RUN state, nor in
!  * PM_SHUTDOWN states (because we don't enter those states when trying to
!  * recover from a crash).  It can be true in PM_STARTUP state, because we
!  * don't clear it until we've successfully recovered.
   */
  typedef enum
  {
      PM_INIT,                    /* postmaster starting */
      PM_STARTUP,                    /* waiting for startup subprocess */
      PM_RUN,                        /* normal "database is alive" state */
      PM_WAIT_BACKUP,                /* waiting for online backup mode to end */
      PM_WAIT_BACKENDS,            /* waiting for live backends to exit */
--- 267,285 ----
   *
   * Notice that this state variable does not distinguish *why* we entered
   * states later than PM_RUN --- Shutdown and FatalError must be consulted
!  * to find that out.  FatalError is never true in PM_RECOVERY_* or PM_RUN
!  * states, nor in PM_SHUTDOWN states (because we don't enter those states
!  * when trying to recover from a crash).  It can be true in PM_STARTUP state,
!  * because we don't clear it until we've successfully started WAL redo.
!  * Similarly, RecoveryError means that we have crashed during recovery, and
!  * should not try to restart.
   */
  typedef enum
  {
      PM_INIT,                    /* postmaster starting */
      PM_STARTUP,                    /* waiting for startup subprocess */
+     PM_RECOVERY,                /* in recovery mode */
+     PM_RECOVERY_CONSISTENT,        /* consistent recovery mode */
      PM_RUN,                        /* normal "database is alive" state */
      PM_WAIT_BACKUP,                /* waiting for online backup mode to end */
      PM_WAIT_BACKENDS,            /* waiting for live backends to exit */
***************
*** 1302,1308 **** ServerLoop(void)
           * state that prevents it, start one.  It doesn't matter if this
           * fails, we'll just try again later.
           */
!         if (BgWriterPID == 0 && pmState == PM_RUN)
              BgWriterPID = StartBackgroundWriter();

          /*
--- 1328,1336 ----
           * state that prevents it, start one.  It doesn't matter if this
           * fails, we'll just try again later.
           */
!         if (BgWriterPID == 0 &&
!             (pmState == PM_RUN || pmState == PM_RECOVERY ||
!              pmState == PM_RECOVERY_CONSISTENT))
              BgWriterPID = StartBackgroundWriter();

          /*
***************
*** 1752,1758 **** canAcceptConnections(void)
              return CAC_WAITBACKUP;    /* allow superusers only */
          if (Shutdown > NoShutdown)
              return CAC_SHUTDOWN;    /* shutdown is pending */
!         if (pmState == PM_STARTUP && !FatalError)
              return CAC_STARTUP; /* normal startup */
          return CAC_RECOVERY;    /* else must be crash recovery */
      }
--- 1780,1789 ----
              return CAC_WAITBACKUP;    /* allow superusers only */
          if (Shutdown > NoShutdown)
              return CAC_SHUTDOWN;    /* shutdown is pending */
!         if (!FatalError &&
!             (pmState == PM_STARTUP ||
!              pmState == PM_RECOVERY ||
!              pmState == PM_RECOVERY_CONSISTENT))
              return CAC_STARTUP; /* normal startup */
          return CAC_RECOVERY;    /* else must be crash recovery */
      }
***************
*** 1982,1988 **** pmdie(SIGNAL_ARGS)
              ereport(LOG,
                      (errmsg("received smart shutdown request")));

!             if (pmState == PM_RUN)
              {
                  /* autovacuum workers are told to shut down immediately */
                  SignalAutovacWorkers(SIGTERM);
--- 2013,2020 ----
              ereport(LOG,
                      (errmsg("received smart shutdown request")));

!             if (pmState == PM_RUN || pmState == PM_RECOVERY ||
!                 pmState == PM_RECOVERY_CONSISTENT)
              {
                  /* autovacuum workers are told to shut down immediately */
                  SignalAutovacWorkers(SIGTERM);
***************
*** 2019,2025 **** pmdie(SIGNAL_ARGS)

              if (StartupPID != 0)
                  signal_child(StartupPID, SIGTERM);
!             if (pmState == PM_RUN || pmState == PM_WAIT_BACKUP)
              {
                  ereport(LOG,
                          (errmsg("aborting any active transactions")));
--- 2051,2064 ----

              if (StartupPID != 0)
                  signal_child(StartupPID, SIGTERM);
!             if (pmState == PM_RECOVERY)
!             {
!                 /* only bgwriter is active in this state */
!                 pmState = PM_WAIT_BACKENDS;
!             }
!             if (pmState == PM_RUN ||
!                 pmState == PM_WAIT_BACKUP ||
!                 pmState == PM_RECOVERY_CONSISTENT)
              {
                  ereport(LOG,
                          (errmsg("aborting any active transactions")));
***************
*** 2115,2125 **** reaper(SIGNAL_ARGS)
           */
          if (pid == StartupPID)
          {
              StartupPID = 0;
-             Assert(pmState == PM_STARTUP);

!             /* FATAL exit of startup is treated as catastrophic */
!             if (!EXIT_STATUS_0(exitstatus))
              {
                  LogChildExit(LOG, _("startup process"),
                               pid, exitstatus);
--- 2154,2177 ----
           */
          if (pid == StartupPID)
          {
+             bool recoveryCompleted;
+
              StartupPID = 0;

!             /*
!              * Check if the startup process completed recovery before exiting
!              */
!             if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED))
!                 recoveryCompleted = true;
!             else
!                 recoveryCompleted = false;
!
!             /*
!              * Unexpected exit of startup process (including FATAL exit)
!              * during PM_STARTUP is treated as catastrophic. There is no
!              * other processes running yet, so we can just exit.
!              */
!             if (pmState == PM_STARTUP && !recoveryCompleted)
              {
                  LogChildExit(LOG, _("startup process"),
                               pid, exitstatus);
***************
*** 2127,2141 **** reaper(SIGNAL_ARGS)
                  (errmsg("aborting startup due to startup process failure")));
                  ExitPostmaster(1);
              }
-
              /*
!              * Startup succeeded - we are done with system startup or
!              * recovery.
               */
!             FatalError = false;
!
              /*
!              * Go to shutdown mode if a shutdown request was pending.
               */
              if (Shutdown > NoShutdown)
              {
--- 2179,2199 ----
                  (errmsg("aborting startup due to startup process failure")));
                  ExitPostmaster(1);
              }
              /*
!              * Any unexpected exit (including FATAL exit) of the startup
!              * process is treated as a crash, except that we don't want
!              * to reinitialize.
               */
!             if (!EXIT_STATUS_0(exitstatus))
!             {
!                 RecoveryError = true;
!                 HandleChildCrash(pid, exitstatus,
!                                  _("startup process"));
!                 continue;
!             }
              /*
!              * Startup process exited in response to a shutdown request (or
!              * it finished normally regardless of the shutdown request).
               */
              if (Shutdown > NoShutdown)
              {
***************
*** 2143,2151 **** reaper(SIGNAL_ARGS)
                  /* PostmasterStateMachine logic does the rest */
                  continue;
              }

              /*
!              * Otherwise, commence normal operations.
               */
              pmState = PM_RUN;

--- 2201,2221 ----
                  /* PostmasterStateMachine logic does the rest */
                  continue;
              }
+             /*
+              * Startup process exited normally, but didn't finish recovery.
+              * This can happen if someone else than postmaster kills the
+              * startup process with SIGTERM. Treat it like a crash.
+              */
+             if (!recoveryCompleted)
+             {
+                 RecoveryError = true;
+                 HandleChildCrash(pid, exitstatus,
+                                  _("startup process"));
+                 continue;
+             }

              /*
!              * Startup succeeded, commence normal operations
               */
              pmState = PM_RUN;

***************
*** 2157,2167 **** reaper(SIGNAL_ARGS)
              load_role();

              /*
!              * Crank up the background writer.    It doesn't matter if this
!              * fails, we'll just try again later.
               */
!             Assert(BgWriterPID == 0);
!             BgWriterPID = StartBackgroundWriter();

              /*
               * Likewise, start other special children as needed.  In a restart
--- 2227,2238 ----
              load_role();

              /*
!              * Crank up the background writer, if we didn't do that already
!              * when we entered consistent recovery phase.  It doesn't matter
!              * if this fails, we'll just try again later.
               */
!             if (BgWriterPID == 0)
!                 BgWriterPID = StartBackgroundWriter();

              /*
               * Likewise, start other special children as needed.  In a restart
***************
*** 2178,2186 **** reaper(SIGNAL_ARGS)

              /* at this point we are really open for business */
              ereport(LOG,
!                  (errmsg("database system is ready to accept connections")));
!
!             continue;
          }

          /*
--- 2249,2255 ----

              /* at this point we are really open for business */
              ereport(LOG,
!                 (errmsg("database system is ready to accept connections")));
          }

          /*
***************
*** 2443,2448 **** HandleChildCrash(int pid, int exitstatus, const char *procname)
--- 2512,2529 ----
          }
      }

+     /* Take care of the startup process too */
+     if (pid == StartupPID)
+         StartupPID = 0;
+     else if (StartupPID != 0 && !FatalError)
+     {
+         ereport(DEBUG2,
+                 (errmsg_internal("sending %s to process %d",
+                                  (SendStop ? "SIGSTOP" : "SIGQUIT"),
+                                  (int) StartupPID)));
+         signal_child(BgWriterPID, (SendStop ? SIGSTOP : SIGQUIT));
+     }
+
      /* Take care of the bgwriter too */
      if (pid == BgWriterPID)
          BgWriterPID = 0;
***************
*** 2514,2520 **** HandleChildCrash(int pid, int exitstatus, const char *procname)

      FatalError = true;
      /* We now transit into a state of waiting for children to die */
!     if (pmState == PM_RUN ||
          pmState == PM_WAIT_BACKUP ||
          pmState == PM_SHUTDOWN)
          pmState = PM_WAIT_BACKENDS;
--- 2595,2603 ----

      FatalError = true;
      /* We now transit into a state of waiting for children to die */
!     if (pmState == PM_RECOVERY ||
!         pmState == PM_RECOVERY_CONSISTENT ||
!         pmState == PM_RUN ||
          pmState == PM_WAIT_BACKUP ||
          pmState == PM_SHUTDOWN)
          pmState = PM_WAIT_BACKENDS;
***************
*** 2723,2728 **** PostmasterStateMachine(void)
--- 2806,2820 ----
      }

      /*
+      * If recovery failed, wait for all non-syslogger children to exit,
+      * and then exit postmaster. We don't try to reinitialize when recovery
+      * fails, because more than likely it will just fail again and we will
+      * keep trying forever.
+      */
+     if (RecoveryError && pmState == PM_NO_CHILDREN)
+         ExitPostmaster(1);
+
+     /*
       * If we need to recover from a crash, wait for all non-syslogger
       * children to exit, then reset shmem and StartupDataBase.
       */
***************
*** 3847,3852 **** sigusr1_handler(SIGNAL_ARGS)
--- 3939,3988 ----

      PG_SETMASK(&BlockSig);

+     /*
+      * RECOVERY_STARTED and RECOVERY_CONSISTENT signals are ignored in
+      * unexpected states. If the startup process quickly starts up, completes
+      * recovery, exits, we might process the death of the startup process
+      * first. We don't want to go back to recovery in that case.
+      */
+     if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
+         pmState == PM_STARTUP)
+     {
+         /* WAL redo has started. We're out of reinitialization. */
+         FatalError = false;
+
+         /*
+          * Crank up the background writer.    It doesn't matter if this
+          * fails, we'll just try again later.
+          */
+         Assert(BgWriterPID == 0);
+         BgWriterPID = StartBackgroundWriter();
+
+         pmState = PM_RECOVERY;
+     }
+     if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT) &&
+         pmState == PM_RECOVERY)
+     {
+         /*
+          * Load the flat authorization file into postmaster's cache. The
+          * startup process won't have recomputed this from the database yet,
+          * so we it may change following recovery.
+          */
+         load_role();
+
+         /*
+          * Likewise, start other special children as needed.
+          */
+         Assert(PgStatPID == 0);
+         PgStatPID = pgstat_start();
+
+         /* XXX at this point we could accept read-only connections */
+         ereport(DEBUG1,
+                 (errmsg("database system is in consistent recovery mode")));
+
+         pmState = PM_RECOVERY_CONSISTENT;
+     }
+
      if (CheckPostmasterSignal(PMSIGNAL_PASSWORD_CHANGE))
      {
          /*

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> The thing wrong with it is assuming that nothing interesting will happen
>> during proc_exit().  We hang enough stuff on on_proc_exit hooks that
>> that seems like a pretty shaky assumption.

> I can't get too worried, given that proc_exit() is a very well-beaten 
> code path. Admittedly not so much for an auxiliary process, but that's 
> just a dumbed down version of what happens with a full-blown backend.

Well, you're assuming that no future patch or add-on module will put
anything into an on_proc_exit hook that might interact with other
processes.  It might be fine now but I don't think it's very robust.

> However I started looking into that idea anyway, and figured that it 
> does simplify the logic in postmaster.c quite a bit, so I think it's 
> worth doing on those grounds alone.

Couldn't you get rid of PMSIGNAL_RECOVERY_COMPLETED altogether?  If the
startup process exits with code 0, recovery is complete, else there
was trouble.  I find this SetPostmasterSignal bit quite ugly anyway.
        regards, tom lane


Re: Re: [COMMITTERS] pgsql: Start background writer during archive recovery.

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Couldn't you get rid of PMSIGNAL_RECOVERY_COMPLETED altogether?  If the
> startup process exits with code 0, recovery is complete, else there
> was trouble.  I find this SetPostmasterSignal bit quite ugly anyway.

Right now, the startup process exits with code 0 also when it's told to 
exit with SIGTERM, ie. fast shutdown request, and the recovery-completed 
signal is used to differentiate those cases. But yeah, we can use 
another exit code for that. I'll look into that approach.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Re: [COMMITTERS] pgsql: Start background writer during archive recovery.

From
Heikki Linnakangas
Date:
Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Couldn't you get rid of PMSIGNAL_RECOVERY_COMPLETED altogether?  If the
>> startup process exits with code 0, recovery is complete, else there
>> was trouble.  I find this SetPostmasterSignal bit quite ugly anyway.
> 
> Right now, the startup process exits with code 0 also when it's told to 
> exit with SIGTERM, ie. fast shutdown request, and the recovery-completed 
> signal is used to differentiate those cases. But yeah, we can use 
> another exit code for that. I'll look into that approach.

Committed that.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com