Re: unable to fail over to warm standby server - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: unable to fail over to warm standby server
Date
Msg-id 4B71BBCB.8070302@enterprisedb.com
Whole thread Raw
In response to Re: unable to fail over to warm standby server  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: unable to fail over to warm standby server  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: unable to fail over to warm standby server  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: unable to fail over to warm standby server  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-bugs
Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jan 29, 2010 at 3:32 PM, Heikki Linnakangas
>>> That only affects the error message and is harmless otherwise, but I
>>> thought I'd mention it. I'll fix it, unless someone wants to argue that
>>> its more useful to print the raw return value of system(), because it
>>> might contain more information like which signal killed the process,
>>> that you could extract from the cryptic error code using e.g WTERMSIG()
>>> macro.
>
>> An "if" statement would seem to be in order, so that you can print out
>> either the exit code or the signal number as appropriate.
>
> Yes.  Please see the existing code in the postmaster that prints
> subprocess exit codes, and duplicate it (or perhaps refactor so you can
> avoid code duplication; though translatability of messages might limit
> what you can do there).

Here's what I came up with. Translatability indeed makes it pretty hard,
I ended up copy-pasting.

BTW, I don't think I'm going to bother or risk back-patching this. It
was harmless, and for forensic purposes all the information was there in
the old message too, just in a cryptic format. And the new messages
would need translating too.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 546,551 **** static void pg_start_backup_callback(int code, Datum arg);
--- 546,552 ----
  static bool read_backup_label(XLogRecPtr *checkPointLoc);
  static void rm_redo_error_callback(void *arg);
  static int    get_sync_bit(int method);
+ static int errdetail_childexit(const char *procname, int exitstatus);


  /*
***************
*** 2959,2966 **** RestoreArchivedFile(char *path, const char *xlogfname,
      signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;

      ereport(signaled ? FATAL : DEBUG2,
!         (errmsg("could not restore file \"%s\" from archive: return code %d",
!                 xlogfname, rc)));

  not_available:
      /*
--- 2960,2967 ----
      signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;

      ereport(signaled ? FATAL : DEBUG2,
!             (errmsg("could not restore file \"%s\" from archive", xlogfname),
!              errdetail_childexit("restore_command", rc)));

  not_available:
      /*
***************
*** 3077,3088 **** ExecuteRecoveryEndCommand(void)
          signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;

          ereport(signaled ? FATAL : WARNING,
!                 (errmsg("recovery_end_command \"%s\": return code %d",
!                         xlogRecoveryEndCmd, rc)));
      }
  }

  /*
   * Preallocate log files beyond the specified log endpoint.
   *
   * XXX this is currently extremely conservative, since it forces only one
--- 3078,3149 ----
          signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;

          ereport(signaled ? FATAL : WARNING,
!                 (errmsg("recovery end command \"%s\" failed", xlogRecoveryEndCmd),
!                  errdetail_childexit("recovery_end_command", rc)));
      }
  }

  /*
+  * Add detail information (and possibly a hint) describing why a child
+  * process exited to the error currently being constructed. This is for
+  * use inside ereport, e.g:
+  *
+  * ereport(DEBUG1,
+  *           (errmsg(...),
+  *           (errdetail_childexit("mycommand", rc))));
+  *
+  * 'procname' is a name of the exited command, and 'exitstatus' is a return
+  * code as returned by e.g wait(2).
+  *
+  * The logic is copied from LogChildExit() in postmaster/postmaster.c. If
+  * you change this, make sure you keep LogChildExit in sync.
+  */
+ static int
+ errdetail_childexit(const char *procname, int exitstatus)
+ {
+     if (WIFEXITED(exitstatus))
+         /*------
+           translator: %s is a noun phrase describing a child process, such as
+           "restore_command" */
+         errdetail("%s exited with exit code %d",
+                   procname, WEXITSTATUS(exitstatus));
+     else if (WIFSIGNALED(exitstatus))
+     {
+ #if defined(WIN32)
+         /*------
+           translator: %s is a noun phrase describing a child process, such as
+           "restore_command" */
+         errdetail("%s was terminated by exception 0x%X",
+                   procname, WTERMSIG(exitstatus));
+         errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value.");
+ #elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
+         /*------
+           translator: %s is a noun phrase describing a child process, such as
+           "restore_command" */
+         errdetail("%s was terminated by signal %d: %s",
+                   procname, WTERMSIG(exitstatus),
+                   WTERMSIG(exitstatus) < NSIG ?
+                   sys_siglist[WTERMSIG(exitstatus)] : "(unknown)");
+ #else
+         /*------
+           translator: %s is a noun phrase describing a child process, such as
+           "restore_command" */
+         errdetail("%s was terminated by signal %d",
+                   procname, WTERMSIG(exitstatus));
+ #endif
+     }
+     else
+         /*------
+           translator: %s is a noun phrase describing a child process, such as
+           "restore_command" */
+         errdetail("%s exited with unrecognized status %d",
+                   procname, exitstatus);
+
+     /* return dummy value, so that this can be used inside ereport(...) */
+     return 1;
+ }
+
+ /*
   * Preallocate log files beyond the specified log endpoint.
   *
   * XXX this is currently extremely conservative, since it forces only one
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 2770,2775 **** HandleChildCrash(int pid, int exitstatus, const char *procname)
--- 2770,2779 ----

  /*
   * Log the death of a child process.
+  *
+  * errdetail_childexit() function in xlog.c uses the same logic for
+  * constructing the message. If you change this function, remember to
+  * keep errdetail_childexit() in sync.
   */
  static void
  LogChildExit(int lev, const char *procname, int pid, int exitstatus)

pgsql-bugs by date:

Previous
From: Russell Smith
Date:
Subject: Re: BUG #5319: recursion in the triggers
Next
From: Tom Lane
Date:
Subject: Re: unable to fail over to warm standby server