Re: [HACKERS] For Review: Allow WAL information to recover corrupted - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: [HACKERS] For Review: Allow WAL information to recover corrupted
Date
Msg-id 200604260218.k3Q2Ifk02254@candle.pha.pa.us
List pgsql-patches
Patch attached and applied, with documentation additions.  Thanks.

---------------------------------------------------------------------------

yuanjia lee wrote:
>
> Hi All
>
> I have added an option -r to pg_resetxlog so that the tool can rebuild a corrupted pg_control file from the old
> xlog files.
>
> Here is the patch. Sorry, I tried to attach it to the mail, but it failed; I don't know why. Here is the
> link: http://www.geocities.com/yuanjia_pg/pg_resetxlog.diff.txt
>
> There are also some changes in the logic of other options.
> Option -n: only print out the control values in the existing pg_control file; if the file is corrupted, inform the
> user to rebuild it first.
> Option -f: if the pg_control file is fine, then reset the xlog file; if pg_control is corrupted, then try to rebuild the
> control file from the old xlog files; if that fails, then just guess the values, then reset the xlog file.
>
> The algorithm for restoring the pg_control values from the old xlog files:
> 1. Retrieve all of the active xlog files from the xlog directory into a list in increasing order, according to their
> timeline, log id, and segment id. (Tom had informed me that we cannot know which segment file is the latest just by the
> name itself, so before adding a segment file to the list, it should be checked that it is an active segment file.)
> 2. Search the list to find the oldest xlog file of the latest timeline. (Although it would be better to let the user
> select the timeline used for the rebuild, I think that is not so necessary. I had tried to use
> only the last file in the latest timeline, but I found that in many cases the last checkpoint
> record and the previous checkpoint record are stored in different segment files, so I had to search from the
> oldest one.)
> 3. Search the records from the oldest xlog file of the latest timeline to the latest xlog file of the latest timeline;
> whenever a checkpoint record is found, update the latest checkpoint and previous checkpoint.
>
> Some of the code is borrowed from Tom Lane's xlogdump.c file.
>
> Hope for your advice.
>
> Best Regards
> Yuanjia Lee
>

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
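
For readers following the algorithm described in the quoted message, steps 1 and 2 (order the segment files by timeline, log id, and segment id, then pick the oldest file of the latest timeline) can be sketched in standalone C as below. This is an illustrative sketch only, not code from the patch: SegName, parse_seg_name, cmp_seg, and oldest_of_latest_timeline are hypothetical names, it sorts an array with qsort rather than maintaining the patch's linked list, and it omits the patch's check that each file is an active segment (ValidXLogFileHeader).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's XLogFileName list node. */
typedef struct SegName
{
    unsigned int tli;    /* timeline ID */
    unsigned int logid;  /* log file ID */
    unsigned int seg;    /* segment number */
} SegName;

/*
 * Parse a segment file name made of 24 hex digits (8 each for timeline,
 * log id, segment).  Returns 1 on success, 0 if the name does not match.
 */
static int
parse_seg_name(const char *fname, SegName *out)
{
    size_t i;

    if (strlen(fname) != 24)
        return 0;
    for (i = 0; i < 24; i++)
        if (!isxdigit((unsigned char) fname[i]))
            return 0;
    return sscanf(fname, "%8x%8x%8x", &out->tli, &out->logid, &out->seg) == 3;
}

/* qsort comparator: increasing (timeline, log id, segment). */
static int
cmp_seg(const void *a, const void *b)
{
    const SegName *x = a;
    const SegName *y = b;

    if (x->tli != y->tli)
        return x->tli < y->tli ? -1 : 1;
    if (x->logid != y->logid)
        return x->logid < y->logid ? -1 : 1;
    if (x->seg != y->seg)
        return x->seg < y->seg ? -1 : 1;
    return 0;
}

/*
 * Sort the segments in place and return the index of the oldest segment
 * that belongs to the latest timeline, i.e. where a scan would start.
 */
static size_t
oldest_of_latest_timeline(SegName *segs, size_t n)
{
    size_t i;

    assert(n > 0);
    qsort(segs, n, sizeof(SegName), cmp_seg);

    /* Walk back from the last entry while the timeline stays the same. */
    i = n - 1;
    while (i > 0 && segs[i - 1].tli == segs[n - 1].tli)
        i--;
    return i;
}
```

A caller would fill an array by running parse_seg_name over the directory entries, then scan records from the returned index through the end of the array, which corresponds to step 3 of the description.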
Index: doc/src/sgml/ref/pg_resetxlog.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/pg_resetxlog.sgml,v
retrieving revision 1.13
diff -c -c -r1.13 pg_resetxlog.sgml
*** doc/src/sgml/ref/pg_resetxlog.sgml    25 Apr 2006 21:02:33 -0000    1.13
--- doc/src/sgml/ref/pg_resetxlog.sgml    26 Apr 2006 02:14:43 -0000
***************
*** 20,25 ****
--- 20,26 ----
     <command>pg_resetxlog</command>
     <arg>-f</arg>
     <arg>-n</arg>
+    <arg>-r</arg>
     <arg>-o<replaceable class="parameter">oid</replaceable> </arg>
     <arg>-x <replaceable class="parameter">xid</replaceable> </arg>
     <arg>-m <replaceable class="parameter">mxid</replaceable> </arg>
***************
*** 57,78 ****

    <para>
     If <command>pg_resetxlog</command> complains that it cannot determine
!    valid data for <filename>pg_control</>, you can force it to proceed anyway
!    by specifying the <literal>-f</> (force) switch.  In this case plausible
!    values will be substituted for the missing data.  Most of the fields can be
!    expected to match, but manual assistance may be needed for the next OID,
!    next transaction ID, next multitransaction ID and offset,
!    WAL starting address, and database locale fields.
!    The first five of these can be set using the switches discussed below.
!    <command>pg_resetxlog</command>'s own environment is the source for its
!    guess at the locale fields; take care that <envar>LANG</> and so forth
!    match the environment that <command>initdb</> was run in.
!    If you are not able to determine correct values for all these fields,
!    <literal>-f</> can still be used, but
     the recovered database must be treated with even more suspicion than
!    usual: an immediate dump and reload is imperative.  <emphasis>Do not</>
!    execute any data-modifying operations in the database before you dump;
!    as any such action is likely to make the corruption worse.
    </para>

    <para>
--- 58,79 ----

    <para>
     If <command>pg_resetxlog</command> complains that it cannot determine
!    valid data for <filename>pg_control</>, you can force it to proceed
!    anyway by specifying the <literal>-f</> (force) switch. In this case
!    plausible values will be substituted for the missing data.
!    <command>pg_resetxlog</command>'s own environment is the source for
!    its guess at the locale fields; take care that <envar>LANG</> and so
!    forth match the environment that <command>initdb</> was run in.
!    <filename>/xlog</> files are used to determine other parameters, like
!    next OID, next transaction ID, next multi-transaction ID and offset,
!    WAL starting address, and database locale fields. Because determined
!    values might be wrong, the first five of these can be set using the
!    switches discussed below. If you are not able to determine correct
!    values for all these fields, <literal>-f</> can still be used, but
     the recovered database must be treated with even more suspicion than
!    usual: an immediate dump and reload is imperative. <emphasis>Do
!    not</> execute any data-modifying operations in the database before
!    you dump; as any such action is likely to make the corruption worse.
    </para>

    <para>
***************
*** 150,155 ****
--- 151,161 ----
    </para>

    <para>
+    The <literal>-r</> restores <filename>pg_control</> counters listed
+    above without resetting the write-ahead log.
+   </para>
+
+   <para>
     The <literal>-n</> (no operation) switch instructs
     <command>pg_resetxlog</command> to print the values reconstructed from
     <filename>pg_control</> and then exit without modifying anything.
Index: src/bin/pg_resetxlog/pg_resetxlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_resetxlog/pg_resetxlog.c,v
retrieving revision 1.43
diff -c -c -r1.43 pg_resetxlog.c
*** src/bin/pg_resetxlog/pg_resetxlog.c    5 Apr 2006 03:34:05 -0000    1.43
--- src/bin/pg_resetxlog/pg_resetxlog.c    26 Apr 2006 02:14:48 -0000
***************
*** 4,29 ****
   *      A utility to "zero out" the xlog when it's corrupt beyond recovery.
   *      Can also rebuild pg_control if needed.
   *
!  * The theory of operation is fairly simple:
   *      1. Read the existing pg_control (which will include the last
   *         checkpoint record).  If it is an old format then update to
   *         current format.
!  *      2. If pg_control is corrupt, attempt to intuit reasonable values,
!  *         by scanning the old xlog if necessary.
   *      3. Modify pg_control to reflect a "shutdown" state with a checkpoint
   *         record at the start of xlog.
   *      4. Flush the existing xlog files and write a new segment with
   *         just a checkpoint record in it.  The new segment is positioned
   *         just past the end of the old xlog, so that existing LSNs in
   *         data pages will appear to be "in the past".
-  * This is all pretty straightforward except for the intuition part of
-  * step 2 ...
   *
!  *
!  * Portions Copyright (c) 1996-2006, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
   *
-  * $PostgreSQL: pgsql/src/bin/pg_resetxlog/pg_resetxlog.c,v 1.43 2006/04/05 03:34:05 tgl Exp $
   *
   *-------------------------------------------------------------------------
   */
--- 4,32 ----
   *      A utility to "zero out" the xlog when it's corrupt beyond recovery.
   *      Can also rebuild pg_control if needed.
   *
!  * The theory of reset operation is fairly simple:
   *      1. Read the existing pg_control (which will include the last
   *         checkpoint record).  If it is an old format then update to
   *         current format.
!  *      2. If pg_control is corrupt, attempt to rebuild the values,
!  *         by scanning the old xlog; if that fails, try to guess them.
   *      3. Modify pg_control to reflect a "shutdown" state with a checkpoint
   *         record at the start of xlog.
   *      4. Flush the existing xlog files and write a new segment with
   *         just a checkpoint record in it.  The new segment is positioned
   *         just past the end of the old xlog, so that existing LSNs in
   *         data pages will appear to be "in the past".
   *
!  * The algorithm for restoring the pg_control values from old xlog files:
!  *    1. Retrieve all of the active xlog files from the xlog directory into a
!  *       list in increasing order, by timeline, log id, and segment id.
!  *    2. Search the list to find the oldest xlog file of the latest timeline.
!  *    3. Search the records from the oldest xlog file of the latest timeline
!  *       to the latest xlog file of the latest timeline; whenever a checkpoint
!  *       record is found, update the latest checkpoint and previous checkpoint.
!  * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
   *
   *
   *-------------------------------------------------------------------------
   */
***************
*** 46,51 ****
--- 49,57 ----
  #include "catalog/catversion.h"
  #include "catalog/pg_control.h"

+ #define GUESS    0
+ #define WAL    1
+
  extern int    optind;
  extern char *optarg;

***************
*** 53,75 ****
  static ControlFileData ControlFile;        /* pg_control values */
  static uint32 newXlogId,
              newXlogSeg;            /* ID/Segment of new XLOG segment */
- static bool guessed = false;    /* T if we had to guess at any values */
  static const char *progname;

  static bool ReadControlFile(void);
! static void GuessControlValues(void);
! static void PrintControlValues(bool guessed);
  static void RewriteControlFile(void);
  static void KillExistingXLOG(void);
  static void WriteEmptyXLOG(void);
  static void usage(void);


  int
  main(int argc, char *argv[])
  {
      int            c;
      bool        force = false;
      bool        noupdate = false;
      TransactionId set_xid = 0;
      Oid            set_oid = 0;
--- 59,133 ----
  static ControlFileData ControlFile;        /* pg_control values */
  static uint32 newXlogId,
              newXlogSeg;            /* ID/Segment of new XLOG segment */
  static const char *progname;
+ static uint64        sysidentifier=-1;
+
+ /*
+  * We use a list to store the active xlog files we had found in the
+  * xlog directory in increasing order according the time line, logid,
+  * segment id.
+  *
+  */
+ typedef struct XLogFileName {
+     TimeLineID tli;
+     uint32 logid;
+     uint32 seg;
+     char fname[256];
+     struct XLogFileName *next;
+ }    XLogFileName;
+
+ /* The list head */
+ static XLogFileName *xlogfilelist=NULL;
+
+ /* LastXLogFile is the latest file in the latest timeline,
+    CurXLogFile is the oldest file in the latest timeline
+    */
+ static XLogFileName *CurXLogFile, *LastXLogFile;
+
+ /* The last checkpoint found in xlog file.*/
+ static CheckPoint      lastcheckpoint;
+
+ /* The last and previous checkpoint pointers found in xlog file.*/
+ static XLogRecPtr     prevchkp, lastchkp;
+
+ /* the database state.*/
+ static DBState    state=DB_SHUTDOWNED;
+
+ /* the total checkpoint numbers which had been found in the xlog file.*/
+ static int         found_checkpoint=0;
+

  static bool ReadControlFile(void);
! static bool RestoreControlValues(int mode);
! static void PrintControlValues(void);
! static void UpdateCtlFile4Reset(void);
  static void RewriteControlFile(void);
  static void KillExistingXLOG(void);
  static void WriteEmptyXLOG(void);
  static void usage(void);

+ static void GetXLogFiles(void);
+ static bool ValidXLogFileName(char * fname);
+ static bool ValidXLogFileHeader(XLogFileName *segfile);
+ static bool ValidXLOGPageHeader(XLogPageHeader hdr, uint tli, uint id, uint seg);
+ static bool CmpXLogFileOT(XLogFileName * f1, XLogFileName *f2);
+ static bool IsNextSeg(XLogFileName *prev, XLogFileName *cur);
+ static void InsertXLogFile( char * fname );
+ static bool ReadXLogPage(void);
+ static bool RecordIsValid(XLogRecord *record, XLogRecPtr recptr);
+ static bool FetchRecord(void);
+ static void UpdateCheckPoint(XLogRecord *record);
+ static void SelectStartXLog(void);
+ static int SearchLastCheckpoint(void);
+ static int OpenXLogFile(XLogFileName *sf);
+ static void CleanUpList(XLogFileName *list);

  int
  main(int argc, char *argv[])
  {
      int            c;
      bool        force = false;
+     bool        restore = false;
      bool        noupdate = false;
      TransactionId set_xid = 0;
      Oid            set_oid = 0;
***************
*** 84,90 ****
      char       *DataDir;
      int            fd;
      char        path[MAXPGPATH];
!
      set_pglocale_pgservice(argv[0], "pg_resetxlog");

      progname = get_progname(argv[0]);
--- 142,150 ----
      char       *DataDir;
      int            fd;
      char        path[MAXPGPATH];
!     bool        ctlcorrupted = false;
!     bool        PidLocked = false;
!
      set_pglocale_pgservice(argv[0], "pg_resetxlog");

      progname = get_progname(argv[0]);
***************
*** 104,117 ****
      }


!     while ((c = getopt(argc, argv, "fl:m:no:O:x:")) != -1)
      {
          switch (c)
          {
              case 'f':
                  force = true;
                  break;
!
              case 'n':
                  noupdate = true;
                  break;
--- 164,181 ----
      }


!     while ((c = getopt(argc, argv, "fl:m:no:O:x:r")) != -1)
      {
          switch (c)
          {
              case 'f':
                  force = true;
                  break;
!
!             case 'r':
!                 restore = true;
!                 break;
!
              case 'n':
                  noupdate = true;
                  break;
***************
*** 255,271 ****
      }
      else
      {
!         fprintf(stderr, _("%s: lock file \"%s\" exists\n"
!                           "Is a server running?  If not, delete the lock file and try again.\n"),
!                 progname, path);
!         exit(1);
      }

      /*
       * Attempt to read the existing pg_control file
       */
      if (!ReadControlFile())
!         GuessControlValues();

      /*
       * Adjust fields if required by switches.  (Do this now so that printout,
--- 319,335 ----
      }
      else
      {
!         PidLocked = true;
      }

      /*
       * Attempt to read the existing pg_control file
       */
      if (!ReadControlFile())
!     {
!         /* The control file is corrupted. */
!         ctlcorrupted = true;
!     }

      /*
       * Adjust fields if required by switches.  (Do this now so that printout,
***************
*** 294,319 ****
          ControlFile.logSeg = minXlogSeg;
      }

      /*
!      * If we had to guess anything, and -f was not given, just print the
!      * guessed values and exit.  Also print if -n is given.
       */
!     if ((guessed && !force) || noupdate)
      {
!         PrintControlValues(guessed);
!         if (!noupdate)
          {
!             printf(_("\nIf these values seem acceptable, use -f to force reset.\n"));
!             exit(1);
!         }
!         else
              exit(0);
      }

      /*
       * Don't reset from a dirty pg_control without -f, either.
       */
!     if (ControlFile.state != DB_SHUTDOWNED && !force)
      {
          printf(_("The database server was not shut down cleanly.\n"
                   "Resetting the transaction log may cause data to be lost.\n"
--- 358,438 ----
          ControlFile.logSeg = minXlogSeg;
      }

+     /* Restore the broken control file from the WAL files. */
+     if (restore)
+     {
+
+         /* If the control file is fine, don't touch it. */
+         if ( !ctlcorrupted )
+         {
+             printf(_("\nThe control file seems fine, not need to restore it.\n"));
+             printf(_("If you want to restore it anyway, use -f option, but this also will reset the log file.\n"));
+             exit(0);
+         }
+
+
+         /* Try to restore control values from old xlog file, or complain it.*/
+         if (RestoreControlValues(WAL))
+         {
+             /* Success in restoring the checkpoint information from old xlog file.*/
+
+             /* Print it out.*/
+             PrintControlValues();
+
+             /* Handle the case where the postmaster has crashed.
+              * This may be dangerous if a postmaster is still running;
+              * a better approach may be needed.
+              */
+             if (PidLocked)
+             {
+                 ControlFile.state = DB_IN_PRODUCTION;
+             }
+             /* Write the new control file. */
+             RewriteControlFile();
+             printf(_("\nThe control file had been restored.\n"));
+         }
+         else
+         {
+             /* Fail in restoring the checkpoint information from old xlog file. */
+             printf(_("\nCan not restore the control file from XLog file..\n"));
+             printf(_("\nIf you want to restore it anyway, use -f option to guess the information, but this also will reset the log file.\n"));
+         }
+
+         exit(0);
+
+     }
+     if (PidLocked)
+     {
+         fprintf(stderr, _("%s: lock file \"%s\" exists\n"
+                           "Is a server running?  If not, delete the lock file and try again.\n"),
+                 progname, path);
+         exit(1);
+
+     }
      /*
!     * Print out the values in control file if -n is given. if the control file is
!     * corrupted, then inform user to restore it first.
       */
!     if (noupdate)
      {
!         if (!ctlcorrupted)
          {
!             /* The control file is fine, print the values out.*/
!             PrintControlValues();
              exit(0);
+         }
+         else{
+             /* The control file is corrupted.*/
+             printf(_("The control file had been corrupted.\n"));
+             printf(_("Please use -r option to restore it first.\n"));
+             exit(1);
+             }
      }

      /*
       * Don't reset from a dirty pg_control without -f, either.
       */
!     if (ControlFile.state != DB_SHUTDOWNED && !force && !ctlcorrupted)
      {
          printf(_("The database server was not shut down cleanly.\n"
                   "Resetting the transaction log may cause data to be lost.\n"
***************
*** 321,334 ****
          exit(1);
      }

!     /*
!      * Else, do the dirty deed.
       */
      RewriteControlFile();
      KillExistingXLOG();
      WriteEmptyXLOG();
!
!     printf(_("Transaction log reset\n"));
      return 0;
  }

--- 440,474 ----
          exit(1);
      }

! /*
!      * Try to reset the xlog file.
       */
+
+     /* If the control file is corrupted and the -f option is given, restore it first. */
+     if ( ctlcorrupted )
+     {
+         if (force)
+         {
+             if (!RestoreControlValues(WAL))
+             {
+                 printf(_("fails to recover the control file from old xlog files, so we had to guess it.\n"));
+                 RestoreControlValues(GUESS);
+             }
+             printf(_("Restored the control file from old xlog files.\n"));
+         }
+         else
+         {
+             printf(_("Control file corrupted.\nIf you want to proceed anyway, use -f to force reset.\n"));
+             exit(1);
+             }
+     }
+
+     /* Reset the xlog file. */
+     UpdateCtlFile4Reset();
      RewriteControlFile();
      KillExistingXLOG();
      WriteEmptyXLOG();
!     printf(_("Transaction log reset\n"));
      return 0;
  }

***************
*** 397,403 ****
                  progname);
          /* We will use the data anyway, but treat it as guessed. */
          memcpy(&ControlFile, buffer, sizeof(ControlFile));
-         guessed = true;
          return true;
      }

--- 537,542 ----
***************
*** 408,458 ****
  }


  /*
!  * Guess at pg_control values when we can't read the old ones.
   */
! static void
! GuessControlValues(void)
  {
-     uint64        sysidentifier;
      struct timeval tv;
      char       *localeptr;

      /*
       * Set up a completely default set of pg_control values.
       */
-     guessed = true;
      memset(&ControlFile, 0, sizeof(ControlFile));

      ControlFile.pg_control_version = PG_CONTROL_VERSION;
      ControlFile.catalog_version_no = CATALOG_VERSION_NO;

!     /*
!      * Create a new unique installation identifier, since we can no longer use
!      * any old XLOG records.  See notes in xlog.c about the algorithm.
       */
!     gettimeofday(&tv, NULL);
!     sysidentifier = ((uint64) tv.tv_sec) << 32;
!     sysidentifier |= (uint32) (tv.tv_sec | tv.tv_usec);
!
!     ControlFile.system_identifier = sysidentifier;
!
!     ControlFile.checkPointCopy.redo.xlogid = 0;
!     ControlFile.checkPointCopy.redo.xrecoff = SizeOfXLogLongPHD;
!     ControlFile.checkPointCopy.undo = ControlFile.checkPointCopy.redo;
!     ControlFile.checkPointCopy.ThisTimeLineID = 1;
!     ControlFile.checkPointCopy.nextXid = (TransactionId) 514;    /* XXX */
!     ControlFile.checkPointCopy.nextOid = FirstBootstrapObjectId;
!     ControlFile.checkPointCopy.nextMulti = FirstMultiXactId;
!     ControlFile.checkPointCopy.nextMultiOffset = 0;
!     ControlFile.checkPointCopy.time = time(NULL);

-     ControlFile.state = DB_SHUTDOWNED;
      ControlFile.time = time(NULL);
!     ControlFile.logId = 0;
!     ControlFile.logSeg = 1;
!     ControlFile.checkPoint = ControlFile.checkPointCopy.redo;
!
      ControlFile.maxAlign = MAXIMUM_ALIGNOF;
      ControlFile.floatFormat = FLOATFORMAT_VALUE;
      ControlFile.blcksz = BLCKSZ;
--- 547,627 ----
  }


+
+
  /*
!  *  Restore the pg_control values by scanning old xlog files or by guessing it.
!  *
!  * Input parameter:
!  *    WAL:  Restore the pg_control values by scanning old xlog files.
!  *    GUESS: Restore the pg_control values by guessing.
!  * Return:
!  *    TRUE: success in restoring.
!  *    FALSE: fail to restore the values.
!  *
   */
! static bool
! RestoreControlValues(int mode)
  {
      struct timeval tv;
      char       *localeptr;
+     bool    successed=true;

      /*
       * Set up a completely default set of pg_control values.
       */
      memset(&ControlFile, 0, sizeof(ControlFile));

      ControlFile.pg_control_version = PG_CONTROL_VERSION;
      ControlFile.catalog_version_no = CATALOG_VERSION_NO;

!     /*
!      * Update the checkpoint values in the control file, either by
!      * searching the xlog segment files or by guessing them.
       */
!      if (mode == WAL)
!      {
!         int result = SearchLastCheckpoint();
!         if ( result > 0 ) /* The last checkpoint had been found. */
!         {
!             ControlFile.checkPointCopy = lastcheckpoint;
!             ControlFile.checkPoint = lastchkp;
!             ControlFile.prevCheckPoint = prevchkp;
!             ControlFile.logId = LastXLogFile->logid;
!             ControlFile.logSeg = LastXLogFile->seg + 1;
!             ControlFile.checkPointCopy.ThisTimeLineID = LastXLogFile->tli;
!             ControlFile.state = state;
!         } else     successed = false;
!
!         /* Clean up the list. */
!         CleanUpList(xlogfilelist);
!
!      }
!
!     if (mode == GUESS)
!     {
!         ControlFile.checkPointCopy.redo.xlogid = 0;
!         ControlFile.checkPointCopy.redo.xrecoff = SizeOfXLogLongPHD;
!         ControlFile.checkPointCopy.undo = ControlFile.checkPointCopy.redo;
!         ControlFile.checkPointCopy.nextXid = (TransactionId) 514;    /* XXX */
!         ControlFile.checkPointCopy.nextOid = FirstBootstrapObjectId;
!         ControlFile.checkPointCopy.nextMulti = FirstMultiXactId;
!         ControlFile.checkPointCopy.nextMultiOffset = 0;
!         ControlFile.checkPointCopy.time = time(NULL);
!         ControlFile.checkPoint = ControlFile.checkPointCopy.redo;
!         /*
!          * Create a new unique installation identifier, since we can no longer
!          * use any old XLOG records.  See notes in xlog.c about the algorithm.
!          */
!         gettimeofday(&tv, NULL);
!         sysidentifier = ((uint64) tv.tv_sec) << 32;
!         sysidentifier |= (uint32) (tv.tv_sec | tv.tv_usec);
!         ControlFile.state = DB_SHUTDOWNED;
!
!     }

      ControlFile.time = time(NULL);
!     ControlFile.system_identifier = sysidentifier;
      ControlFile.maxAlign = MAXIMUM_ALIGNOF;
      ControlFile.floatFormat = FLOATFORMAT_VALUE;
      ControlFile.blcksz = BLCKSZ;
***************
*** 483,510 ****
      }
      StrNCpy(ControlFile.lc_ctype, localeptr, LOCALE_NAME_BUFLEN);

!     /*
!      * XXX eventually, should try to grovel through old XLOG to develop more
!      * accurate values for TimeLineID, nextXID, etc.
!      */
  }


  /*
!  * Print the guessed pg_control values when we had to guess.
   *
   * NB: this display should be just those fields that will not be
   * reset by RewriteControlFile().
   */
  static void
! PrintControlValues(bool guessed)
  {
      char        sysident_str[32];

!     if (guessed)
!         printf(_("Guessed pg_control values:\n\n"));
!     else
!         printf(_("pg_control values:\n\n"));

      /*
       * Format system_identifier separately to keep platform-dependent format
--- 652,673 ----
      }
      StrNCpy(ControlFile.lc_ctype, localeptr, LOCALE_NAME_BUFLEN);

!     return successed;
  }


  /*
!  * Print out the pg_control values.
   *
   * NB: this display should be just those fields that will not be
   * reset by RewriteControlFile().
   */
  static void
! PrintControlValues(void)
  {
      char        sysident_str[32];

!     printf(_("pg_control values:\n\n"));

      /*
       * Format system_identifier separately to keep platform-dependent format
***************
*** 538,553 ****
      printf(_("LC_CTYPE:                             %s\n"), ControlFile.lc_ctype);
  }

-
  /*
!  * Write out the new pg_control file.
!  */
! static void
! RewriteControlFile(void)
  {
-     int            fd;
-     char        buffer[PG_CONTROL_SIZE]; /* need not be aligned */
-
      /*
       * Adjust fields as needed to force an empty XLOG starting at the next
       * available segment.
--- 701,712 ----
      printf(_("LC_CTYPE:                             %s\n"), ControlFile.lc_ctype);
  }

  /*
! * Update the control file values before resetting the xlog.
! */
! static void
! UpdateCtlFile4Reset(void)
  {
      /*
       * Adjust fields as needed to force an empty XLOG starting at the next
       * available segment.
***************
*** 578,583 ****
--- 737,753 ----
      ControlFile.checkPoint = ControlFile.checkPointCopy.redo;
      ControlFile.prevCheckPoint.xlogid = 0;
      ControlFile.prevCheckPoint.xrecoff = 0;
+ }
+
+ /*
+  * Write out the new pg_control file.
+  */
+ static void
+ RewriteControlFile(void)
+ {
+     int            fd;
+     char        buffer[PG_CONTROL_SIZE]; /* need not be aligned */
+

      /* Contents are protected with a CRC */
      INIT_CRC32(ControlFile.crc);
***************
*** 672,678 ****
          errno = 0;
      }
  #ifdef WIN32
-
      /*
       * This fix is in mingw cvs (runtime/mingwex/dirent.c rev 1.4), but not in
       * released version
--- 842,847 ----
***************
*** 801,814 ****
      printf(_("%s resets the PostgreSQL transaction log.\n\n"), progname);
      printf(_("Usage:\n  %s [OPTION]... DATADIR\n\n"), progname);
      printf(_("Options:\n"));
!     printf(_("  -f              force update to be done\n"));
      printf(_("  -l TLI,FILE,SEG force minimum WAL starting location for new transaction log\n"));
!     printf(_("  -m XID          set next multitransaction ID\n"));
!     printf(_("  -n              no update, just show extracted control values (for testing)\n"));
      printf(_("  -o OID          set next OID\n"));
!     printf(_("  -O OFFSET       set next multitransaction offset\n"));
      printf(_("  -x XID          set next transaction ID\n"));
      printf(_("  --help          show this help, then exit\n"));
      printf(_("  --version       output version information, then exit\n"));
      printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
  }
--- 970,1633 ----
      printf(_("%s resets the PostgreSQL transaction log.\n\n"), progname);
      printf(_("Usage:\n  %s [OPTION]... DATADIR\n\n"), progname);
      printf(_("Options:\n"));
!     printf(_("  -f              force reset xlog to be done, if the control file is corrupted, then try to restore it.\n"));
!     printf(_("  -r              restore the pg_control file from old XLog files, resets is not done..\n"));
      printf(_("  -l TLI,FILE,SEG force minimum WAL starting location for new transaction log\n"));
!     printf(_("  -n              show extracted control values of existing pg_control file.\n"));
!     printf(_("  -m multiXID     set next multi transaction ID\n"));
      printf(_("  -o OID          set next OID\n"));
!     printf(_("  -O multiOffset  set next multi transaction offset\n"));
      printf(_("  -x XID          set next transaction ID\n"));
      printf(_("  --help          show this help, then exit\n"));
      printf(_("  --version       output version information, then exit\n"));
      printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
  }
+
+
+
+ /*
+  * The following routines are mainly used for getting pg_control values
+  * from the xlog file.
+  */
+
+  /* Some local variables. */
+ static int              logFd=0; /* kernel FD for current input file */
+ static int              logRecOff;      /* offset of next record in page */
+ static char             pageBuffer[BLCKSZ];     /* current page */
+ static XLogRecPtr       curRecPtr;      /* logical address of current record */
+ static XLogRecPtr       prevRecPtr;     /* logical address of previous record */
+ static char             *readRecordBuf = NULL; /* ReadRecord result area */
+ static uint32           readRecordBufSize = 0;
+ static int32            logPageOff;     /* offset of current page in file */
+ static uint32           logId;          /* current log file id */
+ static uint32           logSeg;         /* current log file segment */
+ static uint32           logTli;         /* current log file timeline */
+
+ /*
+  * Get existing XLOG files
+  */
+ static void
+ GetXLogFiles(void)
+ {
+     DIR           *xldir;
+     struct dirent *xlde;
+
+     /* Open the xlog directory. */
+     xldir = opendir(XLOGDIR);
+     if (xldir == NULL)
+     {
+         fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
+                 progname, XLOGDIR, strerror(errno));
+         exit(1);
+     }
+
+     /* Search the directory, insert the segment files into the xlogfilelist.*/
+     errno = 0;
+     while ((xlde = readdir(xldir)) != NULL)
+     {
+         if (ValidXLogFileName(xlde->d_name)) {
+             /* XLog file is found, insert it into the xlogfilelist.*/
+             InsertXLogFile(xlde->d_name);
+         };
+         errno = 0;
+     }
+ #ifdef WIN32
+     if (GetLastError() == ERROR_NO_MORE_FILES)
+         errno = 0;
+ #endif
+
+     if (errno)
+     {
+         fprintf(stderr, _("%s: could not read from directory \"%s\": %s\n"),
+                 progname, XLOGDIR, strerror(errno));
+         exit(1);
+     }
+     closedir(xldir);
+ }
+
+ /*
+  * Insert a file that was found in the xlog directory into xlogfilelist.
+  * The list is maintained in increasing order.
+  *
+  * The input parameter is the name of the xlog file; the name is assumed
+  * to be valid.
+  */
+ static void
+ InsertXLogFile( char * fname )
+ {
+     XLogFileName * NewSegFile, *Curr, *Prev;
+     bool append2end = false;
+
+     /* Allocate a new node for the new file. */
+     NewSegFile = (XLogFileName *) malloc(sizeof(XLogFileName));
+     if (NewSegFile == NULL)
+     {
+         fprintf(stderr, _("%s: out of memory\n"), progname);
+         exit(1);
+     }
+     strcpy(NewSegFile->fname, fname); /* set up the name */
+     /* Extract the timeline, log id, and segment number from the name. */
+     sscanf(fname, "%8x%8x%8x", &(NewSegFile->tli), &(NewSegFile->logid), &(NewSegFile->seg));
+     NewSegFile->next = NULL;
+
+     /* Ensure the xlog file is active and valid.*/
+     if (! ValidXLogFileHeader(NewSegFile))
+     {
+         free(NewSegFile);
+         return;
+     }
+
+     /* The list is empty. */
+     if (xlogfilelist == NULL)
+     {
+         xlogfilelist = NewSegFile;
+         return;
+     }
+
+     /* Search the list for the insertion point. */
+     Prev = Curr = xlogfilelist;
+     while (CmpXLogFileOT(NewSegFile, Curr))
+     {
+         /* Reached the tail: the node is appended to the end of the list. */
+         if (Curr->next == NULL)
+         {
+             append2end = true;
+             break;
+         }
+         Prev = Curr;
+         Curr = Curr->next;
+     }
+
+     /* Insert the new node into the list. */
+     if (append2end)
+     {
+         /* Append the new node to the end of the list. */
+         Curr->next = NewSegFile;
+     }
+     else
+     {
+         NewSegFile->next = Curr;
+         if (Curr == xlogfilelist)
+         {
+             /* The new node becomes the list head. */
+             xlogfilelist = NewSegFile;
+         }
+         else
+         {
+             /* Link the new node after its predecessor. */
+             Prev->next = NewSegFile;
+         }
+     }
+
+ }
+
+ /*
+  * Compare two xlog files by timeline, log id, and segment number to see
+  * which one is later.
+  *
+  * Return true if file 1 is the same as or later than file 2, that is,
+  * if file 1 belongs after file 2 in the list.
+  */
+ static bool
+ CmpXLogFileOT(XLogFileName * f1, XLogFileName *f2)
+ {
+     if (f1->tli != f2->tli)
+         return f1->tli > f2->tli;
+     if (f1->logid != f2->logid)
+         return f1->logid > f2->logid;
+     return f1->seg >= f2->seg;
+ }
+
+ /* Check whether two segment files are contiguous. */
+ static bool
+ IsNextSeg(XLogFileName *prev, XLogFileName *cur)
+ {
+     uint32 logid, logseg;
+
+     if (prev->tli != cur->tli) return false;
+
+     logid = prev->logid;
+     logseg = prev->seg;
+     NextLogSeg(logid, logseg);
+
+     if ((logid == cur->logid) && (logseg == cur->seg)) return true;
+
+     return false;
+
+ }
+
+
+ /*
+  * Select the oldest xlog file in the latest timeline, that is, the first
+  * file of the last contiguous run of segments in the list.
+  */
+ static void
+ SelectStartXLog( void )
+ {
+     XLogFileName *tmp;
+     CurXLogFile = xlogfilelist;
+
+     if (xlogfilelist == NULL)
+     {
+         return;
+     }
+
+     tmp=LastXLogFile=CurXLogFile=xlogfilelist;
+
+     while(tmp->next != NULL)
+     {
+
+         /*
+          * Ensure that the segment files from the first to the last
+          * are contiguous.
+          */
+         if (!IsNextSeg(tmp, tmp->next))
+         {
+             CurXLogFile = tmp->next;
+         }
+         tmp=tmp->next;
+     }
+
+     LastXLogFile = tmp;
+
+ }
+
+ /*
+  * Check whether the given name is a valid xlog file name.
+  *
+  * Return true if the name consists of exactly 24 upper-case hex digits.
+  *
+  * The input parameter is the name of the xlog file.
+  */
+ static bool
+ ValidXLogFileName(char * fname)
+ {
+     uint logTLI, logId, logSeg;
+     if (strlen(fname) != 24 ||
+         strspn(fname, "0123456789ABCDEF") != 24 ||
+         sscanf(fname, "%8x%8x%8x", &logTLI, &logId, &logSeg) != 3)
+         return false;
+     return true;
+
+ }
+
+ /* Ensure the xlog file is active and valid. */
+ static bool
+ ValidXLogFileHeader(XLogFileName *segfile)
+ {
+     int         fd;
+     char        buffer[BLCKSZ];
+     char        path[MAXPGPATH];
+     size_t      nread;
+     bool        valid = false;
+
+     snprintf(path, MAXPGPATH, "%s/%s", XLOGDIR, segfile->fname);
+     fd = open(path, O_RDONLY | PG_BINARY, 0);
+     if (fd < 0)
+     {
+         return false;
+     }
+     nread = read(fd, buffer, BLCKSZ);
+     if (nread == BLCKSZ)
+     {
+         XLogPageHeader hdr = (XLogPageHeader) buffer;
+
+         if (ValidXLOGPageHeader(hdr, segfile->tli, segfile->logid, segfile->seg))
+         {
+             valid = true;
+         }
+     }
+     /* Close the file on every path so the descriptor is not leaked. */
+     close(fd);
+     return valid;
+ }
+
+ /* Check that a page header is sane and matches the expected position. */
+ static bool
+ ValidXLOGPageHeader(XLogPageHeader hdr, uint tli, uint id, uint seg)
+ {
+     XLogRecPtr    recaddr;
+
+     if (hdr->xlp_magic != XLOG_PAGE_MAGIC)
+     {
+         return false;
+     }
+     if ((hdr->xlp_info & ~XLP_ALL_FLAGS) != 0)
+     {
+         return false;
+     }
+     if (hdr->xlp_info & XLP_LONG_HEADER)
+     {
+         XLogLongPageHeader longhdr = (XLogLongPageHeader) hdr;
+
+         if (longhdr->xlp_seg_size != XLogSegSize)
+         {
+             return false;
+         }
+         /* Get the system identifier from the header being validated. */
+         sysidentifier = longhdr->xlp_sysid;
+     }
+
+     recaddr.xlogid = id;
+     recaddr.xrecoff = seg * XLogSegSize + logPageOff;
+     if (!XLByteEQ(hdr->xlp_pageaddr, recaddr))
+     {
+         return false;
+     }
+
+     if (hdr->xlp_tli != tli)
+     {
+         return false;
+     }
+     return true;
+ }
+
+
+ /* Read another page, if possible */
+ static bool
+ ReadXLogPage(void)
+ {
+     size_t nread;
+
+     /* Need to advance to the new segment file.*/
+     if ( logPageOff >= XLogSegSize )
+     {
+         close(logFd);
+         logFd = 0;
+     }
+
+     /* Need to open the segment file. */
+     if ((logFd <= 0) && (CurXLogFile != NULL))
+     {
+         if (OpenXLogFile(CurXLogFile) < 0)
+         {
+             return false;
+         }
+         CurXLogFile = CurXLogFile->next;
+     }
+
+     /* Read a page from the open segment file. */
+     nread = read(logFd, pageBuffer, BLCKSZ);
+
+     if (nread == BLCKSZ)
+     {
+         logPageOff += BLCKSZ;
+         if (ValidXLOGPageHeader( (XLogPageHeader)pageBuffer, logTli, logId, logSeg))
+             return true;
+     }
+
+     return false;
+ }
+
+ /*
+  * CRC-check an XLOG record.  We do not believe the contents of an XLOG
+  * record (other than to the minimal extent of computing the amount of
+  * data to read in) until we've checked the CRCs.
+  *
+  * We assume all of the record has been read into memory at *record.
+  */
+ static bool
+ RecordIsValid(XLogRecord *record, XLogRecPtr recptr)
+ {
+     pg_crc32    crc;
+     int            i;
+     uint32        len = record->xl_len;
+     BkpBlock    bkpb;
+     char       *blk;
+
+     /* First the rmgr data */
+     INIT_CRC32(crc);
+     COMP_CRC32(crc, XLogRecGetData(record), len);
+
+     /* Add in the backup blocks, if any */
+     blk = (char *) XLogRecGetData(record) + len;
+     for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++)
+     {
+         uint32    blen;
+
+         if (!(record->xl_info & XLR_SET_BKP_BLOCK(i)))
+             continue;
+
+         memcpy(&bkpb, blk, sizeof(BkpBlock));
+         if (bkpb.hole_offset + bkpb.hole_length > BLCKSZ)
+         {
+             return false;
+         }
+         blen = sizeof(BkpBlock) + BLCKSZ - bkpb.hole_length;
+         COMP_CRC32(crc, blk, blen);
+         blk += blen;
+     }
+
+     /* Check that xl_tot_len agrees with our calculation */
+     if (blk != (char *) record + record->xl_tot_len)
+     {
+         return false;
+     }
+
+     /* Finally include the record header */
+     COMP_CRC32(crc, (char *) record + sizeof(pg_crc32),
+                SizeOfXLogRecord - sizeof(pg_crc32));
+     FIN_CRC32(crc);
+
+     if (!EQ_CRC32(record->xl_crc, crc))
+     {
+         return false;
+     }
+
+     return true;
+ }
+
+
+
+ /*
+  * Attempt to read an XLOG record into readRecordBuf.
+  */
+ static bool
+ FetchRecord(void)
+ {
+     char       *buffer;
+     XLogRecord *record;
+     XLogContRecord *contrecord;
+     uint32        len, total_len;
+
+
+     while (logRecOff <= 0 || logRecOff > BLCKSZ - SizeOfXLogRecord)
+     {
+         /* Need to advance to new page */
+         if (! ReadXLogPage())
+         {
+             return false;
+         }
+
+         logRecOff = XLogPageHeaderSize((XLogPageHeader) pageBuffer);
+         if ((((XLogPageHeader) pageBuffer)->xlp_info & ~XLP_LONG_HEADER) != 0)
+         {
+             /* Check for a continuation record */
+             if (((XLogPageHeader) pageBuffer)->xlp_info & XLP_FIRST_IS_CONTRECORD)
+             {
+                 contrecord = (XLogContRecord *) (pageBuffer + logRecOff);
+                 logRecOff += MAXALIGN(contrecord->xl_rem_len + SizeOfXLogContRecord);
+             }
+         }
+     }
+
+     curRecPtr.xlogid = logId;
+     curRecPtr.xrecoff = logSeg * XLogSegSize + logPageOff + logRecOff;
+     record = (XLogRecord *) (pageBuffer + logRecOff);
+
+     if (record->xl_len == 0)
+     {
+         return false;
+     }
+
+     total_len = record->xl_tot_len;
+
+     /*
+      * Allocate or enlarge readRecordBuf as needed.  To avoid useless
+      * small increases, round its size to a multiple of BLCKSZ, and make
+      * sure it's at least 4*BLCKSZ to start with.  (That is enough for all
+      * "normal" records, but very large commit or abort records might need
+      * more space.)
+      */
+     if (total_len > readRecordBufSize)
+     {
+         uint32        newSize = total_len;
+
+         newSize += BLCKSZ - (newSize % BLCKSZ);
+         newSize = Max(newSize, 4 * BLCKSZ);
+         if (readRecordBuf)
+             free(readRecordBuf);
+         readRecordBuf = (char *) malloc(newSize);
+         if (!readRecordBuf)
+         {
+             readRecordBufSize = 0;
+             return false;
+         }
+         readRecordBufSize = newSize;
+     }
+
+     buffer = readRecordBuf;
+     len = BLCKSZ - curRecPtr.xrecoff % BLCKSZ; /* available in block */
+     if (total_len > len)
+     {
+         /* Need to reassemble record */
+         uint32            gotlen = len;
+
+         memcpy(buffer, record, len);
+         record = (XLogRecord *) buffer;
+         buffer += len;
+         for (;;)
+         {
+             uint32    pageHeaderSize;
+
+             if (!ReadXLogPage())
+             {
+                 return false;
+             }
+             if (!(((XLogPageHeader) pageBuffer)->xlp_info & XLP_FIRST_IS_CONTRECORD))
+             {
+                 return false;
+             }
+             pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) pageBuffer);
+             contrecord = (XLogContRecord *) (pageBuffer + pageHeaderSize);
+             if (contrecord->xl_rem_len == 0 ||
+                 total_len != (contrecord->xl_rem_len + gotlen))
+             {
+                 return false;
+             }
+             len = BLCKSZ - pageHeaderSize - SizeOfXLogContRecord;
+             if (contrecord->xl_rem_len > len)
+             {
+                 memcpy(buffer, (char *)contrecord + SizeOfXLogContRecord, len);
+                 gotlen += len;
+                 buffer += len;
+                 continue;
+             }
+             memcpy(buffer, (char *) contrecord + SizeOfXLogContRecord,
+                    contrecord->xl_rem_len);
+             logRecOff = MAXALIGN(pageHeaderSize + SizeOfXLogContRecord + contrecord->xl_rem_len);
+             break;
+         }
+         if (!RecordIsValid(record, curRecPtr))
+         {
+             return false;
+         }
+         return true;
+     }
+     /* Record is contained in this page */
+     memcpy(buffer, record, total_len);
+     record = (XLogRecord *) buffer;
+     logRecOff += MAXALIGN(total_len);
+     if (!RecordIsValid(record, curRecPtr))
+     {
+
+         return false;
+     }
+     return true;
+ }
+
+ /*
+  * If the record is a checkpoint, update the latest checkpoint record.
+  */
+ static void
+ UpdateCheckPoint(XLogRecord *record)
+ {
+     uint8    info = record->xl_info & ~XLR_INFO_MASK;
+
+     if ((info == XLOG_CHECKPOINT_SHUTDOWN) ||
+         (info == XLOG_CHECKPOINT_ONLINE))
+     {
+          CheckPoint *chkpoint = (CheckPoint*) XLogRecGetData(record);
+          prevchkp = lastchkp;
+          lastchkp = curRecPtr;
+          lastcheckpoint = *chkpoint;
+
+          /* Update the database state. */
+          switch(info)
+          {
+             case XLOG_CHECKPOINT_SHUTDOWN:
+                 state = DB_SHUTDOWNED;
+                 break;
+             case XLOG_CHECKPOINT_ONLINE:
+                 state = DB_IN_PRODUCTION;
+                 break;
+          }
+          found_checkpoint++;
+     }
+ }
+
+ static int
+ OpenXLogFile(XLogFileName *sf)
+ {
+
+     char        path[MAXPGPATH];
+
+     if (logFd > 0)
+         close(logFd);
+
+     /* Open an xlog segment file. */
+     snprintf(path, MAXPGPATH, "%s/%s", XLOGDIR, sf->fname);
+     logFd = open(path, O_RDONLY | PG_BINARY, 0);
+
+     if (logFd < 0)
+     {
+         fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
+                 progname, path, strerror(errno));
+         return -1;
+     }
+
+     /* Set up the parameters for searching. */
+     logPageOff = -BLCKSZ;        /* so 1st increment in ReadXLogPage gives 0 */
+     logRecOff = 0;
+     logId = sf->logid;
+     logSeg = sf->seg;
+     logTli = sf->tli;
+     return logFd;
+ }
+
+ /*
+  * Search for the latest checkpoint in the latest XLog segment files.
+  *
+  * The return value is the total number of checkpoints found
+  * in the XLog segment files.
+  */
+ static int
+ SearchLastCheckpoint(void)
+ {
+
+     /*
+      * Retrieve all of the active xlog files from the xlog directory
+      * into a list in increasing order of timeline, log id, and
+      * segment id.
+      */
+     GetXLogFiles();
+
+     /* Select the oldest segment file in the latest timeline. */
+     SelectStartXLog();
+
+     /* No segment file was found.*/
+     if ( CurXLogFile == NULL )
+     {
+         return 0;
+     }
+
+     /* Initialize the reader state. */
+     logFd=logId=logSeg=logTli=0;
+
+     /*
+      * Search the XLog segment files from beginning to end;
+      * whenever a checkpoint record is found, update the
+      * latest checkpoint.
+      */
+     while (FetchRecord())
+     {
+         /* Check whether the record is a checkpoint record. */
+         if (((XLogRecord *) readRecordBuf)->xl_rmid == RM_XLOG_ID)
+             UpdateCheckPoint((XLogRecord *) readRecordBuf);
+         prevRecPtr = curRecPtr;
+     }
+
+     /*
+      * We cannot tell for certain whether we reached the end of the log.
+      * As an approximation, check that we got as far as the last segment
+      * file; if not, the search was cut short by some problem.
+      * (We need a better way to detect an abnormal break during the search.)
+      */
+     if ((logId != LastXLogFile->logid) || (logSeg != LastXLogFile->seg))
+     {
+         return 0;
+     }
+
+     /*
+      * Return the number of checkpoints found so far, so that the
+      * caller knows how many checkpoints are available.
+      */
+     return found_checkpoint;
+ }
+
+ /* Clean up the allocated list.*/
+ static void
+ CleanUpList(XLogFileName *list)
+ {
+
+     XLogFileName *tmp;
+     tmp = list;
+     while(list != NULL)
+     {
+         tmp=list->next;
+         free(list);
+         list=tmp;
+     }
+
+ }
+
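
For anyone reviewing the list ordering in InsertXLogFile() and CmpXLogFileOT(), here is a small standalone sketch of the idea, not the patch code itself. It parses a 24-character segment name into (timeline, log id, segment) and compares two names lexicographically on those three fields. The names SegName, seg_parse and seg_cmp are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patch's XLogFileName node. */
typedef struct SegName
{
    unsigned int tli;    /* timeline ID */
    unsigned int logid;  /* log file ID */
    unsigned int seg;    /* segment number within the log file */
} SegName;

/* Parse a 24-character upper-case hex segment name; false if malformed. */
static bool
seg_parse(const char *fname, SegName *out)
{
    assert(fname != NULL && out != NULL);
    if (strlen(fname) != 24 || strspn(fname, "0123456789ABCDEF") != 24)
        return false;
    return sscanf(fname, "%8x%8x%8x", &out->tli, &out->logid, &out->seg) == 3;
}

/* Three-way lexicographic comparison on (tli, logid, seg). */
static int
seg_cmp(const SegName *a, const SegName *b)
{
    if (a->tli != b->tli)
        return a->tli < b->tli ? -1 : 1;
    if (a->logid != b->logid)
        return a->logid < b->logid ? -1 : 1;
    if (a->seg != b->seg)
        return a->seg < b->seg ? -1 : 1;
    return 0;
}
```

A three-way result is used here only because it is easier to check by hand; a boolean "belongs after" test, as in the patch, can be derived from it as seg_cmp(a, b) >= 0.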

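The start-segment selection in SelectStartXLog() and IsNextSeg() can be sketched the same way. The version below works on a sorted array instead of the linked list and returns the index of the first segment of the last contiguous run. It is only an illustration under stated assumptions: Seg, seg_is_next and last_run_start are hypothetical names, and SEGS_PER_LOGID is assumed to be 255, which is what 0xFFFFFFFF divided by the default 16MB segment size gives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value: 0xFFFFFFFF / 16MB segments (default build). */
#define SEGS_PER_LOGID 255u

/* Hypothetical stand-in for the patch's XLogFileName node. */
typedef struct Seg
{
    unsigned int tli;    /* timeline ID */
    unsigned int logid;  /* log file ID */
    unsigned int seg;    /* segment number within the log file */
} Seg;

/* True if cur is the segment immediately after prev on the same timeline. */
static bool
seg_is_next(const Seg *prev, const Seg *cur)
{
    unsigned int logid = prev->logid;
    unsigned int seg = prev->seg;

    if (prev->tli != cur->tli)
        return false;

    /* Advance to the next segment, wrapping into the next log id. */
    if (seg >= SEGS_PER_LOGID - 1)
    {
        logid++;
        seg = 0;
    }
    else
        seg++;

    return logid == cur->logid && seg == cur->seg;
}

/* Index of the first segment of the last contiguous run in a sorted array. */
static size_t
last_run_start(const Seg *segs, size_t n)
{
    size_t start = 0;
    size_t i;

    assert(segs != NULL || n == 0);
    for (i = 1; i < n; i++)
    {
        /* Every gap or timeline switch restarts the run at position i. */
        if (!seg_is_next(&segs[i - 1], &segs[i]))
            start = i;
    }
    return start;
}
```

With segments 3, 4, 6, 7, 8 of one log file, the gap after 4 moves the start to segment 6, so only the run that reaches the newest segment is scanned for checkpoints.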