Re: Add checkpoint and redo LSN to LogCheckpointEnd log message - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date
Msg-id 20220214.144022.2233054833762514893.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Add checkpoint and redo LSN to LogCheckpointEnd log message  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Add checkpoint and redo LSN to LogCheckpointEnd log message  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
At Fri, 11 Feb 2022 01:00:03 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2022/02/09 11:52, Kyotaro Horiguchi wrote:
> > 0001: The fix of CreateRestartPoint
> 
> This patch might be ok for the master branch. But since concurrent
> checkpoint and restartpoint can happen in v14 or before, we would need
> another patch based on that assumption, for the backport. How about
> fixing the bug all the branches at first, then apply this patch in the
> master to improve the code?

For backbranches, the attached for pg14 does part of the full patch.
Of the following, I think we should do (a) and (b) to make future
backpatchings easier.

a) Use RedoRecPtr and PriorRedoPtr after they are assigned.

b) Move assignment to PriorRedoPtr into the ControlFileLock section.

c) Skip udpate of minRecoveryPoint only when the checkpoint gets old.

d) Skip call to UpdateCheckPointDistanceEstimate() when RedoRecPtr <=
  PriorRedoPtr.

# Mmm. The v9-0001  contains a silly mistake here..

Still I'm not sure whether that case really happens and how checkpoint
behaves *after* that happenes, but at least it protects database from
the possible unrecoverable state due to the known issue here..

It doesn't apply even on pg13 (due to LSN_FORMAT_ARGS).  I will make
the per-version patches if you are fine with this.

regards.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From df516b487617c570332ea076b91773b810629a11 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 14 Feb 2022 13:04:33 +0900
Subject: [PATCH v2] Correctly update contfol file at the end of archive
 recovery

CreateRestartPoint runs WAL file cleanup basing on the checkpoint just
have finished in the function.  If the database has exited
DB_IN_ARCHIVE_RECOVERY state when the function is going to update
control file, the function refrains from updating the file at all then
proceeds to WAL cleanup having the latest REDO LSN, which is now
inconsistent with the control file.  As the result, the succeeding
cleanup procedure overly removes WAL files against the control file
and leaves unrecoverable database until the next checkpoint finishes.

Since all buffers have flushed out, we may safely regard that restart
point established regardless of the recovery state.  Thus we may and
should update checkpoint LSNs of checkpoint file in that case.  Still
we update minRecoveryPoint only during archive recovery but explicitly
clear if we have exited recovery.

Addition to that fix, this commit makes some cosmetic changes that
consist with the changes we are going to make on the master branch.
---
 src/backend/access/transam/xlog.c | 81 ++++++++++++++++++++-----------
 1 file changed, 53 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6208e123e5..28c3c4b7cf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9653,7 +9653,7 @@ CreateRestartPoint(int flags)
 
     /* Also update the info_lck-protected copy */
     SpinLockAcquire(&XLogCtl->info_lck);
-    XLogCtl->RedoRecPtr = lastCheckPoint.redo;
+    XLogCtl->RedoRecPtr = RedoRecPtr;
     SpinLockRelease(&XLogCtl->info_lck);
 
     /*
@@ -9672,7 +9672,10 @@ CreateRestartPoint(int flags)
     /* Update the process title */
     update_checkpoint_display(flags, true, false);
 
-    CheckPointGuts(lastCheckPoint.redo, flags);
+    CheckPointGuts(RedoRecPtr, flags);
+
+    /* Update pg_control */
+    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 
     /*
      * Remember the prior checkpoint's redo ptr for
@@ -9681,52 +9684,74 @@ CreateRestartPoint(int flags)
     PriorRedoPtr = ControlFile->checkPointCopy.redo;
 
     /*
-     * Update pg_control, using current time.  Check that it still shows
-     * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
-     * this is a quick hack to make sure nothing really bad happens if somehow
-     * we get here after the end-of-recovery checkpoint.
+     * Update pg_control, using current time if no later checkpoints have been
+     * performed.
      */
-    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
-    if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
-        ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
+    if (PriorRedoPtr < RedoRecPtr)
     {
         ControlFile->checkPoint = lastCheckPointRecPtr;
         ControlFile->checkPointCopy = lastCheckPoint;
         ControlFile->time = (pg_time_t) time(NULL);
 
         /*
-         * Ensure minRecoveryPoint is past the checkpoint record.  Normally,
-         * this will have happened already while writing out dirty buffers,
-         * but not necessarily - e.g. because no buffers were dirtied.  We do
-         * this because a non-exclusive base backup uses minRecoveryPoint to
-         * determine which WAL files must be included in the backup, and the
-         * file (or files) containing the checkpoint record must be included,
-         * at a minimum. Note that for an ordinary restart of recovery there's
-         * no value in having the minimum recovery point any earlier than this
+         * Ensure minRecoveryPoint is past the checkpoint record while archive
+         * recovery is still ongoing.  Normally, this will have happened
+         * already while writing out dirty buffers, but not necessarily -
+         * e.g. because no buffers were dirtied.  We do this because a
+         * non-exclusive base backup uses minRecoveryPoint to determine which
+         * WAL files must be included in the backup, and the file (or files)
+         * containing the checkpoint record must be included, at a
+         * minimum. Note that for an ordinary restart of recovery there's no
+         * value in having the minimum recovery point any earlier than this
          * anyway, because redo will begin just after the checkpoint record.
+         * This is a quick hack to make sure nothing really bad happens if
+         * somehow we get here after the end-of-recovery checkpoint.
          */
-        if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
+        if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
         {
-            ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
-            ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
+            if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
+            {
+                ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
+                ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
 
-            /* update local copy */
-            minRecoveryPoint = ControlFile->minRecoveryPoint;
-            minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+                /* update local copy */
+                minRecoveryPoint = ControlFile->minRecoveryPoint;
+                minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+            }
+            if (flags & CHECKPOINT_IS_SHUTDOWN)
+                ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
         }
-        if (flags & CHECKPOINT_IS_SHUTDOWN)
-            ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
+        else
+        {
+            /*
+             * Aarchive recovery has ended. Crash recovery ever after should
+             * always recover to the end of WAL
+             */
+            ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
+            ControlFile->minRecoveryPointTLI = 0;
+        }
+
         UpdateControlFile();
     }
     LWLockRelease(ControlFileLock);
 
     /*
      * Update the average distance between checkpoints/restartpoints if the
-     * prior checkpoint exists.
+     * prior checkpoint exists and we have advanced REDO LSN.
      */
-    if (PriorRedoPtr != InvalidXLogRecPtr)
+    if (PriorRedoPtr != InvalidXLogRecPtr && RedoRecPtr > PriorRedoPtr)
         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
 
+    /*
+     * Now we can safely regard this restart point as established.
+     *
+     * Since all buffers have been flushed so we regard this restart point as
+     * established.  We could omit some of the following steps in the case
+     * where we have omitted control file updates, but we don't bother avoid
+     * them from performing since that case rarely happens and they don't harm
+     * even if they take place.
+     */
+
     /*
      * Delete old log files, those no longer needed for last restartpoint to
      * prevent the disk holding the xlog from growing full.
@@ -9804,7 +9829,7 @@ CreateRestartPoint(int flags)
     xtime = GetLatestXTime();
     ereport((log_checkpoints ? LOG : DEBUG2),
             (errmsg("recovery restart point at %X/%X",
-                    LSN_FORMAT_ARGS(lastCheckPoint.redo)),
+                    LSN_FORMAT_ARGS(RedoRecPtr)),
              xtime ? errdetail("Last completed transaction was at log time %s.",
                                timestamptz_to_str(xtime)) : 0));
 
-- 
2.27.0


pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: How did queensnake corrupt zic.o?
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message