Re: Add checkpoint and redo LSN to LogCheckpointEnd log message - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date
Msg-id 20220207.120258.310426179780547983.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Add checkpoint and redo LSN to LogCheckpointEnd log message  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
List pgsql-hackers
At Mon, 07 Feb 2022 10:16:34 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Fri, 4 Feb 2022 10:59:04 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> > On 2022/02/03 15:50, Kyotaro Horiguchi wrote:
> > > By the way, restart point should start only while recoverying, and at
> > > the timeof the start both checkpoint.redo and checkpoint LSN are
> > > already past. We shouldn't update minRecovery point after promotion,
> > > but is there any reason for not updating the checkPoint and
> > > checkPointCopy?  If we update them after promotion, the
> > > which-LSN-to-show problem would be gone.
> > 
> > I tried to find the reason by reading the past discussion, but have
> > not found that yet.
> > 
> > If we update checkpoint and REDO LSN at pg_control in that case, we
> > also need to update min recovery point at pg_control? Otherwise the
> > min recovery point at pg_control still indicates the old LSN that
> > previous restart point set.
> 
> I had an assuption that the reason I think it shouldn't update
> minRecoveryPoint is that it has been or is going to be reset to
> invalid LSN by promotion and the checkpoint should refrain from
> touching it.

Hmm.. It doesn't seem to be the case. If a server crashes just after
promotion and before requesting post-promtion checkpoint,
minRecoveryPoint stays at a valid LSN.

(Promoted at 0/7000028)
Database cluster state:               in production
Latest checkpoint location:           0/6000060
Latest checkpoint's REDO location:    0/6000028
Latest checkpoint's REDO WAL file:    000000010000000000000006
Minimum recovery ending location:     0/7000090
Min recovery ending loc's timeline:   2

minRecoveryPoint/TLI are ignored in any case where a server in
in-production state is started.  In other words, the values are
useless.  There's no clear or written reason for unrecording the last
ongoing restartpoint after promotion.

Before fast-promotion was introduced, we shouldn't get there after
end-of-recovery checkpoint (but somehow reached sometimes?) but it is
quite normal nowadays.  Or to the contrary, we're expecting it to
happen and it is regarded as a normal checkponit. So we should do
there nowadays are as the follows.

- If any later checkpoint/restartpoint has been established, just skip
  remaining task then return false. (!chkpt_was_latest)
  (I'm not sure this can happen, though.)

- we update control file only when archive recovery is still ongoing.

- Otherwise reset minRecoveryPoint then continue.

Do you have any thoughts or opinions?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 958220c495..ab8a4d9a1b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9658,6 +9658,7 @@ CreateRestartPoint(int flags)
     XLogRecPtr    endptr;
     XLogSegNo    _logSegNo;
     TimestampTz xtime;
+    bool        chkpt_was_latest = false;
 
     /* Get a local copy of the last safe checkpoint record. */
     SpinLockAcquire(&XLogCtl->info_lck);
@@ -9752,44 +9753,69 @@ CreateRestartPoint(int flags)
     PriorRedoPtr = ControlFile->checkPointCopy.redo;
 
     /*
-     * Update pg_control, using current time.  Check that it still shows
-     * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
-     * this is a quick hack to make sure nothing really bad happens if somehow
-     * we get here after the end-of-recovery checkpoint.
+     * Update pg_control, using current time if no later checkpoints have been
+     * performed.
      */
     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
-    if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
-        ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
+    if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
     {
+        chkpt_was_latest = true;
         ControlFile->checkPoint = lastCheckPointRecPtr;
         ControlFile->checkPointCopy = lastCheckPoint;
 
         /*
-         * Ensure minRecoveryPoint is past the checkpoint record.  Normally,
-         * this will have happened already while writing out dirty buffers,
-         * but not necessarily - e.g. because no buffers were dirtied.  We do
-         * this because a non-exclusive base backup uses minRecoveryPoint to
-         * determine which WAL files must be included in the backup, and the
-         * file (or files) containing the checkpoint record must be included,
-         * at a minimum. Note that for an ordinary restart of recovery there's
-         * no value in having the minimum recovery point any earlier than this
+         * Ensure minRecoveryPoint is past the checkpoint record while archive
+         * recovery is still ongoing.  Normally, this will have happened
+         * already while writing out dirty buffers, but not necessarily -
+         * e.g. because no buffers were dirtied.  We do this because a
+         * non-exclusive base backup uses minRecoveryPoint to determine which
+         * WAL files must be included in the backup, and the file (or files)
+         * containing the checkpoint record must be included, at a
+         * minimum. Note that for an ordinary restart of recovery there's no
+         * value in having the minimum recovery point any earlier than this
          * anyway, because redo will begin just after the checkpoint record.
          */
-        if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
+        if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
         {
-            ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
-            ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
+            if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
+            {
+                ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
+                ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
 
-            /* update local copy */
-            minRecoveryPoint = ControlFile->minRecoveryPoint;
-            minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+                /* update local copy */
+                minRecoveryPoint = ControlFile->minRecoveryPoint;
+                minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+            }
+            if (flags & CHECKPOINT_IS_SHUTDOWN)
+                ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
+        }
+        else
+        {
+            /*
+             * We have exited from archive-recovery mode after starting. Crash
+             * recovery ever after should always recover to the end of WAL
+             */
+            ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
+            ControlFile->minRecoveryPointTLI = 0;
         }
-        if (flags & CHECKPOINT_IS_SHUTDOWN)
-            ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
         UpdateControlFile();
     }
     LWLockRelease(ControlFileLock);
 
+    /*
+     * Skip all post-checkpoint work if others have done that with later
+     * checkpoints.
+     */
+    if (!chkpt_was_latest)
+    {
+        ereport((log_checkpoints ? LOG : DEBUG2),
+                (errmsg("post-restartpoint cleanup is skpped at %X/%X, because later restartpoints have been already
performed",
+                        LSN_FORMAT_ARGS(lastCheckPoint.redo))));
+
+        /* this checkpoint has not bee established */
+        return false;
+    }
+
     /*
      * Update the average distance between checkpoints/restartpoints if the
      * prior checkpoint exists.

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: pg_receivewal - couple of improvements
Next
From: Amit Kapila
Date:
Subject: Re: [BUG]Update Toast data failure in logical replication