Re: PANIC during crash recovery of a recently promoted standby - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: PANIC during crash recovery of a recently promoted standby
Date
Msg-id 20180702.162513.154695753.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: PANIC during crash recovery of a recently promoted standby  (Michael Paquier <michael@paquier.xyz>)
Responses Re: PANIC during crash recovery of a recently promoted standby  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hello.

At Fri, 22 Jun 2018 15:25:48 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180622062548.GE5215@paquier.xyz>
> On Fri, Jun 22, 2018 at 02:34:02PM +0900, Kyotaro HORIGUCHI wrote:
> > Hello, sorry for the absense and I looked the second patch.
> 
> Thanks for the review!
> 
> > At Fri, 22 Jun 2018 13:45:21 +0900, Michael Paquier
> > <michael@paquier.xyz> wrote in <20180622044521.GC5215@paquier.xyz>
> >> long as crash recovery runs.  And XLogNeedsFlush() also has a similar
> >> problem.
> > 
> > Here, on the other hand, this patch turns off
> > updateMinRecoverypoint if minRecoverPoint is invalid when
> > RecoveryInProgress() == true. Howerver RecovInProg() == true
> > means archive recovery is already started and and
> > minRecoveryPoint *should* be updated t for the
> > condition. Actually minRecoverypoint is updated just below. If
> > this is really right thing, I think that some explanation for the
> > reason is required here.
> 
> LocalRecoveryInProgress is just a local copy of SharedRecoveryInProgress
> so RecoveryInProgress also returns true if crash recovery is running.
> But perhaps I am missing what you mean?  The point here is that redo can
> call XLogNeedsFlush, no?

My concern at the time was the necessity to turn off
updateMinRecoveryPoint on the fly. (The previous comment seems a
bit confused, sorry.)

When minRecoveryPoint is invalid, there're only two possible
cases. It may be at very beginning of archive reovery or may be
running a crash recovery. In the latter case, we have detected
crash recovery before redo starts. So we can turn off
updateMinRecoveryPoint immediately and no further check is
needed and it is (I think) easier to understand.

> > In xlog_redo there still be "minRecoverypoint != 0", which ought
> > to be described as "!XLogRecPtrIsInvalid(minRecoveryPoint)". (It
> > seems the only one. Double negation is a bit uneasy but there are
> > many instance of this kind of coding.)
> 
> It is possible to use directly a comparison with InvalidXLogRecPtr
> instead of a double negation.

I'm not sure whether it is abstraction of invalid value, or just
a short cut of the value. That's right if it's the
latter. (There's several places where invalid LSN is assumed to
be smaller than any valid values in the patch).

the second diff is the difference of the first patch from
promote_panic_master.diff

On further thought, as we confirmed upthread (and existing
comments are saying) that (minRecoveryPoint == 0)
!InArchiveRecovery are always equivalent, and
updateMinRecoveryPoint becomes equivalent to them by the v3
patch.  That is, we can just remove the variable and the attached
third patch is that. It also passes all recovery tests including
the new 015.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dcfef36591..d86137ae8b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -821,8 +821,14 @@ static XLogSource XLogReceiptSource = 0;    /* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;    /* start of last record read */
 static XLogRecPtr EndRecPtr;    /* end+1 of last record read */
 
-static XLogRecPtr minRecoveryPoint; /* local copy of
-                                     * ControlFile->minRecoveryPoint */
+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available, and updateMinRecoveryPoint is
+ * switched to false to prevent any updates while replaying records.
+ * Those values are kept consistent as long as crash recovery runs.
+ */
+static XLogRecPtr minRecoveryPoint;
 static TimeLineID minRecoveryPointTLI;
 static bool updateMinRecoveryPoint = true;
 
@@ -2707,7 +2713,7 @@ XLogGetReplicationSlotMinimumLSN(void)
 static void
 UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 {
-    /* Quick check using our local copy of the variable */
+    /* Check using our local copy of minRecoveryPoint */
     if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
         return;
 
@@ -2717,14 +2723,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
     minRecoveryPoint = ControlFile->minRecoveryPoint;
     minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-    /*
-     * An invalid minRecoveryPoint means that we need to recover all the WAL,
-     * i.e., we're doing crash recovery.  We never modify the control file's
-     * value in that case, so we can short-circuit future checks here too.
-     */
-    if (minRecoveryPoint == 0)
-        updateMinRecoveryPoint = false;
-    else if (force || minRecoveryPoint < lsn)
+    if (force || minRecoveryPoint < lsn)
     {
         XLogRecPtr    newMinRecoveryPoint;
         TimeLineID    newMinRecoveryPointTLI;
@@ -3110,7 +3109,7 @@ XLogNeedsFlush(XLogRecPtr record)
      */
     if (RecoveryInProgress())
     {
-        /* Quick exit if already known updated */
+        /* Quick exit if already known to be updated or cannot be updated */
         if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
             return false;
 
@@ -3124,20 +3123,8 @@ XLogNeedsFlush(XLogRecPtr record)
         minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
         LWLockRelease(ControlFileLock);
 
-        /*
-         * An invalid minRecoveryPoint means that we need to recover all the
-         * WAL, i.e., we're doing crash recovery.  We never modify the control
-         * file's value in that case, so we can short-circuit future checks
-         * here too.
-         */
-        if (minRecoveryPoint == 0)
-            updateMinRecoveryPoint = false;
-
         /* check again */
-        if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
-            return false;
-        else
-            return true;
+        return record > minRecoveryPoint;
     }
 
     /* Quick exit if already known flushed */
@@ -4269,6 +4256,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
                 minRecoveryPoint = ControlFile->minRecoveryPoint;
                 minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
+                /*
+                 * The startup process can update its local copy of
+                 * minRecoveryPoint from this point.
+                 */
+                updateMinRecoveryPoint = true;
+
                 UpdateControlFile();
                 LWLockRelease(ControlFileLock);
 
@@ -6892,9 +6885,31 @@ StartupXLOG(void)
         /* No need to hold ControlFileLock yet, we aren't up far enough */
         UpdateControlFile();
 
-        /* initialize our local copy of minRecoveryPoint */
-        minRecoveryPoint = ControlFile->minRecoveryPoint;
-        minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+        /*
+         * Initialize our local copy of minRecoveryPoint.  When doing crash
+         * recovery we want to replay up to the end of WAL.  Particularly, in
+         * the case of a promoted standby minRecoveryPoint value in the
+         * control file is only updated after the first checkpoint.  However,
+         * if the instance crashes before the first post-recovery checkpoint
+         * is completed then recovery will use a stale location causing the
+         * startup process to think that there are still invalid page
+         * references when checking for data consistency.
+         */
+        if (InArchiveRecovery)
+        {
+            minRecoveryPoint = ControlFile->minRecoveryPoint;
+            minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+        }
+        else
+        {
+            /*
+             * We are to run crash recovery. We shouldn't update
+             * minRecoveryPoint until crash recvoery ends.
+             */
+            updateMinRecoveryPoint = false;
+            minRecoveryPoint = InvalidXLogRecPtr;
+            minRecoveryPointTLI = 0;
+        }
 
         /*
          * Reset pgstat data, because it may be invalid after recovery.
@@ -7861,6 +7876,8 @@ CheckRecoveryConsistency(void)
     if (XLogRecPtrIsInvalid(minRecoveryPoint))
         return;
 
+    Assert(InArchiveRecovery);
+
     /*
      * assume that we are called in the startup process, and hence don't need
      * a lock to read lastReplayedEndRecPtr
@@ -9949,11 +9966,16 @@ xlog_redo(XLogReaderState *record)
          * Update minRecoveryPoint to ensure that if recovery is aborted, we
          * recover back up to this point before allowing hot standby again.
          * This is important if the max_* settings are decreased, to ensure
-         * you don't run queries against the WAL preceding the change.
+         * you don't run queries against the WAL preceding the change. The
+         * local copies cannot be updated as long as crash recovery is
+         * happening and we expect all the WAL to be replayed.
          */
-        minRecoveryPoint = ControlFile->minRecoveryPoint;
-        minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
-        if (minRecoveryPoint != 0 && minRecoveryPoint < lsn)
+        if (InArchiveRecovery)
+        {
+            minRecoveryPoint = ControlFile->minRecoveryPoint;
+            minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+        }
+        if (minRecoveryPoint != InvalidXLogRecPtr && minRecoveryPoint < lsn)
         {
             ControlFile->minRecoveryPoint = lsn;
             ControlFile->minRecoveryPointTLI = ThisTimeLineID;
2716c2716
<     /* Quick check using our local copy of the variable */
---
>     /* Check using our local copy of minRecoveryPoint */
2720,2732d2719
<     /*
<      * An invalid minRecoveryPoint means that we need to recover all the WAL,
<      * i.e., we're doing crash recovery.  We never modify the control file's
<      * value in that case, so we can short-circuit future checks here too. The
<      * local values of minRecoveryPoint and minRecoveryPointTLI should not be
<      * updated until crash recovery finishes.
<      */
<     if (XLogRecPtrIsInvalid(minRecoveryPoint))
<     {
<         updateMinRecoveryPoint = false;
<         return;
<     }
< 
3125,3133d3111
<         /*
<          * An invalid minRecoveryPoint means that we need to recover all the
<          * WAL, i.e., we're doing crash recovery.  We never modify the control
<          * file's value in that case, so we can short-circuit future checks
<          * here too.
<          */
<         if (XLogRecPtrIsInvalid(minRecoveryPoint))
<             updateMinRecoveryPoint = false;
< 
6926a6905,6909
>             /*
>              * We are to run crash recovery. We shouldn't update
>              * minRecoveryPoint until crash recvoery ends.
>              */
>             updateMinRecoveryPoint = false;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dcfef36591..8a37784653 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -821,10 +821,13 @@ static XLogSource XLogReceiptSource = 0;    /* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;    /* start of last record read */
 static XLogRecPtr EndRecPtr;    /* end+1 of last record read */
 
-static XLogRecPtr minRecoveryPoint; /* local copy of
-                                     * ControlFile->minRecoveryPoint */
+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available.
+ */
+static XLogRecPtr minRecoveryPoint;
 static TimeLineID minRecoveryPointTLI;
-static bool updateMinRecoveryPoint = true;
 
 /*
  * Have we reached a consistent database state? In crash recovery, we have
@@ -2707,8 +2710,13 @@ XLogGetReplicationSlotMinimumLSN(void)
 static void
 UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 {
-    /* Quick check using our local copy of the variable */
-    if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
+    /*
+     * Quich check using our local copy of minRecoveryPoint. Invalid
+     * minRecoveryPoint value means we are running crash recovery and don't
+     * need to update it.
+     */
+    if (XLogRecPtrIsInvalid(minRecoveryPoint) ||
+        (!force && lsn <= minRecoveryPoint))
         return;
 
     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
@@ -2717,14 +2725,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
     minRecoveryPoint = ControlFile->minRecoveryPoint;
     minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-    /*
-     * An invalid minRecoveryPoint means that we need to recover all the WAL,
-     * i.e., we're doing crash recovery.  We never modify the control file's
-     * value in that case, so we can short-circuit future checks here too.
-     */
-    if (minRecoveryPoint == 0)
-        updateMinRecoveryPoint = false;
-    else if (force || minRecoveryPoint < lsn)
+    if (force || minRecoveryPoint < lsn)
     {
         XLogRecPtr    newMinRecoveryPoint;
         TimeLineID    newMinRecoveryPointTLI;
@@ -3110,8 +3111,8 @@ XLogNeedsFlush(XLogRecPtr record)
      */
     if (RecoveryInProgress())
     {
-        /* Quick exit if already known updated */
-        if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
+        /* Quick exit if already known to be updated or cannot be updated */
+        if (record <= minRecoveryPoint || XLogRecPtrIsInvalid(minRecoveryPoint))
             return false;
 
         /*
@@ -3124,20 +3125,8 @@ XLogNeedsFlush(XLogRecPtr record)
         minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
         LWLockRelease(ControlFileLock);
 
-        /*
-         * An invalid minRecoveryPoint means that we need to recover all the
-         * WAL, i.e., we're doing crash recovery.  We never modify the control
-         * file's value in that case, so we can short-circuit future checks
-         * here too.
-         */
-        if (minRecoveryPoint == 0)
-            updateMinRecoveryPoint = false;
-
         /* check again */
-        if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
-            return false;
-        else
-            return true;
+        return record > minRecoveryPoint;
     }
 
     /* Quick exit if already known flushed */
@@ -6892,9 +6881,31 @@ StartupXLOG(void)
         /* No need to hold ControlFileLock yet, we aren't up far enough */
         UpdateControlFile();
 
-        /* initialize our local copy of minRecoveryPoint */
-        minRecoveryPoint = ControlFile->minRecoveryPoint;
-        minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+        /*
+         * Initialize our local copy of minRecoveryPoint.  When doing crash
+         * recovery we want to replay up to the end of WAL.  Particularly, in
+         * the case of a promoted standby minRecoveryPoint value in the
+         * control file is only updated after the first checkpoint.  However,
+         * if the instance crashes before the first post-recovery checkpoint
+         * is completed then recovery will use a stale location causing the
+         * startup process to think that there are still invalid page
+         * references when checking for data consistency.
+         */
+        if (InArchiveRecovery)
+        {
+            minRecoveryPoint = ControlFile->minRecoveryPoint;
+            minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+        }
+        else
+        {
+            /*
+             * We are to run crash recovery. Set invalid value to
+             * minRecoveryPoint ignoring the shared variable, which inhibis
+             * updating the variable during crash recovery.
+             */
+            minRecoveryPoint = InvalidXLogRecPtr;
+            minRecoveryPointTLI = 0;
+        }
 
         /*
          * Reset pgstat data, because it may be invalid after recovery.
@@ -7861,6 +7872,8 @@ CheckRecoveryConsistency(void)
     if (XLogRecPtrIsInvalid(minRecoveryPoint))
         return;
 
+    Assert(InArchiveRecovery);
+
     /*
      * assume that we are called in the startup process, and hence don't need
      * a lock to read lastReplayedEndRecPtr
@@ -9265,6 +9278,7 @@ CreateRestartPoint(int flags)
          * no value in having the minimum recovery point any earlier than this
          * anyway, because redo will begin just after the checkpoint record.
          */
+        Assert(ControlFile->minRecoveryPoint != InvalidXLogRecPtr);
         if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
         {
             ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
@@ -9949,14 +9963,21 @@ xlog_redo(XLogReaderState *record)
          * Update minRecoveryPoint to ensure that if recovery is aborted, we
          * recover back up to this point before allowing hot standby again.
          * This is important if the max_* settings are decreased, to ensure
-         * you don't run queries against the WAL preceding the change.
+         * you don't run queries against the WAL preceding the change. Both
+         * local and shared variables cannot be updated and we expect all the
+         * WAL to be replayed while crash recovery is happning.
          */
-        minRecoveryPoint = ControlFile->minRecoveryPoint;
-        minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
-        if (minRecoveryPoint != 0 && minRecoveryPoint < lsn)
+        if (InArchiveRecovery)
         {
-            ControlFile->minRecoveryPoint = lsn;
-            ControlFile->minRecoveryPointTLI = ThisTimeLineID;
+            Assert(minRecoveryPoint != InvalidXLogRecPtr);
+            minRecoveryPoint = ControlFile->minRecoveryPoint;
+            minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+
+            if (minRecoveryPoint < lsn)
+            {
+                ControlFile->minRecoveryPoint = lsn;
+                ControlFile->minRecoveryPointTLI = ThisTimeLineID;
+            }
         }
 
         CommitTsParameterChange(xlrec.track_commit_timestamp,

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [PATCH] Include application_name in "connection authorized" logmessage
Next
From: Masahiko Sawada
Date:
Subject: Re: Copy function for logical replication slots