Make mesage at end-of-recovery less scary. - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Make mesage at end-of-recovery less scary.
Date
Msg-id 20200228.160100.2210969269596489579.horikyota.ntt@gmail.com
Whole thread Raw
Responses Re: Make mesage at end-of-recovery less scary.
List pgsql-hackers
Hello, this is a followup thread of [1].

# I didn't noticed that the thread didn't cover -hackers..

When recovery of any type ends, we see several kinds of error messages
that says "WAL is broken".

> LOG:  invalid record length at 0/7CB6BC8: wanted 24, got 0
> LOG:  redo is not required
> LOG:  database system is ready to accept connections

This patch reduces the scariness of such messages as the follows.

> LOG:  rached end of WAL at 0/1551048 on timeline 1 in pg_wal during crash recovery
> DETAIL:  invalid record length at 0/1551048: wanted 24, got 0
> LOG:  redo is not required
> LOG:  database system is ready to accept connections

[1] https://www.postgresql.org/message-id/20200117.172655.1384889922565817808.horikyota.ntt%40gmail.com

I'll register this to the coming CF.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From f3692cb484b7f1ebc351ba8a522039c0b91bcfdb Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 28 Feb 2020 15:52:58 +0900
Subject: [PATCH] Make End-Of-Recovery error less scary

When recovery in any type ends, we see a bit scary error message like
"invalid record length" that suggests something serious is happening.
Make this message less scary as "reached end of WAL".
---
 src/backend/access/transam/xlog.c | 45 ++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..452c376f62 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4288,6 +4288,10 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
         EndRecPtr = xlogreader->EndRecPtr;
         if (record == NULL)
         {
+            int actual_emode =
+                emode_for_corrupt_record(emode,
+                                         ReadRecPtr ? ReadRecPtr : EndRecPtr);
+
             if (readFile >= 0)
             {
                 close(readFile);
@@ -4295,14 +4299,41 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
             }
 
             /*
-             * We only end up here without a message when XLogPageRead()
-             * failed - in that case we already logged something. In
-             * StandbyMode that only happens if we have been triggered, so we
-             * shouldn't loop anymore in that case.
+             * randAccess here means we are reading successive records during
+             * recovery. If we get here during recovery, we can assume that we
+             * reached the end of WAL.  Otherwise something's really wrong and
+             * we report just only the errormsg if any. If we don't receive
+             * errormsg here, we already logged something.  We don't emit
+             * "reached end of WAL" in muted messages.
+             *
+             * Note: errormsg is alreay translated.
              */
-            if (errormsg)
-                ereport(emode_for_corrupt_record(emode, EndRecPtr),
-                        (errmsg_internal("%s", errormsg) /* already translated */ ));
+            if (!private->randAccess && actual_emode == emode)
+            {
+                if (StandbyMode)
+                    ereport(actual_emode,
+                            (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during streaming replication",
+                                     (uint32) (EndRecPtr >> 32), (uint32) EndRecPtr,
+                                     ThisTimeLineID,
+                                     xlogSourceNames[currentSource]),
+                             (errormsg ? errdetail_internal("%s", errormsg) : 0)));
+                else if (InArchiveRecovery)
+                    ereport(actual_emode,
+                            (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during archive recovery",
+                                     (uint32) (EndRecPtr >> 32), (uint32) EndRecPtr,
+                                     ThisTimeLineID,
+                                     xlogSourceNames[currentSource]),
+                             (errormsg ? errdetail_internal("%s", errormsg) : 0)));
+                else
+                    ereport(actual_emode,
+                            (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during crash recovery",
+                                     (uint32) (EndRecPtr >> 32), (uint32) EndRecPtr,
+                                     ThisTimeLineID,
+                                     xlogSourceNames[currentSource]),
+                             (errormsg ? errdetail_internal("%s", errormsg) : 0)));
+            }
+            else if (errormsg)
+                ereport(actual_emode, (errmsg_internal("%s", errormsg)));
         }
 
         /*
-- 
2.18.2


pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [Patch] pg_rewind: options to use restore_command fromrecovery.conf or command line
Next
From: Andy Fan
Date:
Subject: Re: Trying to pull up EXPR SubLinks