Re: Crash by targetted recovery - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Crash by targetted recovery
Date
Msg-id 20200306.102946.81443433484362211.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Crash by targetted recovery  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Crash by targetted recovery  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
At Thu, 5 Mar 2020 19:51:11 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2020/03/05 12:08, Kyotaro Horiguchi wrote:
> > I understand that the reconnection for REDO record is useless. Ok I
> > take the !StandbyMode way.
> > The attached is the test script that is changed to count the added
> > test, and the slight revised main patch.
> 
> Thanks for the patch!
> 
> + /* Wal receiver should not active when entring XLOG_FROM_ARCHIVE */
> +        Assert(!WalRcvStreaming());
> 
> +1 to add this assertion check.
> 
> Isn't it better to always check this while trying to read WAL from
> archive or pg_wal? So, what about the following change?
> 
>                 {
>                         case XLOG_FROM_ARCHIVE:
>                         case XLOG_FROM_PG_WAL:
> +                               /*
> + * WAL receiver should not be running while trying to
> +                                * read WAL from archive or pg_wal.
> +                                */
> +                               Assert(!WalRcvStreaming());
> +
>                                 /* Close any old file we might have open. */
>                                 if (readFile >= 0)

(It seems retroverting to the first patch when I started this...)
The second place covers wider cases so I reverted the first place.

> + lastSourceFailed = false; /* We haven't failed on the new source */
> 
> Is this really necessary? Since ReadRecord() always reset
> lastSourceFailed to false, it seems not necessary.

It's just to make sure.  Actually lastSourceFailed is always false
when we get there.  But when the source is switched, lastSourceFailed
should be changed to false as a matter of design.  I'd like to do that
unless that harms.

> -    else if (currentSource == 0)
> +    else if (currentSource == 0 ||
> 
> Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
> There are some places where 0 is used as the value of currentSource.
> IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.

Yeah, I've thought that many times but have neglected since it is not
critical and trivial as a separate patch.  I'd take the chance to do
that now.  Another minor glitch is "int oldSource = currentSource;" it
is not debugger-friendly so I changed it to XLogSource.  It is added
as a new patch file before the main patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 839edcba7e47156e77d5dc9ec7dfb9babc9ba846 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH v4 1/3] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 34 ++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..5b23ceb3c4 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
     'recovery end before target reached is a fatal error');
+
+# Corner case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+                                has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
+pass("targetted promotion on segment bounary");
-- 
2.18.2

From 82274c41cb529d87270d4e2a78dbf06cd139b8f6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 6 Mar 2020 10:11:59 +0900
Subject: [PATCH v4 2/3] Tidy up XLogSource usage.

We used interger 0 instead of XLOG_FROM_ANY and defined a variable to
store the type with int. Tidy them up for readability and
debugger-friendliness.
---
 src/backend/access/transam/xlog.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4361568882..db054c8d32 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -804,7 +804,7 @@ static XLogSource readSource = 0;    /* XLOG_FROM_* code */
  * attempt to read from currentSource failed, and we should try another source
  * next.
  */
-static XLogSource currentSource = 0;    /* XLOG_FROM_* code */
+static XLogSource currentSource = XLOG_FROM_ANY;    /* XLOG_FROM_* code */
 static bool lastSourceFailed = false;
 
 typedef struct XLogPageReadPrivate
@@ -4387,7 +4387,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
                  * so that we will check the archive next.
                  */
                 lastSourceFailed = false;
-                currentSource = 0;
+                currentSource = XLOG_FROM_ANY;
 
                 continue;
             }
@@ -11864,7 +11864,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 
     for (;;)
     {
-        int            oldSource = currentSource;
+        XLogSource    oldSource = currentSource;
 
         /*
          * First check if we failed to read from the current source, and
-- 
2.18.2

From 3a0f31b7c5a9d7f53134a4c909e69814b24bf526 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Mar 2020 11:54:07 +0900
Subject: [PATCH v4 3/3] Fix a crash bug of targetted promotion.

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous segment at the time, WaitForWALToBecomeAvailable crashes with
assertion failure.  WaitForWALToBecomeAvailable should move back to
XLOG_FROM_ARCHIVE if standby mode is turned off while
XLOG_FROM_STREAM.
---
 src/backend/access/transam/xlog.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index db054c8d32..88abd53cfb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11844,7 +11844,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      * 1. Read from either archive or pg_wal (XLOG_FROM_ARCHIVE), or just
      *      pg_wal (XLOG_FROM_PG_WAL)
      * 2. Check trigger file
-     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM).
+     *    If StandbyMode is turned off, the state machine goes back to 1.
      * 4. Rescan timelines
      * 5. Sleep wal_retrieve_retry_interval milliseconds, and loop back to 1.
      *
@@ -11859,8 +11860,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      */
     if (!InArchiveRecovery)
         currentSource = XLOG_FROM_PG_WAL;
-    else if (currentSource == 0)
+    else if (currentSource == 0 ||
+             (!StandbyMode && currentSource == XLOG_FROM_STREAM))
+    {
+        lastSourceFailed = false; /* We haven't failed on the new source */
         currentSource = XLOG_FROM_ARCHIVE;
+    }
 
     for (;;)
     {
@@ -12054,6 +12059,9 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
         {
             case XLOG_FROM_ARCHIVE:
             case XLOG_FROM_PG_WAL:
+                /* Wal receiver shouln't be active here */
+                Assert(!WalRcvStreaming());
+
                 /* Close any old file we might have open. */
                 if (readFile >= 0)
                 {
-- 
2.18.2


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Do we need to handle orphaned prepared transactions in theserver?
Next
From: Michael Paquier
Date:
Subject: Re: reindex concurrently and two toast indexes