A failure of standby to follow timeline switch - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject A failure of standby to follow timeline switch
Date
Msg-id 20201209.174314.282492377848029776.horikyota.ntt@gmail.com
Whole thread Raw
Responses Re: A failure of standby to follow timeline switch
List pgsql-hackers
Hello.

We found a behavioral change (which seems to be a bug) in recovery at
PG13.

The following steps might seem somewhat strange but the replication
code deliberately cope with the case.  This is a sequense seen while
operating a HA cluseter using Pacemaker.

- Run initdb to create a primary.
- Set archive_mode=on on the primary.
- Start the primary.

- Create a standby using pg_basebackup from the primary.
- Stop the standby.
- Stop the primary.

- Put stnadby.signal to the primary then start it.
- Promote the primary.

- Start the standby.


Until PG12, the parimary signals end-of-timeline to the standby and
switches to the next timeline.  Since PG13, that doesn't happen and
the standby continues to request for the segment of the older
timeline, which no longer exists.

FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000003 has already been
removed

It is because WalSndSegmentOpen() can fail to detect a timeline switch
on a historic timeline, due to use of a wrong variable to check
that. It is using state->seg.ws_segno but it seems to be a thinko when
the code around was refactored in 709d003fbd.

The first patch detects the wrong behavior.  The second small patch
fixes it.

In the first patch, the test added to 001_stream_rep.pl involves two
copied functions related to server-log investigation from
019_repslot_limit.pl.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From f7fbb1574a412100a6732dbe17b0d7f66a497298 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Wed, 9 Dec 2020 17:21:49 +0900
Subject: [PATCH 1/2] Add a new test to detect a replication bug

---
 src/test/recovery/t/001_stream_rep.pl | 64 ++++++++++++++++++++++++++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 9e31a53de7..9dd1a70c90 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 36;
+use Test::More tests => 37;
 
 # Initialize primary node
 my $node_primary = get_new_node('primary');
@@ -409,3 +409,65 @@ ok( ($phys_restart_lsn_pre cmp $phys_restart_lsn_post) == 0,
 my $primary_data = $node_primary->data_dir;
 ok(!-f "$primary_data/pg_wal/$segment_removed",
     "WAL segment $segment_removed recycled after physical slot advancing");
+
+#
+# Check if timeline-increment works while reading a historic timeline.
+my $node_primary_2 = get_new_node('primary_2');
+$node_primary_2->init(allows_streaming => 1);
+$node_primary_2->enable_archiving; # needed to make .paritial segment
+$node_primary_2->start;
+$node_primary_2->backup($backup_name);
+my $node_standby_3 = get_new_node('standby_3');
+$node_standby_3->init_from_backup($node_primary_2, $backup_name,
+                                  has_streaming => 1);
+$node_primary_2->stop;
+$node_primary_2->set_standby_mode; # increment primary timeline
+$node_primary_2->start;
+$node_primary_2->promote;
+my $logstart = get_log_size($node_standby_3);
+$node_standby_3->start;
+
+my $success = 0;
+for (my $i = 0 ; $i < 10000; $i++)
+{
+    if (find_in_log(
+            $node_standby_3,
+            "requested WAL segment [0-9A-F]+ has already been removed",
+            $logstart))
+    {
+        last;
+    }
+    elsif (find_in_log(
+            $node_standby_3,
+            "End of WAL reached on timeline",
+               $logstart))
+    {
+        $success = 1;
+        last;
+    }
+    usleep(100_000);
+}
+
+ok($success, 'Timeline increment while reading a historic timeline');
+
+# return the size of logfile of $node in bytes
+sub get_log_size
+{
+    my ($node) = @_;
+
+    return (stat $node->logfile)[7];
+}
+
+# find $pat in logfile of $node after $off-th byte
+sub find_in_log
+{
+    my ($node, $pat, $off) = @_;
+
+    $off = 0 unless defined $off;
+    my $log = TestLib::slurp_file($node->logfile);
+    return 0 if (length($log) <= $off);
+
+    $log = substr($log, $off);
+
+    return $log =~ m/$pat/;
+}
-- 
2.27.0

From f3862b02b928c4f8aa6380bbccbac8e3c4bf800e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Wed, 9 Dec 2020 17:22:41 +0900
Subject: [PATCH 2/2] Fix a bug of timeline-tracking while sending a historic
 timeline

Since PG13, we should track a timeline switch while sending a historic
timeline running physical replication, but WalSndSegmentOpen fails to
do that. It is a thinko of 709d003fbd.
---
 src/backend/replication/walsender.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2eb19ad293..0a3f2f1359 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2497,7 +2497,7 @@ WalSndSegmentOpen(XLogReaderState *state, XLogSegNo nextSegNo,
         XLogSegNo    endSegNo;
 
         XLByteToSeg(sendTimeLineValidUpto, endSegNo, state->segcxt.ws_segsize);
-        if (state->seg.ws_segno == endSegNo)
+        if (nextSegNo == endSegNo)
             *tli_p = sendTimeLineNextTLI;
     }
 
-- 
2.27.0


pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Occasional tablespace.sql failures in check-world -jnn
Next
From: Pavel Stehule
Date:
Subject: Re: [Proposal] Global temporary tables