Re: A failure of standby to follow timeline switch - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: A failure of standby to follow timeline switch |
Date | |
Msg-id | 20210107.163236.776031941315368203.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: A failure of standby to follow timeline switch (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: A failure of standby to follow timeline switch
|
List | pgsql-hackers |
At Thu, 7 Jan 2021 11:55:33 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2021/01/05 17:26, Kyotaro Horiguchi wrote: > > At Mon, 4 Jan 2021 19:00:21 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> > > wrote in > >> > >> > >> On 2021/01/04 12:06, Kyotaro Horiguchi wrote: > >>> At Sat, 26 Dec 2020 02:15:06 +0900, Fujii Masao > >>> <masao.fujii@oss.nttdata.com> wrote in > >>>> > >>>> On 2020/12/25 12:03, Kyotaro Horiguchi wrote: > >>> The attached is the fixed version. > >> > >> Thanks for updating the patches! > >> > >>> In the first patch, the test added to 001_stream_rep.pl involves two > >>> copied functions related to server-log investigation from > >>> 019_repslot_limit.pl. > >> > >> So you're planning to define them commonly in TestLib.pm or elsewhere? > > Yeah.. That's correct. Newly added as the first patch. > > While making that change, I extended the interface of slurp_file to > > allow reading from arbitrary position. > > Is this extension really helpful for current use case? > At least I'd like to avoid back-patching this since it's an exntesion... Yeah, I felt a hesitattion about it a bit. It's less useful assuming that log files won't get so large. Removed in this version. > OsFHandleOpen(my $fh = IO::Handle->new(), $fHandle, 'r') > or croak "could not read \"$filename\": $^E\n"; > + seek($fh, $from, 0) > + or croak "could not seek \"$filename\" to $from: $^E\n"; > > I'm not familiar with this area, but SetFilePointer() is more suitable > rather than seek()? SetFilePointer() works for a native handle, IO::Handle->new() here. seek() works on $fh, a perl handle. If ReadFile is used later SetFilePointer() might be needed separately. Anyway, it is removed. -- Kyotaro Horiguchi NTT Open Source Software Center From 39f284c2109b63e444eb8d437fe51ae3268d305b Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com> Date: Tue, 5 Jan 2021 13:34:36 +0900 Subject: [PATCH v4 1/3] Move TAP log-searching feature to common modules Some private functions in 019_repslot_limit.pl will be used in other tests. Move them to common modules so that they are available to the new tests. --- src/test/perl/PostgresNode.pm | 36 ++++++++++++++++++++++ src/test/recovery/t/019_replslot_limit.pl | 37 ++++------------------- 2 files changed, 42 insertions(+), 31 deletions(-) diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index 9667f7667e..d78e9f93fb 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -2223,6 +2223,42 @@ sub pg_recvlogical_upto =pod +=item $node->current_log_position() + +Return the current position of server log. + +=cut + +sub current_log_position +{ + my $self = shift; + + return (stat $self->logfile)[7]; +} + +=pod + +=item $node->find_in_log($pattern, $startpos) + +Returns whether the $pattern occurs after $startpos in the server log. + +=cut + +sub find_in_log +{ + my ($self, $pattern, $startpos) = @_; + + $startpos = 0 unless defined $startpos; + my $log = TestLib::slurp_file($self->logfile); + return 0 if (length($log) <= $startpos); + + $log = substr($log, $startpos); + + return $log =~ m/$pattern/; +} + +=pod + =back =cut diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index 20f4a1bbc3..b298d9992f 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -165,19 +165,17 @@ $node_primary->wait_for_catchup($node_standby, 'replay', $start_lsn); $node_standby->stop; -ok( !find_in_log( - $node_standby, - "requested WAL segment [0-9A-F]+ has already been removed"), +ok( !$node_standby->find_in_log( + "requested WAL segment [0-9A-F]+ has already been removed"), 'check that required WAL segments are still available'); # Advance WAL again, the slot loses the oldest segment. -my $logstart = get_log_size($node_primary); +my $logstart = $node_primary->current_log_position(); advance_wal($node_primary, 7); $node_primary->safe_psql('postgres', "CHECKPOINT;"); # WARNING should be issued -ok( find_in_log( - $node_primary, +ok( $node_primary->find_in_log( "invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds max_slot_wal_keep_size", $logstart), 'check that the warning is logged'); @@ -190,14 +188,13 @@ is($result, "rep1|f|t|lost|", 'check that the slot became inactive and the state "lost" persists'); # The standby no longer can connect to the primary -$logstart = get_log_size($node_standby); +$logstart = $node_standby->current_log_position(); $node_standby->start; my $failed = 0; for (my $i = 0; $i < 10000; $i++) { - if (find_in_log( - $node_standby, + if ($node_standby->find_in_log( "requested WAL segment [0-9A-F]+ has already been removed", $logstart)) { @@ -264,25 +261,3 @@ sub advance_wal } return; } - -# return the size of logfile of $node in bytes -sub get_log_size -{ - my ($node) = @_; - - return (stat $node->logfile)[7]; -} - -# find $pat in logfile of $node after $off-th byte -sub find_in_log -{ - my ($node, $pat, $off) = @_; - - $off = 0 unless defined $off; - my $log = TestLib::slurp_file($node->logfile); - return 0 if (length($log) <= $off); - - $log = substr($log, $off); - - return $log =~ m/$pat/; -} -- 2.27.0 From 97d3cfbad959ca889f26fe0d666566340161bd39 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com> Date: Wed, 9 Dec 2020 17:21:49 +0900 Subject: [PATCH v4 2/3] New test for timeline-tracking of walsender Walsender should track timeline changes while sending a historic timeline. Add a test for it. --- src/test/recovery/t/001_stream_rep.pl | 41 ++++++++++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl index 9e31a53de7..e78e98fb43 100644 --- a/src/test/recovery/t/001_stream_rep.pl +++ b/src/test/recovery/t/001_stream_rep.pl @@ -2,8 +2,9 @@ use strict; use warnings; use PostgresNode; +use Time::HiRes qw(usleep); use TestLib; -use Test::More tests => 36; +use Test::More tests => 37; # Initialize primary node my $node_primary = get_new_node('primary'); @@ -409,3 +410,41 @@ ok( ($phys_restart_lsn_pre cmp $phys_restart_lsn_post) == 0, my $primary_data = $node_primary->data_dir; ok(!-f "$primary_data/pg_wal/$segment_removed", "WAL segment $segment_removed recycled after physical slot advancing"); + +# +# Check if timeline-increment works while reading a historic timeline. +my $node_primary_2 = get_new_node('primary_2'); +# archiving is needed to create .paritial segment +$node_primary_2->init(allows_streaming => 1, has_archiving => 1); +$node_primary_2->start; +$node_primary_2->backup($backup_name); +my $node_standby_3 = get_new_node('standby_3'); +$node_standby_3->init_from_backup($node_primary_2, $backup_name, + has_streaming => 1); +$node_primary_2->stop; +$node_primary_2->set_standby_mode; # increment primary timeline +$node_primary_2->start; +$node_primary_2->promote; +my $logstart = $node_standby_3->current_log_position(); +$node_standby_3->start; + +my $success = 0; +for (my $i = 0 ; $i < 1000; $i++) +{ + if ($node_standby_3->find_in_log( + "requested WAL segment [0-9A-F]+ has already been removed", + $logstart)) + { + last; + } + elsif ($node_standby_3->find_in_log( + "End of WAL reached on timeline", + $logstart)) + { + $success = 1; + last; + } + usleep(100_000); +} + +ok($success, 'Timeline increment while reading a historic timeline'); -- 2.27.0 From 29decc6dc2ccea5ef4e2c8b92f9c2b65a3898fcb Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com> Date: Wed, 9 Dec 2020 17:22:41 +0900 Subject: [PATCH v4 3/3] Fix timeline-tracking failure while sending a historic timeline Walsender should track timeline switches while sending a historic timeline. Regain that behavior, which was broken in PG13, by a thinko of 709d003fbd. --- src/backend/replication/walsender.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index fe0d368a35..8545c6c423 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -2491,7 +2491,7 @@ WalSndSegmentOpen(XLogReaderState *state, XLogSegNo nextSegNo, XLogSegNo endSegNo; XLByteToSeg(sendTimeLineValidUpto, endSegNo, state->segcxt.ws_segsize); - if (state->seg.ws_segno == endSegNo) + if (nextSegNo == endSegNo) *tli_p = sendTimeLineNextTLI; } -- 2.27.0 From 933dd946ca547b7de2dfed84807cd9d871c12f6f Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com> Date: Tue, 5 Jan 2021 13:34:36 +0900 Subject: [PATCH] Fix timeline-tracking failure while sending a historic timeline Walsender should track timeline switches while sending a historic timeline. Regain that behavior, which was broken in PG13, by a thinko of 709d003fbd. Backpatch to PG13. --- src/backend/replication/walsender.c | 2 +- src/test/perl/PostgresNode.pm | 36 ++++++++++++++++++++ src/test/recovery/t/001_stream_rep.pl | 41 ++++++++++++++++++++++- src/test/recovery/t/019_replslot_limit.pl | 37 ++++---------------- 4 files changed, 83 insertions(+), 33 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 7f87eb7f19..04f6c3ebb4 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -2478,7 +2478,7 @@ WalSndSegmentOpen(XLogReaderState *state, XLogSegNo nextSegNo, XLogSegNo endSegNo; XLByteToSeg(sendTimeLineValidUpto, endSegNo, state->segcxt.ws_segsize); - if (state->seg.ws_segno == endSegNo) + if (nextSegNo == endSegNo) *tli_p = sendTimeLineNextTLI; } diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index 980f1f1533..a08c71b549 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -2138,6 +2138,42 @@ sub pg_recvlogical_upto =pod +=item $node->current_log_position() + +Return the current position of server log. + +=cut + +sub current_log_position +{ + my $self = shift; + + return (stat $self->logfile)[7]; +} + +=pod + +=item $node->find_in_log($pattern, $startpos) + +Returns whether the $pattern occurs after $startpos in the server log. + +=cut + +sub find_in_log +{ + my ($self, $pattern, $startpos) = @_; + + $startpos = 0 unless defined $startpos; + my $log = TestLib::slurp_file($self->logfile); + return 0 if (length($log) <= $startpos); + + $log = substr($log, $startpos); + + return $log =~ m/$pattern/; +} + +=pod + =back =cut diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl index 778f11b28b..8d2b24fe55 100644 --- a/src/test/recovery/t/001_stream_rep.pl +++ b/src/test/recovery/t/001_stream_rep.pl @@ -2,8 +2,9 @@ use strict; use warnings; use PostgresNode; +use Time::HiRes qw(usleep); use TestLib; -use Test::More tests => 36; +use Test::More tests => 37; # Initialize master node my $node_master = get_new_node('master'); @@ -409,3 +410,41 @@ ok( ($phys_restart_lsn_pre cmp $phys_restart_lsn_post) == 0, my $master_data = $node_master->data_dir; ok(!-f "$master_data/pg_wal/$segment_removed", "WAL segment $segment_removed recycled after physical slot advancing"); + +# +# Check if timeline-increment works while reading a historic timeline. +my $node_primary_2 = get_new_node('primary_2'); +# archiving is needed to create .paritial segment +$node_primary_2->init(allows_streaming => 1, has_archiving => 1); +$node_primary_2->start; +$node_primary_2->backup($backup_name); +my $node_standby_3 = get_new_node('standby_3'); +$node_standby_3->init_from_backup($node_primary_2, $backup_name, + has_streaming => 1); +$node_primary_2->stop; +$node_primary_2->set_standby_mode; # increment primary timeline +$node_primary_2->start; +$node_primary_2->promote; +my $logstart = $node_standby_3->current_log_position(); +$node_standby_3->start; + +my $success = 0; +for (my $i = 0 ; $i < 1000; $i++) +{ + if ($node_standby_3->find_in_log( + "requested WAL segment [0-9A-F]+ has already been removed", + $logstart)) + { + last; + } + elsif ($node_standby_3->find_in_log( + "End of WAL reached on timeline", + $logstart)) + { + $success = 1; + last; + } + usleep(100_000); +} + +ok($success, 'Timeline increment while reading a historic timeline'); diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index a7231dcd47..8b3c5de057 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -165,19 +165,17 @@ $node_master->wait_for_catchup($node_standby, 'replay', $start_lsn); $node_standby->stop; -ok( !find_in_log( - $node_standby, - "requested WAL segment [0-9A-F]+ has already been removed"), +ok( !$node_standby->find_in_log( + "requested WAL segment [0-9A-F]+ has already been removed"), 'check that required WAL segments are still available'); # Advance WAL again, the slot loses the oldest segment. -my $logstart = get_log_size($node_master); +my $logstart = $node_master->current_log_position(); advance_wal($node_master, 7); $node_master->safe_psql('postgres', "CHECKPOINT;"); # WARNING should be issued -ok( find_in_log( - $node_master, +ok( $node_master->find_in_log( "invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds max_slot_wal_keep_size", $logstart), 'check that the warning is logged'); @@ -190,14 +188,13 @@ is($result, "rep1|f|t|lost|", 'check that the slot became inactive and the state "lost" persists'); # The standby no longer can connect to the master -$logstart = get_log_size($node_standby); +$logstart = $node_standby->current_log_position(); $node_standby->start; my $failed = 0; for (my $i = 0; $i < 10000; $i++) { - if (find_in_log( - $node_standby, + if ($node_standby->find_in_log( "requested WAL segment [0-9A-F]+ has already been removed", $logstart)) { @@ -264,25 +261,3 @@ sub advance_wal } return; } - -# return the size of logfile of $node in bytes -sub get_log_size -{ - my ($node) = @_; - - return (stat $node->logfile)[7]; -} - -# find $pat in logfile of $node after $off-th byte -sub find_in_log -{ - my ($node, $pat, $off) = @_; - - $off = 0 unless defined $off; - my $log = TestLib::slurp_file($node->logfile); - return 0 if (length($log) <= $off); - - $log = substr($log, $off); - - return $log =~ m/$pat/; -} -- 2.27.0
pgsql-hackers by date: