Re: Crash by targetted recovery - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Crash by targetted recovery |
Date | |
Msg-id | 20200310.145900.21675815166709370.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Crash by targetted recovery (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Crash by targetted recovery
|
List | pgsql-hackers |
At Tue, 10 Mar 2020 10:50:52 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > Pushed the v5-0001-Tidy-up-XLogSource-usage.patch! Thanks! > Regarding the remaining patch adding the regression test, I didn't seriously inteneded it to be in the tree. > +$result = > + $node_standby->safe_psql('postgres', "SELECT > pg_last_wal_replay_lsn()"); > +my ($seg, $off) = split('/', $result); > +my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * > $segsize); > > What happens if "off" part gets the upper limit and "seg" part needs > to be incremented? What happens if pg_last_wal_replay_lsn() advances > very much (e.g., because of autovacuum) beyond the segment boundary > until the standby restarts? Of course, these situations very rarely > happen, > but I'd like to avoid adding such not stable test if possible. In the first place the "seg" is "fileno". Honestly I don't think the test doesn't reach to fileno boundary but I did in the attached. Since perl complains over-32bit integer arithmetic as incomptible, the calculation gets a bit odd shape to avoid over-32bit arithmetic. For the second point, which seems more likely to happen, I added the VACUUM/pg_switch_wal() sequence then wait standby for catch up, before doing the test. Does it make sense? regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 89621911e09df73f0f3a63501bb41ffb37edef05 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Tue, 10 Mar 2020 14:43:29 +0900 Subject: [PATCH v5] TAP test for a crash bug Add a test for targetted promotion happend at segment boundary. --- src/test/recovery/t/003_recovery_targets.pl | 50 ++++++++++++++++++++- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl index fd14bab208..6b319bc891 100644 --- a/src/test/recovery/t/003_recovery_targets.pl +++ b/src/test/recovery/t/003_recovery_targets.pl @@ -3,7 +3,7 @@ use strict; use warnings; use PostgresNode; use TestLib; -use Test::More tests => 9; +use Test::More tests => 10; use Time::HiRes qw(usleep); # Create and test a standby from given backup, with a certain recovery target. @@ -167,3 +167,51 @@ foreach my $i (0..100) $logfile = slurp_file($node_standby->logfile()); ok($logfile =~ qr/FATAL: recovery ended before configured recovery target was reached/, 'recovery end before target reached is a fatal error'); + +# Corner case where targetted promotion happens on segment boundary +$node_standby = get_new_node('standby_9'); +$node_standby->init_from_backup($node_master, 'my_backup', + has_restoring => 1, has_streaming => 1); +$node_standby->start; +## make sure replication stays at the beginning of a segment +$node_master->safe_psql('postgres', "VACUUM; SELECT pg_switch_wal();"); +my $catch_up_lsn = + $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();"); +my $caughtup_query = + "SELECT '$catch_up_lsn'::pg_lsn <= pg_last_wal_replay_lsn()"; +$node_standby->poll_query_until('postgres', $caughtup_query) + or die "Timed out while waiting for standby to catch up"; +## calculate the next segment boundary +my $result = + $node_standby->safe_psql('postgres', "SHOW wal_segment_size"); +die "unknown format of wal_segment_size: $result\n" + if ($result !~ /^([0-9]+)MB$/); +my $segsize = $1 * 1024 * 1024; +$result = + $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()"); +my ($fileno, $off) = split('/', $result); +my ($newoff, $newfileno) = (hex($off), hex($fileno)); +my $newsegnum = int($newoff / $segsize) + 1; +## treat segment-overflow, avoiding over-32bit arithmetic, segsize is +## a power of 2 larger than 1MB. +if ($newsegnum * ($segsize >> 1) >= 0x80000000) +{ + $newfileno += ($newsegnum * ($segsize >> 1)) >> 31; + $newsegnum %= (0x80000000 / ($segsize >> 1)); +} +my $target = sprintf("%X/%08X", $newfileno, $newsegnum * $segsize); +## the standby stops just after the next segment boundary +$node_standby->stop; +$node_standby->append_conf('postgresql.conf', qq( +recovery_target_inclusive=no +recovery_target_lsn='$target' +recovery_target_action='promote' +)); +$node_standby->start; +## do targetted promote +$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;"); +$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;"); +$caughtup_query = "SELECT NOT pg_is_in_recovery()"; +$node_standby->poll_query_until('postgres', $caughtup_query) + or die "Timed out while waiting for standby to promote"; +pass("targetted promotion on segment bounary"); -- 2.18.2
pgsql-hackers by date: