Re: Race condition in recovery? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Race condition in recovery?
Msg-id 20210610.101240.1270925505780628275.horikyota.ntt@gmail.com
In response to Re: Race condition in recovery?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Race condition in recovery?
List pgsql-hackers
At Wed, 09 Jun 2021 19:09:54 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Robert Haas <robertmhaas@gmail.com> writes:
> > Got it. I have now committed the patch to all branches, after adapting
> > your changes just a little bit.
> > Thanks to you and Kyotaro-san for all the time spent on this. What a slog!
> 
> conchuela failed its first encounter with this test case:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-09%2021%3A12%3A25
> 
> That machine has a certain, er, history of flakiness; so this may
> not mean anything.  Still, we'd better keep an eye out to see if
> the test needs more stabilization.


https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=conchuela&dt=2021-06-09%2021%3A12%3A25&stg=recovery-check

> ==~_~===-=-===~_~== pgsql.build/src/test/recovery/tmp_check/log/025_stuck_on_old_timeline_cascade.log ==~_~===-=-===~_~==
....
> 2021-06-09 23:31:10.439 CEST [893820:1] LOG:  started streaming WAL from primary at 0/2000000 on timeline 1
> 2021-06-09 23:31:10.439 CEST [893820:2] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000002 has already been removed
 

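The script 025_stuck_on_old_timeline.pl (and I) forgot to set
wal_keep_size (wal_keep_segments on older branches), so the primary
can remove a WAL segment before the cascading standby has streamed it.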

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/test/recovery/t/025_stuck_on_old_timeline.pl b/src/test/recovery/t/025_stuck_on_old_timeline.pl
index 0d96bb3c15..25c2dff437 100644
--- a/src/test/recovery/t/025_stuck_on_old_timeline.pl
+++ b/src/test/recovery/t/025_stuck_on_old_timeline.pl
@@ -27,6 +27,7 @@ $perlbin =~ s{\\}{\\\\}g if ($TestLib::windows_os);
 my $archivedir_primary = $node_primary->archive_dir;
 $node_primary->append_conf('postgresql.conf', qq(
 archive_command = '$perlbin "$FindBin::RealBin/cp_history_files" "%p" "$archivedir_primary/%f"'
+wal_keep_size=128MB
 ));
 $node_primary->start;

diff --git a/src/test/recovery/t/025_stuck_on_old_timeline.pl b/src/test/recovery/t/025_stuck_on_old_timeline.pl
index 0d96bb3c15..8099571299 100644
--- a/src/test/recovery/t/025_stuck_on_old_timeline.pl
+++ b/src/test/recovery/t/025_stuck_on_old_timeline.pl
@@ -27,6 +27,7 @@ $perlbin =~ s{\\}{\\\\}g if ($TestLib::windows_os);
 my $archivedir_primary = $node_primary->archive_dir;
 $node_primary->append_conf('postgresql.conf', qq(
 archive_command = '$perlbin "$FindBin::RealBin/cp_history_files" "%p" "$archivedir_primary/%f"'
+wal_keep_segments=8
 ));
 $node_primary->start;
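
For reference, the two patches above should retain roughly the same amount of WAL. The fragment below is only a sketch of the equivalence, not part of either patch. It assumes the default 16MB WAL segment size (that is, initdb was not run with a different --wal-segsize). wal_keep_size is the setting on PostgreSQL 13 and later, where it replaced wal_keep_segments.

```
# postgresql.conf on the primary (sketch; assumes the default 16MB WAL segment size)

# PostgreSQL 13 and later: retention is given as a size.
wal_keep_size = 128MB        # 128MB / 16MB per segment = 8 segments

# PostgreSQL 12 and earlier: retention is given as a segment count.
# Use this line instead of wal_keep_size on those branches.
# wal_keep_segments = 8      # 8 segments * 16MB per segment = 128MB
```

Either setting asks the primary to keep at least that much past WAL in pg_wal for standbys, which is what prevents the "requested WAL segment ... has already been removed" failure seen in the log.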

