On Tue, Dec 23, 2025 at 04:33:30PM +0700, Alena Vinter wrote:
> Thanks for the review. To clarify: TLI 1 does not diverge — it is fully
> replicated to the standby before the timeline switch. The test then
> intentionally slows down replication on TLI 2 (e.g., by delaying WAL
> shipping), reproducing the scenario I illustrated. As far as I’m aware,
> `fsync` is `on` by default, and the test does not modify it — so no WAL
> records are lost due to unsafe flushing.
I don't think so. The TAP test framework sets fsync to off when a node
is initialized, based on what is in the tree:
$ git grep "fsync = " -- *.pm
src/test/perl/PostgreSQL/Test/Cluster.pm: print $conf "fsync = off\n";
> The core issue is that the new timeline’s segment is zero-initialized
> instead of copying the same segment from the previous timeline (as done in
> crash-recovery startup). As a result, startup cannot finish recovery due
> to non-replicated end of WAL causing failures like “invalid magic number”.
The following addition to your proposed test tells me an entirely
different story: with it, the test passes, as the records of TLI 1 are
around:
my $node_primary = PostgreSQL::Test::Cluster->new('primary');
$node_primary->init(allows_streaming => 1);
+#$node_primary->append_conf('postgresql.conf', 'fsync=on');
$node_primary->start;
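
To double-check which fsync value a node really runs with, something
like the following could be used. This is only a minimal sketch of a
standalone TAP script, not part of the proposed test; it assumes that a
setting appended after init() overrides the "fsync = off" line written
by Cluster.pm, since the last occurrence in postgresql.conf wins:

```perl
use strict;
use warnings FATAL => 'all';
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

my $node_primary = PostgreSQL::Test::Cluster->new('primary');
$node_primary->init(allows_streaming => 1);

# init() writes "fsync = off"; appending afterwards overrides it.
$node_primary->append_conf('postgresql.conf', 'fsync = on');
$node_primary->start;

# Ask the running server for the effective value.
is($node_primary->safe_psql('postgres', 'SHOW fsync'),
	'on', 'fsync is enabled on primary');

$node_primary->stop;
done_testing();
```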
--
Michael