Re: Startup PANIC on standby promotion due to zero-filled WAL segment - Mailing list pgsql-hackers

From Alena Vinter
Subject Re: Startup PANIC on standby promotion due to zero-filled WAL segment
Msg-id CAGWv16JyznsODC8e7T-UuGSOE+6ZM1MjdCCgP1ZVg5iCK7Yh-g@mail.gmail.com
In response to Re: Startup PANIC on standby promotion due to zero-filled WAL segment  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Startup PANIC on standby promotion due to zero-filled WAL segment
List pgsql-hackers
Hi Michael,

Thanks for the review. To clarify: TLI 1 does not diverge; it is fully replicated to the standby before the timeline switch. The test then intentionally slows down replication on TLI 2 (e.g., by delaying WAL shipping), which reproduces the scenario I illustrated. As far as I’m aware, `fsync` is `on` by default and the test does not modify it, so no WAL records are lost due to unsafe flushing.

The core issue is that the new timeline’s segment is zero-initialized instead of being copied from the same segment of the previous timeline (as is done in crash-recovery startup). As a result, startup cannot finish recovery because the end of WAL was not replicated, which causes failures like “invalid magic number”.
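
To make the difference concrete, here is a toy model of the two ways the new timeline's segment can be initialized. This is only a sketch and not PostgreSQL code: the function names, the 4-page segment size, and the `PAGE_MAGIC` value are all made up for illustration, and the only thing modelled is a 2-byte magic number at the start of each page.

```python
import struct

PAGE_SIZE = 8192               # same as the default WAL block size
SEGMENT_SIZE = 4 * PAGE_SIZE   # toy size; a real segment defaults to 16 MB
PAGE_MAGIC = 0xD0D0            # hypothetical value, not a real XLOG_PAGE_MAGIC


def make_old_timeline_segment(valid_pages):
    """Build a segment whose first `valid_pages` pages carry a page magic."""
    seg = bytearray(SEGMENT_SIZE)
    for page in range(valid_pages):
        struct.pack_into("<H", seg, page * PAGE_SIZE, PAGE_MAGIC)
    return bytes(seg)


def new_segment_zero_filled():
    """Problem case: the new timeline's segment starts out as all zeros."""
    return bytes(SEGMENT_SIZE)


def new_segment_copied(old_segment, switch_offset):
    """Copy the old timeline's data up to the switch point, zero the rest."""
    return old_segment[:switch_offset] + bytes(SEGMENT_SIZE - switch_offset)


def first_invalid_page(segment, pages_to_read):
    """Return (page, message) for the first page with a bad magic, else None."""
    for page in range(pages_to_read):
        (magic,) = struct.unpack_from("<H", segment, page * PAGE_SIZE)
        if magic != PAGE_MAGIC:
            return page, f"invalid magic number {magic:04X}"
    return None


old = make_old_timeline_segment(valid_pages=3)
switch_offset = 2 * PAGE_SIZE + 100   # switch point inside the third page

print(first_invalid_page(new_segment_zero_filled(), 3))
# (0, 'invalid magic number 0000')
print(first_invalid_page(new_segment_copied(old, switch_offset), 3))
# None
```

In this model, a reader that has to start from the beginning of the zero-filled segment fails on the very first page, while the copied segment stays readable up to the switch point.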

---
Alena Vinter 
