Re: [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL - Mailing list pgsql-hackers

From BharatDB
Subject Re: [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL
Date
Msg-id CAAh00ERqqAhgA_BJJccwE0BXxUWMk+FHzMoLo1kWcsm+qdNVjw@mail.gmail.com
In response to [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL  (Srinath Reddy Sadipiralla <srinath2133@gmail.com>)
Responses Re: [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL
List pgsql-hackers

Dear Srinath,

Subject: [PATCH] pg_rewind: Ignore shutdown checkpoints when determining rewind necessity.

While working with pg_rewind, I noticed that it can sometimes request a rewind even when no real changes exist after a failover. This happens because pg_rewind currently determines the end-of-WAL on the target using the last shutdown checkpoint (or minRecoveryPoint for a standby). In a clean failover scenario, where a standby is promoted and the old primary is later shut down, the only WAL record generated after divergence may be a shutdown checkpoint. Although the data on both nodes is identical, pg_rewind treats this shutdown record as meaningful and unnecessarily forces a rewind.

The proposed patch fixes this by ignoring shutdown checkpoints (XLOG_CHECKPOINT_SHUTDOWN) when determining the end-of-WAL: it scans backward until a non-shutdown record is found. This ensures that rewinds are triggered only when actual modifications exist after divergence, avoiding unnecessary rewinds in clean failover situations.

With the proposed fix applied, my local test script gives the following results:

  • Old primary shuts down cleanly.

  • Standby is promoted successfully.

  • pg_rewind correctly detects no rewind is needed.

  • Data on both clusters matches perfectly.

I believe this change will prevent unnecessary rewinds in production environments, improve reliability, and avoid potential confusion during failovers. 

Thank you for your consideration.

Best regards,
Soumya.



On Sat, Sep 6, 2025 at 10:04 PM Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote:
Hi all,

While working with pg_rewind, I noticed that it can sometimes request a rewind even when no actual changes exist after a failover.

Problem:
Currently, pg_rewind determines the end-of-WAL on the target by using the last shutdown checkpoint (or minRecoveryPoint for a standby). This creates a false positive scenario:

1) Suppose a standby is promoted to become the new primary.
2) Later, the old primary is cleanly shut down.
3) The only WAL record generated on the old primary after divergence is a shutdown checkpoint.

At this point, the old primary and new primary contain identical data. However, since the shutdown checkpoint extends the WAL past the divergence point, pg_rewind concludes:

if (target_wal_endrec > divergerec)
    rewind_needed = true;

That forces a rewind even though there are no meaningful changes.

To reproduce this scenario, use the attached script.

Fix:
The attached patch changes the logic so that pg_rewind no longer treats shutdown checkpoints as meaningful records when determining the end-of-WAL. Instead, we scan backward from the last checkpoint until we find the most recent valid WAL record that is not purely shutdown-related.

This ensures rewind is only triggered when there are actual modifications after divergence, avoiding unnecessary rewinds in clean failover scenarios.
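
To illustrate the idea, here is a minimal toy model of the backward scan. It is not the patch and does not use pg_rewind's real WAL reader: the types `ToyLsn`, `ToyRecKind`, `ToyRecord` and the functions `toy_effective_wal_end` and `toy_rewind_needed` are hypothetical names made up for this sketch, and the WAL is simplified to an in-memory array of records ordered oldest to newest. Only the final comparison mirrors the check quoted earlier.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these are not PostgreSQL's XLogRecPtr or XLogRecord. */
typedef uint64_t ToyLsn;

typedef enum
{
    TOY_REC_DATA,                   /* any record that modifies data */
    TOY_REC_SHUTDOWN_CHECKPOINT     /* stands in for XLOG_CHECKPOINT_SHUTDOWN */
} ToyRecKind;

typedef struct
{
    ToyLsn      end_lsn;            /* position just past this record */
    ToyRecKind  kind;
} ToyRecord;

/*
 * Walk backward from the newest record and return the end position of the
 * most recent record that is not a shutdown checkpoint. If every record is
 * a shutdown checkpoint (or the array is empty), return `fallback`.
 */
static ToyLsn
toy_effective_wal_end(const ToyRecord *recs, size_t nrecs, ToyLsn fallback)
{
    for (size_t i = nrecs; i > 0; i--)
    {
        if (recs[i - 1].kind != TOY_REC_SHUTDOWN_CHECKPOINT)
            return recs[i - 1].end_lsn;
    }
    return fallback;
}

/* Same comparison as the check quoted earlier in this mail. */
static bool
toy_rewind_needed(ToyLsn target_wal_endrec, ToyLsn divergerec)
{
    return target_wal_endrec > divergerec;
}
```

In this model, take a target whose records end at 100 (data) and 120 (shutdown checkpoint), with divergence at 100. Feeding the raw end position 120 into `toy_rewind_needed` asks for a rewind, whereas feeding it the result of `toy_effective_wal_end` (100) does not. If a data record ending at 140 sits between the two, the scan returns 140 and a rewind is still requested.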


--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
