[PATCH] Fix pg_rewind false positives caused by shutdown-only WAL - Mailing list pgsql-hackers

From Srinath Reddy Sadipiralla
Subject [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL
Date
Msg-id CAFC+b6rsM+WUoph-aPk5sz4cPzaQ4XkRDNwCJ5nG5+HsRQ=j8A@mail.gmail.com
List pgsql-hackers
Hi all,

While working with pg_rewind, I noticed that it can sometimes request a rewind even when the old primary has made no actual changes since the failover.

Problem:
Currently, pg_rewind determines the end-of-WAL on the target by using the last shutdown checkpoint (or minRecoveryPoint for a standby). This creates a false positive scenario:

1) Suppose a standby is promoted to become the new primary.
2) Later, the old primary is cleanly shut down.
3) The only WAL record generated on the old primary after divergence is a shutdown checkpoint.

At this point, the old primary holds no data changes that the new primary lacks. However, because the shutdown checkpoint extends the old primary's WAL past the divergence point, pg_rewind concludes:

if (target_wal_endrec > divergerec)
    rewind_needed = true;

This forces a rewind even though the old primary has no meaningful changes to undo.

To reproduce this scenario, use the attached script.

Fix:
The attached patch changes the logic so that pg_rewind no longer treats shutdown checkpoints as meaningful records when determining the end of WAL. Instead, it scans backward from the last checkpoint until it finds the most recent valid WAL record that is not a shutdown-only record.

This ensures rewind is only triggered when there are actual modifications after divergence, avoiding unnecessary rewinds in clean failover scenarios.


--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
Attachment
