Re: [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL |
Date | |
Msg-id | CA+TgmoYDgcV8viFQNLHYyC2Zct4sRsOf99rWWzjt6k3Z2hpNig@mail.gmail.com Whole thread Raw |
In response to | [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL (Srinath Reddy Sadipiralla <srinath2133@gmail.com>) |
List | pgsql-hackers |
On Sat, Sep 6, 2025 at 12:34 PM Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote: > While working with pg_rewind, I noticed that it can sometimes request a rewind even when no actual changes exist aftera failover. > > Problem: > Currently, pg_rewind determines the end-of-WAL on the target by using the last shutdown checkpoint (or minRecoveryPointfor a standby). This creates a false positive scenario: > > 1)Suppose a standby is promoted to become the new primary. > 2)Later, the old primary is cleanly shut down. > 3)The only WAL record generated on the old primary after divergence is a shutdown checkpoint. > > At this point, the old primary and new primary contain identical data. This isn't true, because the control file changes when the shutdown checkpoint is written. Also, I think if you tested this even once, you'd find out that it doesn't actually work. At least, it doesn't work for me, and I don't see how it would work for you. I created a primary. Then I made a standby. Then I promoted the standby while doing a clean shutdown of the primary. Then I tried to restart the primary as a standby, and I get an infinite loop like this: 2025-09-25 15:07:03.094 EDT [58031] FATAL: terminating walreceiver process due to administrator command 2025-09-25 15:07:03.094 EDT [56485] LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/040000D8 Now, admittedly, I didn't run pg_rewind here. But if I understand correctly, your patch's idea is to have pg_rewind say that everything is fine in this scenario. What it won't do is make postgres itself agree with that conclusion. I'm actually not sure the patch would have achieved that goal here, because the WAL actually looks like this: rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/04000028, prev 0/03000120, desc: RUNNING_XACTS nextXid 754 latestCompletedXid 753 oldestRunningXid 754 rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/04000060, prev 0/04000028, desc: CHECKPOINT_SHUTDOWN redo 0/04000060; tli 1; prev tli 1; fpw true; wal_level replica; xid 0:754; oid 14029; multi 1; offset 0; oldest xid 746 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown Your patch doesn't seem to contain any logic to ignore RUNNING_XACTS, just CHECKPOINT_SHUTDOWN. So in this scenario it might have still thought a rewind was necessary. But I think this test case still makes the point that you can't just change pg_rewind if the server has a different idea about how things work. Also, the server is correct to be concerned about the control file changing, and your patch is wrong to try to ignore such a change. It's true that the control file is not a relation data file, but its contents are very important, and updates to them matter when deciding whether servers are in sync. Also, even if the idea of your patch were correct, the logic it uses to try to find an XLOG_CHECKPOINT_SHUTDOWN or XLOG_SWITCH record is wrong. It's mandatory to first test XLogRecGetRmid(xlogreader) against the appropriate RM_whatever_ID; see walsummarizer.c for an example of how to do this correctly. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: